Double Trouble – Point Estimates

The Athletic ran an article this week on the drop in doubles and triples, ending with a proposal to limit where outfielders can play. It had its points, but it left me with more questions than answers. And the changing baselines certainly didn’t help.*

*This is why a good editor can help - a manager (if you’re an IC) or an editor (if you’re a writer) can spot mistakes you’ve become blind to.

The article uses 2007, 2015, and 2019 as baselines. Now, I’ve put together enough slides that have seen multiple revisions to know that these things happen, so I’m not saying that this was done with malice or to cherry-pick points or anything. Indeed, this is a small quibble compared to the overall diagnostic. But it wasn’t a promising start.

No, my primary beef is that it ignores the rise of the three true outcomes (strikeouts, walks, and home runs). If there are fewer balls in play, wouldn’t it stand to reason that there’d be fewer doubles and triples? And if, as the author argues, defense is turning doubles into singles, we’d expect an increase in singles as well.

We’ll use data from Fangraphs to check this. We’ll start with counts of events, going back to 2000 so we have a decent baseline. We’ll exclude 2020 since the pandemic-shortened season would skew our data. We’ll also include XmR* control limits to help separate real change from normal variation.

*Normally I like to have the limits change when the process changes, but the package I normally use for that isn’t working and I haven’t taken the time to code it myself. Yet.
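
For reference, the basic XmR limits are simple enough to compute by hand. The sketch below assumes the standard constants: a centre line at the mean, with limits at 2.66 times the average moving range on either side. `xmr_limits()` is a hypothetical helper, not part of ggQC, and the sample values are made up rather than taken from Fangraphs. It should roughly match the lines `stat_QC` draws, though that is an assumption rather than something checked against the package, and it does not handle limits that shift when the process changes.

```r
# Sketch: XmR (individuals and moving range) limits computed by hand.
# xmr_limits() is a hypothetical helper, not a ggQC function.
xmr_limits <- function(x) {
  moving_range <- abs(diff(x))          # absolute change between consecutive points
  center <- mean(x)                     # centre line of the individuals chart
  mr_bar <- mean(moving_range)          # average moving range
  c(lower  = center - 2.66 * mr_bar,    # 2.66 = 3 / d2, with d2 = 1.128 for n = 2
    center = center,
    upper  = center + 2.66 * mr_bar)
}

# Made-up values for illustration only (not Fangraphs data)
doubles <- c(8200, 8350, 8100, 8250, 8300)
limits <- xmr_limits(doubles)
limits
```

A point outside the lower or upper limit is treated as a signal of change; anything inside is treated as routine variation.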

library(hrbrthemes)
library(janitor)
library(ggQC)
library(gt)
library(gtExtras)
library(tidyverse)

fg <- read_csv('fangraphs-leaderboards.csv') %>%
  clean_names()

fg %>%
  filter(season >= 2000 & season != 2020) %>%
  select(season, so, bb, x1b, x2b, x3b, hr_26) %>%
  pivot_longer(-season, names_to = 'stat', values_to = 'value') %>%
  mutate(stat = case_when(stat == 'so' ~ 'Strikeouts',
                          stat == 'bb' ~ 'Walks',
                          stat == 'x1b' ~ 'Singles',
                          stat == 'x2b' ~ 'Doubles',
                          stat == 'x3b' ~ 'Triples',
                          stat == 'hr_26' ~ 'Home Runs')) %>%
  mutate(stat = factor(stat, levels = c('Singles', 'Doubles', 'Triples', 'Home Runs', 'Strikeouts', 'Walks'))) %>%
  ggplot(aes(x = season, y = value))+
  geom_line()+
  stat_QC(method = "XmR",
          color.qc_limits = "#36454F",
          color.qc_center = "#36454F")+
  theme_ipsum()+
  facet_wrap(stat ~ ., scales = 'free')+
  theme(text=element_text(size = 16,  family="Oswald"),
        panel.grid.minor = element_blank(),
        plot.title.position = "plot")+
  scale_y_continuous(labels = scales::label_number(scale_cut = scales::cut_short_scale()))+
  ylab('')+
  xlab('')+
  labs(title = 'Outcome Counts By Year',
       subtitle = 'Excludes 2020',
       caption = 'Data courtesy Fangraphs')

Well, regardless of baseline, the premise that doubles and triples have gone down was indeed correct. However, strikeouts have gone up substantially, while home runs have also seen an increase.

So we think that the share of events has shifted. But it’s hard to tell by raw numbers alone. Let’s now look at each of these as a % of plate appearances. Since we’re normalizing the data, we can bring 2020 back into the fold.

fg %>%
  filter(season >= 2000) %>%
  select(season, pa, so, bb, x1b, x2b, x3b, hr_26) %>%
  pivot_longer(-c(season, pa), names_to = 'stat', values_to = 'value') %>%
  mutate(stat = case_when(stat == 'so' ~ 'Strikeouts',
                          stat == 'bb' ~ 'Walks',
                          stat == 'x1b' ~ 'Singles',
                          stat == 'x2b' ~ 'Doubles',
                          stat == 'x3b' ~ 'Triples',
                          stat == 'hr_26' ~ 'Home Runs')) %>%
  mutate(stat = factor(stat, levels = c('Singles', 'Doubles', 'Triples', 'Home Runs', 'Strikeouts', 'Walks'))) %>%
  mutate(value = value / pa) %>%
  ggplot(aes(x = season, y = value))+
  geom_line()+
  stat_QC(method = "XmR",
          color.qc_limits = "#36454F",
          color.qc_center = "#36454F")+
  theme_ipsum()+
  facet_wrap(stat ~ ., scales = 'free')+
  theme(text=element_text(size = 16,  family="Oswald"),
        panel.grid.minor = element_blank(),
        plot.title.position = "plot")+
  scale_y_continuous(labels = scales::label_percent(accuracy = 0.1))+  # values are shares of PA, so label them as percentages
  ylab('')+
  xlab('')+
  labs(title = 'Outcome % of PA',
       caption = 'Data courtesy Fangraphs')

This doesn’t change our story a lot. However, it does reinforce that batters are striking out at higher rates, which leaves fewer balls in play. Let’s shift to BABIP - Batting Average on Balls in Play - to help us out here. This simply tells us what share of batted balls, excluding home runs, turn into hits. We can take its denominator, balls in play (at-bats minus strikeouts minus home runs, plus sacrifice flies), to see how that’s changed over time. Since this is count data, we’ll exclude 2020.

fg %>%
  filter(season >= 2000 & season != 2020) %>%
  mutate(bip = ab - so - hr_26 + sf) %>%
  ggplot(aes(x = season, y = bip))+
  geom_line()+
  stat_QC(method = "XmR",
          color.qc_limits = "#36454F",
          color.qc_center = "#36454F")+
  theme_ipsum()+
  theme(text=element_text(size = 16,  family="Oswald"),
        panel.grid.minor = element_blank(),
        plot.title.position = "plot")+
  scale_y_continuous(labels = scales::label_number(scale_cut = scales::cut_short_scale()))+
  ylab('')+
  xlab('')+
  labs(title = 'Total Balls in Play',
       subtitle = 'Excludes 2020',
       caption = 'Data courtesy Fangraphs')

Ah, here we go - the total number of balls in play has indeed been dropping, starting in about 2007. There’s a bit of a rebound in 2022, which is when the NL adopted the DH.

We can take one final step and see the rate of singles, doubles, and triples on balls in play.

fg %>%
  filter(season >= 2000) %>%
  mutate(bip = ab - so - hr_26 + sf) %>%
  select(season, bip, x1b, x2b, x3b) %>%
  pivot_longer(-c(season, bip), names_to = 'stat', values_to = 'value') %>%
  mutate(stat = case_when(stat == 'x1b' ~ 'Singles',
                          stat == 'x2b' ~ 'Doubles',
                          stat == 'x3b' ~ 'Triples')) %>%
  mutate(stat = factor(stat, levels = c('Singles', 'Doubles', 'Triples'))) %>%
  mutate(value_bip = value / bip) %>%
  ggplot(aes(x = season, y = value_bip))+
  geom_line()+
  stat_QC(method = "XmR",
          color.qc_limits = "#36454F",
          color.qc_center = "#36454F")+
  theme_ipsum()+
  facet_wrap(stat ~ ., scales = 'free')+
  theme(text=element_text(size = 16,  family="Oswald"),
        panel.grid.minor = element_blank(),
        plot.title.position = "plot")+
  scale_y_continuous(labels = scales::label_percent(accuracy = 0.1))+  # values are shares of balls in play, so label them as percentages
  ylab('')+
  xlab('')+
  labs(title = 'BABIP Outcomes',
       caption = 'Data courtesy Fangraphs')

Doubles are down a bit from recent years, but within our range - we wouldn’t say that balls in play are much less likely to become doubles based on this. Singles and triples have been down for a while, though - this is what we’d call a long run in statistical process control (SPC), i.e. multiple consecutive years below average. Singles are up a bit in recent years, but not to the degree where it looks like it’s being driven by drops in doubles and triples.
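
A long run can also be checked in code rather than by eye. The sketch below counts the longest stretch of consecutive points on one side of the centre line. `longest_run()` is a hypothetical helper, the threshold of 8 is one common convention (others use 7 or 9), and the rates are made up for illustration rather than taken from Fangraphs.

```r
# Sketch: find the longest run of consecutive points on one side of the centre line.
# longest_run() is a hypothetical helper; the run-length threshold is a convention, not a law.
longest_run <- function(x, center = mean(x)) {
  side <- sign(x - center)              # +1 above, -1 below, 0 exactly on the line
  runs <- rle(side)                     # lengths of consecutive same-side stretches
  keep <- runs$values != 0              # points on the line break a run and are not counted
  if (!any(keep)) return(0L)
  max(runs$lengths[keep])
}

# Made-up rates for illustration only (not Fangraphs data)
rate <- c(0.24, 0.25, 0.24, 0.21, 0.22, 0.21, 0.22, 0.21, 0.22, 0.21, 0.22)
run_length <- longest_run(rate)
run_length >= 8   # TRUE here: the last eight points all sit below the mean
```

A run this long is treated as a signal even when no single point falls outside the control limits.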

Let’s end by comparing 2025 to 2015. This is ten years ago, which doesn’t necessarily make it the best of the baseline years, but it does seem to conform to changes in the charts above.

fg %>%
  filter(season %in% c(2015, 2025)) %>%
  mutate(bip = ab - so - hr_26 + sf) %>%
  select(season, pa, bip, x1b, x2b, x3b) %>%
  mutate(bip_perc = bip / pa,
         babip_single = x1b / bip,
         babip_double = x2b / bip,
         babip_triple = x3b / bip) %>%
  pivot_longer(-season, names_to = 'stat', values_to = 'value') %>%
  pivot_wider(names_from = season, values_from = value) %>%
  select(stat, `2015`, `2025`) %>%
  mutate(stat = case_when(stat == 'so' ~ 'Strikeouts',
                          stat == 'bb' ~ 'Walks',
                          stat == 'x1b' ~ 'Singles',
                          stat == 'x2b' ~ 'Doubles',
                          stat == 'x3b' ~ 'Triples',
                          stat == 'hr_26' ~ 'Home Runs',
                          stat == 'pa' ~ 'PA',
                          stat == 'bip' ~ 'Balls In Play',
                          stat == 'bip_perc' ~ 'Ball in Play / PA',
                          stat == 'babip_single' ~ 'BABIP - Single',
                          stat == 'babip_double' ~ 'BABIP - Double',
                          stat == 'babip_triple' ~ 'BABIP - Triple',
                          TRUE ~ stat)) %>%
  mutate(delta = `2025` - `2015`) %>%
  gt() %>%
  gt_theme_espn() %>%
  fmt_number(columns = 2:4, rows = 1:5, decimals = 0) %>%
  fmt_percent(columns = 2:4, rows = 6:9, decimals = 1)

stat	2015	2025	delta
PA	183,627	182,925	−702
Balls In Play	124,365	118,676	−5,689
Singles	28,016	26,115	−1,901
Doubles	8,242	7,745	−497
Triples	939	628	−311
Ball in Play / PA	67.7%	64.9%	−2.9%
BABIP - Single	22.5%	22.0%	−0.5%
BABIP - Double	6.6%	6.5%	−0.1%
BABIP - Triple	0.8%	0.5%	−0.2%

That’s a lot fewer balls in play. If we take the 118,676 balls in play from 2025 and apply the unrounded 2015 rates for each outcome, we’d get:

Singles: 118,676 * (28,016 / 124,365) = 26,734. This is 619 more singles than we had in 2025, accounting for 33% of the -1,901.
Doubles: 118,676 * (8,242 / 124,365) = 7,865. This is 120 more doubles than we had in 2025, accounting for 24% of the -497.
Triples: 118,676 * (939 / 124,365) = 896. This is 268 more triples than we had in 2025, accounting for 86% of the -311.

So, the math suggests that the drop in triples is largely due to the lower rate of triples on balls in play, which we can attribute to improved defense. Doubles are partially driven by improved defense, but since only 120 of the 497 “missing” doubles are due to the rate change, we can chalk this up more to a drop in balls in play*. (In fact, if we had the 2015 total of 124,365 BIP in 2025 at the 2025 double rate, we’d have 8,116 doubles, not far from the 2015 mark of 8,242.) Singles have also dropped primarily due to the decrease in balls in play.

*With the caveat that doing the math like this ignores random variation and may not pick up on other factors.
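
The arithmetic above can be reproduced in a few lines of R. This is a minimal sketch using the league totals from the comparison table; the variable names are mine, and it uses unrounded 2015 rates, so its results can differ by a few dozen hits from figures worked out with the rounded percentages shown in the table.

```r
# League totals copied from the 2015 vs. 2025 comparison table
bip_2015 <- 124365
bip_2025 <- 118676
hits_2015 <- c(singles = 28016, doubles = 8242, triples = 939)
hits_2025 <- c(singles = 26115, doubles = 7745, triples = 628)

rate_2015 <- hits_2015 / bip_2015            # unrounded 2015 rates per ball in play
expected_2025 <- rate_2015 * bip_2025        # 2025 counts if the 2015 rates had held
rate_effect <- expected_2025 - hits_2025     # hits "lost" to the change in rate
total_drop <- hits_2015 - hits_2025          # overall decline in each hit type
rate_share <- rate_effect / total_drop       # share of the decline due to the rate change

round(expected_2025)      # singles 26734, doubles 7865, triples 896
round(100 * rate_share)   # singles 33, doubles 24, triples 86
```

Whatever is not explained by the rate change (one minus `rate_share`) is the portion attributable to having fewer balls in play.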

The triple is one of the best plays in the game, so it’s somewhat sad to see it declining. But the game is always changing, and I don’t think gluing outfielders in place is the right approach.

My initial reaction to the article was that it was mostly wrong. Now I’d say it’s a bit more right than I thought, though it’s still missing some context, especially around balls in play. Having my assumptions challenged is one of the best parts of working with data, and hopefully this exercise proves helpful to others.