Batting Halves and Regression

The Tigers started the first half* of 2025 strong with a .325 team wOBA, only to see that drop to .309 in the second half. We can look at which batters changed, but we need to lay some groundwork before we dig in.

*I’m using the “pre and post All-Star game” definition, which results in unequal halves. The literal-minded among us would like to point out that they are no longer halves in this case, but must also accept that colloquialisms are a necessary part of language and move on with life.

Firstly, we want to look at wOBA since it shows a batter’s overall contribution. And we want to look at their performance, which is not the same as their skill. Gleyber Torres had a second half wOBA of .296 in 269 plate appearances. He has a career .333 wOBA in 4,301 PA. So when we look at his second half, we don’t necessarily think he’s suddenly turned into Don Kelly. It’s possible, and we can look deeper into his underlying stats to see what’s up. But since we’re looking purely at descriptive statistics, we’re only interested in his performance, not necessarily his talent.

We’re also not testing to see if the difference between halves is statistically significant. Given the sample sizes it’s unlikely, but not impossible if the delta is large enough. But again, we’re interested in the story here. It may prompt further exploration, but that’s not the main intent.

Also, crucially, we’re not saying that the first half is a “baseline”. Some players had first halves that were better than we’d have expected and they returned to normal in the second half. Some went the other way. It’s easy to get anchored to “first number = normal”, but let’s instead think “first number happened, may or may not be normal*”.

*Say this at a sports bar and see what happens.

Next, it helps to get a feel for how first and second halves generally correlate so we have context for the broader league. Using data from FanGraphs, we’ll look at the first and second half correlations for a number of metrics. There are a lot of hitting metrics, so we’ll stick with the defaults on FG, adding in Barrel % and average Exit Velocity. We’ll look only at hitters who qualified in both halves, which leaves us with 97 hitters. Not super robust, but not bad, either. Let’s plot the halves against each other.


library(hrbrthemes)
library(janitor)
library(gt)
library(gtExtras)
library(tidyverse)

mlb_first_half <- read_csv('mlb_2025_first_half.csv') %>%
  clean_names() %>%
  mutate(half = 'First')

mlb_second_half <- read_csv('mlb_2025_second_half.csv') %>%
  clean_names() %>%
  mutate(half = 'Second')

mlb_2025 <- mlb_first_half %>%
  bind_rows(mlb_second_half) %>%
  select(-mlbamid, -name_ascii, -def, -bs_r, -off, -g, -pa, -hr, -rbi, -sb, -r, -war)

mlb_munged <- mlb_2025 %>%
  pivot_longer(-c(player_id, name, half, team), names_to = 'metric', values_to = 'value') %>%
  pivot_wider(names_from = half, values_from = value) %>%
  drop_na() %>%
  mutate(dif = Second - First) %>%
  mutate(metric = toupper(metric),
         metric = case_when(metric == 'BB_PERCENT' ~ 'BB %',
                           metric == 'K_PERCENT' ~ 'K %',
                           metric == 'BARREL_PERCENT' ~ 'Barrel %',
                           metric == 'W_RC' ~ 'wRC+',
                           metric == 'W_OBA' ~ 'wOBA',
                           metric == 'XW_OBA' ~ 'xwOBA',
                           TRUE ~ metric))

mlb_munged %>%
  ggplot(aes(x = First, y = Second))+
  geom_point()+
  stat_smooth(method = "lm")+
  facet_wrap(metric ~ ., scales = 'free')+
  theme_ipsum()

Nothing too surprising here - BABIP is fickle, walk and strikeout rates are fairly stable, as is batted ball data.
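
The “qualified in both halves” filter appears to fall out of the reshape in the code above: after pivot_wider(), a hitter missing from either file has an NA in that half’s column, and drop_na() removes the row. Here is a minimal sketch with made-up players and values (nothing in it comes from the FanGraphs export):

```r
library(dplyr)
library(tidyr)

# Toy data: player A appears in both halves, player B only in the first
toy <- tibble(
  name  = c("A", "A", "B"),
  half  = c("First", "Second", "First"),
  w_oba = c(0.350, 0.310, 0.290)
)

wide <- toy %>%
  # One row per player, half, and metric
  pivot_longer(-c(name, half), names_to = "metric", values_to = "value") %>%
  # One column per half; B gets NA in the Second column
  pivot_wider(names_from = half, values_from = value) %>%
  # Keep only rows with a value in both halves
  drop_na() %>%
  mutate(dif = Second - First)

print(wide)
```

Only player A survives. Because drop_na() works row by row on the long table, a player missing a single metric in one half loses only that metric’s row.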


mlb_munged %>%
  group_by(metric) %>%
  summarise(correlation = cor(First, Second),
            rsq = correlation^2) %>%
  arrange(desc(rsq)) %>%
  gt() %>%
  gt_theme_espn() %>%
  fmt_number(columns = c(correlation, rsq),
             decimals = 2)

metric	correlation	rsq
K %	0.86	0.73
Barrel %	0.79	0.62
EV	0.78	0.60
BB %	0.70	0.48
xwOBA	0.59	0.34
ISO	0.58	0.34
OBP	0.39	0.15
SLG	0.33	0.11
wRC+	0.31	0.09
wOBA	0.27	0.07
AVG	0.16	0.03
BABIP	0.14	0.02

For the players selected, 73% of the variance in second-half strikeout rate can be explained by first-half strikeout rate. If a guy strikes out a lot in the first half, don’t bet on him changing in the second. BABIP, on the other hand, is basically random, more so when we split into small samples like this.
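
Another way to read these correlations, sketched here assuming a simple linear relationship between halves: in standardized terms, the predicted second-half distance from the sample mean is the correlation times the first-half distance. The correlations are copied from the table; the helper name expected_second_half_z is made up for illustration.

```r
# Correlations copied from the table above
half_cor <- c("K %" = 0.86, "wOBA" = 0.27, "BABIP" = 0.14)

# Hypothetical helper: with a simple linear regression on standardized values,
# the predicted second-half z-score is the correlation times the first-half z-score
expected_second_half_z <- function(first_half_z, r) {
  r * first_half_z
}

# A hitter one standard deviation above the sample mean in the first half
regressed <- expected_second_half_z(1, half_cor)
print(round(regressed, 2))
```

So a hitter one standard deviation better than the group in first-half wOBA would be expected to land only about a quarter of a standard deviation above it in the second half, while a strikeout rate mostly carries over.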

So we have an idea of what to expect, and we know wOBA can be volatile across halves, at least compared to other metrics. But what we really want to know is the driving forces behind the Tigers’ drop in wOBA. Let’s look at that for each player, along with their PA, % of total team PA, and xwOBA (which is calculated based on exit velocity, batted ball type, runner speed, and a few other variables) and see if anything jumps out.

One more note - since we don’t want players with 5 PA on top of the chart, we’ll sort by the wOBA difference * second half plate appearances. This doesn’t translate directly into runs since wOBA includes a multiplier to scale similarly to OBP, but it will give us a ranking of who had the biggest impact once the size of the change is combined with playing time.
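
To see why the weighting matters, here is a small sketch using three rows from the Tigers table (rounded wOBA deltas and second-half PA). The data frame and its column names are just for illustration.

```r
# Values copied from the Tigers table (rounded as displayed)
impact <- data.frame(
  name      = c("Justyn-Henry Malloy", "Parker Meadows", "Riley Greene"),
  woba_dif  = c(0.164, 0.071, -0.073),
  pa_second = c(9, 94, 258),
  stringsAsFactors = FALSE
)

# Weight each wOBA change by second-half plate appearances
impact$dif_exp <- impact$woba_dif * impact$pa_second

# Ranking by the raw delta versus ranking by the PA-weighted delta
by_delta  <- impact$name[order(-impact$woba_dif)]
by_impact <- impact$name[order(-impact$dif_exp)]

print(by_delta)
print(by_impact)
```

Sorted by the raw delta, Malloy’s 9 PA come out on top; weighted by PA, Meadows moves ahead of him. The products differ slightly from the table’s last column, presumably because the table was computed from unrounded wOBA values.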


tigers_first_half <- read_csv('tigers_2025_first_half.csv') %>%
  clean_names() %>%
  mutate(half = 'First')

tigers_second_half <- read_csv('tigers_2025_second_half.csv') %>%
  clean_names() %>%
  mutate(half = 'Second')

tigers_2025 <- tigers_first_half %>%
  bind_rows(tigers_second_half) %>%
  select(-mlbamid, -name_ascii, -def, -bs_r, -off, -g, -hr, -rbi, -sb, -r)

tigers_munged <- tigers_2025 %>%
  pivot_longer(-c(player_id, name, half, team), names_to = 'metric', values_to = 'value') %>%
  pivot_wider(names_from = half, values_from = value) %>%
  mutate(dif = Second - First)

tigers_2025 %>%
  select(name, w_oba, xw_oba, pa, half) %>%
  pivot_wider(names_from = half, values_from = c(w_oba, xw_oba, pa)) %>%
  mutate(pa_first_perc = pa_First / sum(pa_First, na.rm = TRUE),
         pa_second_perc = pa_Second / sum(pa_Second, na.rm = TRUE),
         woba_dif = w_oba_Second - w_oba_First,
         pa_perc_dif = pa_second_perc - pa_first_perc,
         dif_exp = woba_dif * pa_Second) %>%
  arrange(desc(dif_exp)) %>%
  select(name, w_oba_First, xw_oba_First, pa_First, pa_first_perc, w_oba_Second, xw_oba_Second, pa_Second, pa_second_perc, woba_dif, pa_perc_dif, dif_exp) %>%
  gt() %>%
  gt_theme_espn() %>%
  fmt_number(columns = c(w_oba_First, w_oba_Second, woba_dif, xw_oba_First, xw_oba_Second, dif_exp), decimals = 3) %>%
  fmt_percent(columns = c(pa_first_perc, pa_second_perc, pa_perc_dif), decimals = 1) %>%
  tab_spanner(
    label = "First Half",
    columns = c(w_oba_First, xw_oba_First, pa_First, pa_first_perc)
  ) %>%
  gt_add_divider(columns = "pa_first_perc", weight = "0.8px") %>%
  tab_spanner(
    label = "Second Half",
    columns = c(w_oba_Second, xw_oba_Second, pa_Second, pa_second_perc)
  ) %>%
  gt_add_divider(columns = "pa_second_perc", weight = "0.8px") %>%
  tab_spanner(
    label = "Dif",
    columns = c(woba_dif, pa_perc_dif, dif_exp)
  ) %>%
  cols_label(w_oba_First = 'wOBA',
             xw_oba_First = 'xwOBA',
             pa_First = 'PA',
             pa_first_perc = '% of PA',
             w_oba_Second = 'wOBA',
             xw_oba_Second = 'xwOBA',
             pa_Second = 'PA',
             pa_second_perc = '% of PA',
             woba_dif = 'wOBA Delta',
             pa_perc_dif = '% of PA Delta',
             dif_exp = 'wOBA Delta * Second Half PA') %>%
  sub_missing(
    columns = everything(),
    missing_text = "-"
  )

name	First Half				Second Half				Dif
name	wOBA	xwOBA	PA	% of PA	wOBA	xwOBA	PA	% of PA	wOBA Delta	% of PA Delta	wOBA Delta * Second Half PA
Parker Meadows	0.245	0.295	119	3.2%	0.316	0.306	94	3.9%	0.071	0.7%	6.682
Dillon Dingler	0.312	0.344	265	7.2%	0.344	0.381	204	8.4%	0.032	1.2%	6.459
Matt Vierling	0.240	0.314	47	1.3%	0.312	0.363	53	2.2%	0.073	0.9%	3.849
Andy Ibáñez	0.275	0.309	106	2.9%	0.306	0.312	87	3.6%	0.031	0.7%	2.720
Jake Rogers	0.255	0.283	79	2.1%	0.289	0.270	63	2.6%	0.034	0.5%	2.117
Kerry Carpenter	0.331	0.344	280	7.6%	0.340	0.352	184	7.6%	0.009	0.0%	1.634
Justyn-Henry Malloy	0.289	0.294	118	3.2%	0.453	0.321	9	0.4%	0.164	−2.8%	1.474
Jace Jung	0.188	0.255	53	1.4%	0.000	0.015	2	0.1%	−0.188	−1.4%	−0.376
Jahmai Jones	0.409	0.442	48	1.3%	0.394	0.391	102	4.2%	−0.015	2.9%	−1.488
Colt Keith	0.339	0.370	295	8.0%	0.302	0.318	173	7.2%	−0.037	−0.9%	−6.475
Trey Sweeney	0.265	0.266	249	6.8%	0.179	0.196	77	3.2%	−0.085	−3.6%	−6.560
Spencer Torkelson	0.350	0.363	383	10.4%	0.323	0.321	266	11.0%	−0.027	0.6%	−7.119
Wenceel Pérez	0.350	0.340	142	3.9%	0.299	0.308	241	10.0%	−0.051	6.1%	−12.342
Zach McKinstry	0.359	0.323	331	9.0%	0.286	0.280	180	7.5%	−0.073	−1.5%	−13.211
Javier Báez	0.325	0.298	284	7.7%	0.235	0.237	153	6.3%	−0.090	−1.4%	−13.705
Gleyber Torres	0.358	0.393	359	9.8%	0.296	0.341	269	11.1%	−0.062	1.4%	−16.651
Riley Greene	0.372	0.357	397	10.8%	0.299	0.333	258	10.7%	−0.073	−0.1%	−18.876
Brewer Hicklen	0.614	0.381	4	0.1%	-	-	-	-	-	-	-
Ryan Kreidler	0.150	0.241	44	1.2%	-	-	-	-	-	-	-
Tomás Nido	0.314	0.270	37	1.0%	-	-	-	-	-	-	-
Manuel Margot	0.265	0.303	20	0.5%	-	-	-	-	-	-	-
Akil Baddoo	0.157	0.271	18	0.5%	-	-	-	-	-	-	-

Giving Pérez more playing time while his production fell didn’t help, though with injuries to Vierling and Keith one can see how it happened. McKinstry and Báez regressed sharply, and Greene and Torres weren’t even league-average hitters despite significant playing time. The latter two had solid xwOBAs, so there’s hope it was just a bad stretch.

On the other side, Parker Meadows had the most impactful improvement. His .316 wOBA was just above league average, so it was good, but the size of the jump speaks more to his woeful first half.

It’s crucial to note that while there were several players who appeared in the first half but not the second, there are none who appeared only in the second half. The Tigers didn’t trade for any bats at the deadline, nor did they call up any rookies. They weren’t averse to adding players - they used approximately 137 different relievers in September - but didn’t do anything with position players.

Since he had the largest negative impact, let’s look at Greene in more depth. His second-half wOBA was .299, down from .372 in the first half.


tigers_2025 %>%
  filter(name == 'Riley Greene') %>%
  select(-team, -player_id, -war) %>%
  pivot_longer(-c(name, half), names_to = 'metric', values_to = 'value') %>%
  pivot_wider(names_from = half, values_from = value) %>%
  mutate(dif = Second - First,
         metric = toupper(metric),
         metric = case_when(metric == 'BB_PERCENT' ~ 'BB %',
                           metric == 'K_PERCENT' ~ 'K %',
                           metric == 'BARREL_PERCENT' ~ 'Barrel %',
                           metric == 'W_RC' ~ 'wRC+',
                           metric == 'W_OBA' ~ 'wOBA',
                           metric == 'XW_OBA' ~ 'xwOBA',
                           TRUE ~ metric)) %>%
  select(-name) %>%
  gt() %>%
  gt_theme_espn() %>%
  fmt_number(columns = c(First, Second, dif),
             rows = metric %in% c('PA', 'wRC+'),
             decimals = 0) %>%
  fmt_number(columns = c(First, Second, dif),
             rows = metric %in% c('EV'),
             decimals = 1) %>%
  fmt_percent(columns = c(First, Second, dif),
             rows = metric %in% c('BB %', 'K %', 'Barrel %'),
             decimals = 1) %>%
  fmt_number(columns = c(First, Second, dif),
             rows = metric %in% c('AVG', 'OBP', 'SLG', 'ISO', 'BABIP', 'wOBA', 'xwOBA'),
             decimals = 3)

metric	First	Second	dif
PA	397	258	−139
BB %	6.8%	7.4%	0.6%
K %	31.5%	29.5%	−2.0%
ISO	0.260	0.197	−0.063
BABIP	0.365	0.262	−0.104
AVG	0.284	0.218	−0.066
OBP	0.335	0.279	−0.056
SLG	0.544	0.415	−0.129
wOBA	0.372	0.299	−0.073
xwOBA	0.357	0.333	−0.024
wRC+	141	90	−50
EV	90.3	89.3	−1.0
Barrel %	18.1%	15.5%	−2.6%

I was somewhat surprised to see that his walk rate actually improved slightly in the second half, along with his strikeout rate. BABIP fell sharply, and he won’t stay at .262 forever. His EV and barrel rate fell slightly, which matches the eye test: he wasn’t making quite as solid contact. He’s talented enough that they can help him tweak things to get back on track.

Let’s look at Wenceel Pérez through the same lens.


tigers_2025 %>%
  filter(name == 'Wenceel Pérez') %>%
  select(-team, -player_id, -war) %>%
  pivot_longer(-c(name, half), names_to = 'metric', values_to = 'value') %>%
  pivot_wider(names_from = half, values_from = value) %>%
  mutate(dif = Second - First,
         metric = toupper(metric),
         metric = case_when(metric == 'BB_PERCENT' ~ 'BB %',
                           metric == 'K_PERCENT' ~ 'K %',
                           metric == 'BARREL_PERCENT' ~ 'Barrel %',
                           metric == 'W_RC' ~ 'wRC+',
                           metric == 'W_OBA' ~ 'wOBA',
                           metric == 'XW_OBA' ~ 'xwOBA',
                           TRUE ~ metric)) %>%
  select(-name) %>%
  gt() %>%
  gt_theme_espn() %>%
  fmt_number(columns = c(First, Second, dif),
             rows = metric %in% c('PA', 'wRC+'),
             decimals = 0) %>%
  fmt_number(columns = c(First, Second, dif),
             rows = metric %in% c('EV'),
             decimals = 1) %>%
  fmt_percent(columns = c(First, Second, dif),
             rows = metric %in% c('BB %', 'K %', 'Barrel %'),
             decimals = 1) %>%
  fmt_number(columns = c(First, Second, dif),
             rows = metric %in% c('AVG', 'OBP', 'SLG', 'ISO', 'BABIP', 'wOBA', 'xwOBA'),
             decimals = 3)

metric	First	Second	dif
PA	142	241	99
BB %	6.3%	9.1%	2.8%
K %	21.1%	23.7%	2.5%
ISO	0.246	0.150	−0.097
BABIP	0.287	0.284	−0.003
AVG	0.262	0.234	−0.028
OBP	0.317	0.303	−0.014
SLG	0.508	0.383	−0.125
wOBA	0.350	0.299	−0.051
xwOBA	0.340	0.308	−0.033
wRC+	126	90	−35
EV	90.1	90.4	0.3
Barrel %	12.9%	6.2%	−6.7%

We see upticks in walk and strikeout rate, but what really jumps out is SLG. A drop of 0.125 is pretty big, and while his exit velocity suggests he was still hitting the ball hard, the barrel rate shows he wasn’t squaring it up quite as much.
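
Since ISO is just SLG minus AVG, a SLG drop can be split into a part from lost hits (AVG) and a part from lost power (ISO). The sketch below does that arithmetic for Greene and Pérez using the rounded AVG and SLG values from their tables. The data frame and column names are only for illustration.

```r
# AVG and SLG copied from the two player tables (rounded as displayed)
splits <- data.frame(
  name       = c("Riley Greene", "Wenceel Pérez"),
  avg_first  = c(0.284, 0.262),
  avg_second = c(0.218, 0.234),
  slg_first  = c(0.544, 0.508),
  slg_second = c(0.415, 0.383),
  stringsAsFactors = FALSE
)

# ISO = SLG - AVG, so the change in SLG is the change in AVG plus the change in ISO
splits$avg_dif <- splits$avg_second - splits$avg_first
splits$slg_dif <- splits$slg_second - splits$slg_first
splits$iso_dif <- splits$slg_dif - splits$avg_dif

# Share of the SLG drop that came from lost power rather than lost hits
splits$iso_share <- splits$iso_dif / splits$slg_dif

print(splits[, c("name", "avg_dif", "iso_dif", "slg_dif", "iso_share")])
```

With these rounded inputs, about half of Greene’s SLG drop comes from ISO, compared with roughly three quarters of Pérez’s. That fits the tables: Greene’s BABIP cratered while Pérez’s barely moved.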

Seeing the whole team have negative regression all at once wasn’t fun, but it’s just the price we pay* for so many overperforming in the first half. But we can also see that, while every happy batter is alike, each unhappy batter is unhappy in his own way, and looking at individual results in detail is more painstaking but ultimately more insightful than aggregates.

*Destiny is calling me. C’mon, we’re all singing it now.