Batting Halves and Regression

statistics
r
mlb
probability
Author: Mark Jurries II

Published: October 6, 2025

The Tigers started the first half* of 2025 strong with a .325 team wOBA, only to see that drop to .309 in the second half. We can look at which batters changed, but we need to lay some groundwork before we dig in.

*I’m using the “pre and post All-Star game” definition, which results in unequal halves. The literal-minded among us would like to point out that they are no longer halves in this case, but must also accept that colloquialisms are a necessary part of language and move on with life.

Firstly, we want to look at wOBA since it shows a batter’s overall contribution. And we want to look at their performance, which is not the same as their skill. Gleyber Torres had a second half wOBA of .296 in 269 plate appearances. He has a career .333 wOBA in 4,301 PA. So when we look at his second half, we don’t necessarily think he’s suddenly turned into Don Kelly. It’s possible, and we can look deeper into his underlying stats to see what’s up. But since we’re looking purely at descriptive statistics, we’re only interested in his performance, not necessarily his talent.

We’re also not testing to see if the difference between halves is statistically significant. Given the sample sizes it’s unlikely, but not impossible if the delta is large enough. But again, we’re interested in the story here. It may prompt further exploration, but that’s not the main intent.
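To put a rough number on why significance is unlikely at these sample sizes, here is a back-of-the-envelope sketch using Gleyber Torres’s halves from the Tigers table further down (.358 in 359 PA, then .296 in 269 PA). It assumes a per-PA standard deviation of about 0.5 for wOBA, which is a ballpark assumption and not something estimated from this post’s data, and it treats plate appearances as independent. The helper name `woba_dif_se` is made up for illustration.

```r
# Rough sampling noise for the gap between two wOBA samples.
# ASSUMPTION: per-PA standard deviation of wOBA is about 0.5 (a ballpark
# figure, not estimated from this post's data); PAs are treated as independent.
sd_per_pa <- 0.5

# Standard error of the difference between two independent sample means
woba_dif_se <- function(pa_1, pa_2, sd = sd_per_pa) {
  sd * sqrt(1 / pa_1 + 1 / pa_2)
}

# Gleyber Torres: .358 in 359 first-half PA, .296 in 269 second-half PA
torres_se <- woba_dif_se(359, 269)
torres_z  <- (0.296 - 0.358) / torres_se

round(torres_se, 3)  # standard error of the half-to-half gap
round(torres_z, 2)   # gap measured in standard errors
```

Under those assumptions the 62-point drop is roughly 1.5 standard errors, short of the usual 1.96 cutoff, which is in line with treating it as a story to explore and not a proven change in talent.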

Also, crucially, we’re not saying that the first half is a “baseline”. Some players had first halves that were better than we’d have expected and they returned to normal in the second half. Some went the other way. It’s easy to get anchored to “first number = normal”, but let’s instead think “first number happened, may or may not be normal*”.

*Say this at a sports bar and see what happens.

Next, it might be helpful first to get a feel for how first and second halves generally correlate so we have context for the broader league. Using data from Fangraphs, we’ll look at the first and second half correlations for a number of metrics. There are a lot of hitting metrics, so we’ll stick with the defaults on FG, adding in Barrel % and average Exit Velocity. We’ll look only at hitters who qualified in both halves, which leaves us with 97 hitters. Not super robust, but not bad, either. Let’s plot the halves against each other.

library(hrbrthemes)
library(janitor)
library(gt)
library(gtExtras)
library(tidyverse)

mlb_first_half <- read_csv('mlb_2025_first_half.csv') %>%
  clean_names() %>%
  mutate(half = 'First')

mlb_second_half <- read_csv('mlb_2025_second_half.csv') %>%
  clean_names() %>%
  mutate(half = 'Second')

mlb_2025 <- mlb_first_half %>%
  bind_rows(mlb_second_half) %>%
  select(-mlbamid, -name_ascii, -def, -bs_r, -off, -g, -pa, -hr, -rbi, -sb, -r, -war)

mlb_munged <- mlb_2025 %>%
  pivot_longer(-c(player_id, name, half, team), names_to = 'metric', values_to = 'value') %>%
  pivot_wider(names_from = half, values_from = value) %>%
  drop_na() %>%
  mutate(dif = Second - First) %>%
  mutate(metric = toupper(metric),
         metric = case_when(metric == 'BB_PERCENT' ~ 'BB %',
                           metric == 'K_PERCENT' ~ 'K %',
                           metric == 'BARREL_PERCENT' ~ 'Barrel %',
                           metric == 'W_RC' ~ 'wRC+',
                           metric == 'W_OBA' ~ 'wOBA',
                           metric == 'XW_OBA' ~ 'xwOBA',
                           TRUE ~ metric))

mlb_munged %>%
  ggplot(aes(x = First, y = Second))+
  geom_point()+
  stat_smooth(method = "lm")+
  facet_wrap(metric ~ ., scales = 'free')+
  theme_ipsum()

Nothing too surprising here - BABIP is fickle, walk and strikeout rates are fairly stable, as is batted ball data.

mlb_munged %>%
  group_by(metric) %>%
  summarise(correlation = cor(First, Second),
            rsq = correlation^2) %>%
  arrange(desc(rsq)) %>%
  gt() %>%
  gt_theme_espn() %>%
  fmt_number(columns = c(correlation, rsq),
             decimals = 2)
metric correlation rsq
K % 0.86 0.73
Barrel % 0.79 0.62
EV 0.78 0.60
BB % 0.70 0.48
xwOBA 0.59 0.34
ISO 0.58 0.34
OBP 0.39 0.15
SLG 0.33 0.11
wRC+ 0.31 0.09
wOBA 0.27 0.07
AVG 0.16 0.03
BABIP 0.14 0.02

So, for the players selected, 73% of the variance in second-half strikeout rate can be explained by first-half strikeout rate. If a guy strikes out a lot in the first half, don’t bet on him changing in the second. BABIP, on the other hand, is basically random, more so when we split into small samples like this.
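For anyone who wants to see where the “variance explained” reading comes from, here is a small sketch on simulated data (not the Fangraphs data). The talent and noise spreads are made-up values, and `expected_gap` is a hypothetical helper. It shows that the squared correlation matches the R-squared of a one-predictor linear model, and what a correlation of 0.27 implies for regression toward the mean if we assume both halves have the same average and spread.

```r
# Simulated data only: two noisy "halves" that share one underlying talent.
# ASSUMPTION: the talent and noise standard deviations below are made up.
set.seed(2025)
n      <- 97                                  # same count as the qualified hitters
talent <- rnorm(n, mean = 0.320, sd = 0.020)
first  <- talent + rnorm(n, sd = 0.030)
second <- talent + rnorm(n, sd = 0.030)

r   <- cor(first, second)
fit <- lm(second ~ first)

# For a one-predictor linear model, R-squared equals the squared correlation.
all.equal(r^2, summary(fit)$r.squared)

# Regression toward the mean: with equal spread in both halves, the expected
# second-half gap from average is roughly r times the first-half gap.
expected_gap <- function(first_gap, r) r * first_gap

# Using the wOBA correlation from the table above (0.27): a hitter 50 points
# above average in the first half projects to about 13.5 points above average.
expected_gap(0.050, 0.27)
```

The same arithmetic with the K % correlation of 0.86 keeps most of the first-half gap, which is another way of saying strikeout rate is sticky and wOBA is not.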

So we have an idea of what to expect, and we know wOBA can be volatile across halves, at least compared to other metrics. But what we really want to know is the driving forces behind the Tigers’ drop in wOBA. Let’s look at that for each player, along with their PA, % of total team PA, and xwOBA (which is calculated based on exit velocity, batted ball type, runner speed, and a few other variables) and see if anything jumps out.

One more note - since we don’t want players with 5 PA on top of the chart, we’ll sort by the wOBA difference * second half plate appearances. This doesn’t translate directly into runs since wOBA includes a multiplier to scale similarly to OBP, but will give us a ranking of who had the highest impact between change and PA.
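As a quick illustration of why the weighting matters, here is a minimal sketch using three rows copied from the Tigers table in this post. It uses the rounded values shown there, so the products differ slightly from the table, which is built from unrounded data. The column names in this sketch are simplified for the example.

```r
# Three rows copied from the Tigers table in this post (rounded values).
tigers_sample <- data.frame(
  name        = c("Parker Meadows", "Justyn-Henry Malloy", "Riley Greene"),
  woba_first  = c(0.245, 0.289, 0.372),
  woba_second = c(0.316, 0.453, 0.299),
  pa_second   = c(94, 9, 258)
)

tigers_sample$woba_dif <- tigers_sample$woba_second - tigers_sample$woba_first
tigers_sample$dif_exp  <- tigers_sample$woba_dif * tigers_sample$pa_second

# Sorting by the raw delta puts Malloy's 9 PA on top...
tigers_sample[order(-tigers_sample$woba_dif), c("name", "woba_dif")]

# ...while weighting by second-half PA puts Meadows first and Greene last.
tigers_sample[order(-tigers_sample$dif_exp), c("name", "dif_exp")]
```

Malloy’s .164 jump is the biggest raw change of the three, but over 9 PA it barely moves the team total, so the weighted sort drops him below Meadows.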

tigers_first_half <- read_csv('tigers_2025_first_half.csv') %>%
  clean_names() %>%
  mutate(half = 'First')

tigers_second_half <- read_csv('tigers_2025_second_half.csv') %>%
  clean_names() %>%
  mutate(half = 'Second')

tigers_2025 <- tigers_first_half %>%
  bind_rows(tigers_second_half) %>%
  select(-mlbamid, -name_ascii, -def, -bs_r, -off, -g, -hr, -rbi, -sb, -r)

tigers_munged <- tigers_2025 %>%
  pivot_longer(-c(player_id, name, half, team), names_to = 'metric', values_to = 'value') %>%
  pivot_wider(names_from = half, values_from = value) %>%
  mutate(dif = Second - First)

tigers_2025 %>%
  select(name, w_oba, xw_oba, pa, half) %>%
  pivot_wider(names_from = half, values_from = c(w_oba, xw_oba, pa)) %>%
  mutate(pa_first_perc = pa_First / sum(pa_First, na.rm = TRUE),
         pa_second_perc = pa_Second / sum(pa_Second, na.rm = TRUE),
         woba_dif = w_oba_Second - w_oba_First,
         pa_perc_dif = pa_second_perc - pa_first_perc,
         dif_exp = woba_dif * pa_Second) %>%
  arrange(desc(dif_exp)) %>%
  select(name, w_oba_First, xw_oba_First, pa_First, pa_first_perc, w_oba_Second, xw_oba_Second, pa_Second, pa_second_perc, woba_dif, pa_perc_dif, dif_exp) %>%
  gt() %>%
  gt_theme_espn() %>%
  fmt_number(columns = c(w_oba_First, w_oba_Second, woba_dif, xw_oba_First, xw_oba_Second, dif_exp), decimals = 3) %>%
  fmt_percent(columns = c(pa_first_perc, pa_second_perc, pa_perc_dif), decimals = 1) %>%
  tab_spanner(
    label = "First Half",
    columns = c(w_oba_First, xw_oba_First, pa_First, pa_first_perc)
  ) %>%
  gt_add_divider(columns = "pa_first_perc", weight = "0.8px") %>%
  tab_spanner(
    label = "Second Half",
    columns = c(w_oba_Second, xw_oba_Second, pa_Second, pa_second_perc)
  ) %>%
  gt_add_divider(columns = "pa_second_perc", weight = "0.8px") %>%
  tab_spanner(
    label = "Dif",
    columns = c(woba_dif, pa_perc_dif, dif_exp)
  ) %>%
  cols_label(w_oba_First = 'wOBA',
             xw_oba_First = 'xwOBA',
             pa_First = 'PA',
             pa_first_perc = '% of PA',
             w_oba_Second = 'wOBA',
             xw_oba_Second = 'xwOBA',
             pa_Second = 'PA',
             pa_second_perc = '% of PA',
             woba_dif = 'wOBA Delta',
             pa_perc_dif = '% of PA Delta',
             dif_exp = 'wOBA Delta * Second Half PA') %>%
  sub_missing(
    columns = everything(),
    missing_text = "-"
  )
name
First Half
Second Half
Dif
wOBA xwOBA PA % of PA wOBA xwOBA PA % of PA wOBA Delta % of PA Delta wOBA Delta * Second Half PA
Parker Meadows 0.245 0.295 119 3.2% 0.316 0.306 94 3.9% 0.071 0.7% 6.682
Dillon Dingler 0.312 0.344 265 7.2% 0.344 0.381 204 8.4% 0.032 1.2% 6.459
Matt Vierling 0.240 0.314 47 1.3% 0.312 0.363 53 2.2% 0.073 0.9% 3.849
Andy Ibáñez 0.275 0.309 106 2.9% 0.306 0.312 87 3.6% 0.031 0.7% 2.720
Jake Rogers 0.255 0.283 79 2.1% 0.289 0.270 63 2.6% 0.034 0.5% 2.117
Kerry Carpenter 0.331 0.344 280 7.6% 0.340 0.352 184 7.6% 0.009 0.0% 1.634
Justyn-Henry Malloy 0.289 0.294 118 3.2% 0.453 0.321 9 0.4% 0.164 −2.8% 1.474
Jace Jung 0.188 0.255 53 1.4% 0.000 0.015 2 0.1% −0.188 −1.4% −0.376
Jahmai Jones 0.409 0.442 48 1.3% 0.394 0.391 102 4.2% −0.015 2.9% −1.488
Colt Keith 0.339 0.370 295 8.0% 0.302 0.318 173 7.2% −0.037 −0.9% −6.475
Trey Sweeney 0.265 0.266 249 6.8% 0.179 0.196 77 3.2% −0.085 −3.6% −6.560
Spencer Torkelson 0.350 0.363 383 10.4% 0.323 0.321 266 11.0% −0.027 0.6% −7.119
Wenceel Pérez 0.350 0.340 142 3.9% 0.299 0.308 241 10.0% −0.051 6.1% −12.342
Zach McKinstry 0.359 0.323 331 9.0% 0.286 0.280 180 7.5% −0.073 −1.5% −13.211
Javier Báez 0.325 0.298 284 7.7% 0.235 0.237 153 6.3% −0.090 −1.4% −13.705
Gleyber Torres 0.358 0.393 359 9.8% 0.296 0.341 269 11.1% −0.062 1.4% −16.651
Riley Greene 0.372 0.357 397 10.8% 0.299 0.333 258 10.7% −0.073 −0.1% −18.876
Brewer Hicklen 0.614 0.381 4 0.1% - - - - - - -
Ryan Kreidler 0.150 0.241 44 1.2% - - - - - - -
Tomás Nido 0.314 0.270 37 1.0% - - - - - - -
Manuel Margot 0.265 0.303 20 0.5% - - - - - - -
Akil Baddoo 0.157 0.271 18 0.5% - - - - - - -

Giving Pérez increased playing time while his production fell wasn’t helpful. With injuries to Vierling and Keith one can see how this happened, but McKinstry and Báez regressed sharply, and Greene and Torres weren’t even league-average hitters despite having significant playing time. The latter two had solid xwOBAs, so there’s hope it was just a bad stretch.

On the other side, Parker Meadows had the most impactful improvement. His .316 wOBA was just above league average, so it was good but speaks more to his woeful first half.

It’s crucial to note that while there were several players who appeared in the first half but not the second, there are none who appeared only in the second half. The Tigers didn’t trade for any bats at the deadline, nor did they call up any rookies. They weren’t averse to adding players - they used approximately 137 different relievers in September - but didn’t do anything with position players.

Since he had the largest negative impact, let’s look at Greene in more depth. His second-half wOBA was .299, down from .372 in the first half.

tigers_2025 %>%
  filter(name == 'Riley Greene') %>%
  select(-team, -player_id, -war) %>%
  pivot_longer(-c(name, half), names_to = 'metric', values_to = 'value') %>%
  pivot_wider(names_from = half, values_from = value) %>%
  mutate(dif = Second - First,
         metric = toupper(metric),
         metric = case_when(metric == 'BB_PERCENT' ~ 'BB %',
                           metric == 'K_PERCENT' ~ 'K %',
                           metric == 'BARREL_PERCENT' ~ 'Barrel %',
                           metric == 'W_RC' ~ 'wRC+',
                           metric == 'W_OBA' ~ 'wOBA',
                           metric == 'XW_OBA' ~ 'xwOBA',
                           TRUE ~ metric)) %>%
  select(-name) %>%
  gt() %>%
  gt_theme_espn() %>%
  fmt_number(columns = c(First, Second, dif),
             rows = metric %in% c('PA', 'wRC+'),
             decimals = 0) %>%
  fmt_number(columns = c(First, Second, dif),
             rows = metric %in% c('EV'),
             decimals = 1) %>%
  fmt_percent(columns = c(First, Second, dif),
             rows = metric %in% c('BB %', 'K %', 'Barrel %'),
             decimals = 1) %>%
  fmt_number(columns = c(First, Second, dif),
             rows = metric %in% c('AVG', 'OBP', 'SLG', 'ISO', 'BABIP', 'wOBA', 'xwOBA'),
             decimals = 3)
metric First Second dif
PA 397 258 −139
BB % 6.8% 7.4% 0.6%
K % 31.5% 29.5% −2.0%
ISO 0.260 0.197 −0.063
BABIP 0.365 0.262 −0.104
AVG 0.284 0.218 −0.066
OBP 0.335 0.279 −0.056
SLG 0.544 0.415 −0.129
wOBA 0.372 0.299 −0.073
xwOBA 0.357 0.333 −0.024
wRC+ 141 90 −50
EV 90.3 89.3 −1.0
Barrel % 18.1% 15.5% −2.6%

I was somewhat surprised to see that his walk rate actually improved slightly in the second half, along with his strikeout rate. BABIP fell sharply, and he won’t stay at .262 forever. His EV and barrel rate fell slightly, which matches the eye test: he wasn’t making quite as solid contact. He’s talented enough that the team can help him tweak things to get back on track.

Let’s look at Wenceel Pérez through the same lens.

tigers_2025 %>%
  filter(name == 'Wenceel Pérez') %>%
  select(-team, -player_id, -war) %>%
  pivot_longer(-c(name, half), names_to = 'metric', values_to = 'value') %>%
  pivot_wider(names_from = half, values_from = value) %>%
  mutate(dif = Second - First,
         metric = toupper(metric),
         metric = case_when(metric == 'BB_PERCENT' ~ 'BB %',
                           metric == 'K_PERCENT' ~ 'K %',
                           metric == 'BARREL_PERCENT' ~ 'Barrel %',
                           metric == 'W_RC' ~ 'wRC+',
                           metric == 'W_OBA' ~ 'wOBA',
                           metric == 'XW_OBA' ~ 'xwOBA',
                           TRUE ~ metric)) %>%
  select(-name) %>%
  gt() %>%
  gt_theme_espn() %>%
  fmt_number(columns = c(First, Second, dif),
             rows = metric %in% c('PA', 'wRC+'),
             decimals = 0) %>%
  fmt_number(columns = c(First, Second, dif),
             rows = metric %in% c('EV'),
             decimals = 1) %>%
  fmt_percent(columns = c(First, Second, dif),
             rows = metric %in% c('BB %', 'K %', 'Barrel %'),
             decimals = 1) %>%
  fmt_number(columns = c(First, Second, dif),
             rows = metric %in% c('AVG', 'OBP', 'SLG', 'ISO', 'BABIP', 'wOBA', 'xwOBA'),
             decimals = 3)
metric First Second dif
PA 142 241 99
BB % 6.3% 9.1% 2.8%
K % 21.1% 23.7% 2.5%
ISO 0.246 0.150 −0.097
BABIP 0.287 0.284 −0.003
AVG 0.262 0.234 −0.028
OBP 0.317 0.303 −0.014
SLG 0.508 0.383 −0.125
wOBA 0.350 0.299 −0.051
xwOBA 0.340 0.308 −0.033
wRC+ 126 90 −35
EV 90.1 90.4 0.3
Barrel % 12.9% 6.2% −6.7%

We see upticks in walk and strikeout rate, but what really jumps out is SLG. A drop of 0.125 is pretty big, and while his exit velocity suggests he was still hitting the ball hard, the barrel rate shows he wasn’t squaring up quite as much.
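One way to compare the two SLG drops is to split each into lost batting average and lost isolated power, since ISO is defined as SLG minus AVG. The sketch below does that with the rounded figures from the two player tables above. It is a rough decomposition for illustration, and the column names are made up for the example.

```r
# SLG = AVG + ISO, so a change in SLG splits into a change in batting average
# and a change in isolated power. Values are the rounded figures from the two
# player tables above.
halves <- data.frame(
  name       = c("Riley Greene", "Wenceel Pérez"),
  avg_first  = c(0.284, 0.262),
  avg_second = c(0.218, 0.234),
  slg_first  = c(0.544, 0.508),
  slg_second = c(0.415, 0.383)
)

halves$avg_dif <- halves$avg_second - halves$avg_first
halves$slg_dif <- halves$slg_second - halves$slg_first
halves$iso_dif <- halves$slg_dif - halves$avg_dif   # ISO = SLG - AVG

# Share of each SLG drop that came from lost power and not from lost hits
halves$iso_share <- halves$iso_dif / halves$slg_dif

halves[, c("name", "avg_dif", "iso_dif", "slg_dif")]
round(halves$iso_share, 2)
```

On these rounded numbers, about half of Greene’s SLG drop is lost batting average, which fits the BABIP fall, while roughly three quarters of Pérez’s is lost power, which fits the barrel rate.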

Seeing the whole team have negative regression all at once wasn’t fun, but it’s just the price we pay* for so many overperforming in the first half. But we can also see that, while every happy batter is alike, each unhappy batter is unhappy in his own way, and looking at individual results in detail is more painstaking but ultimately more insightful than aggregates.

*Destiny is calling me. C’mon, we’re all singing it now.