Ohtani and Hypothesis Testing

statistics
r
mlb
probability
Author

Mark Jurries II

Published

October 28, 2025

I was listening to Joe Posnanski’s podcast the other day, and one of his guests made an offhand comment about Ohtani not hitting well the day after he pitches. I generally don’t follow West Coast teams that closely, so I wasn’t sure whether there was something to this*. Since getting game logs is pretty straightforward, this seemed simple enough to test.

*One of the hazards of being an analyst is constantly asking whether statements like this are true. Makes us fun at parties.

First, we’ll compare his hitting in games played the day after pitching against all of his other games. Then, if there’s a difference, we’ll do some stats to see if that difference means anything.

Show the code
library(hrbrthemes)
library(janitor)
library(gt)
library(gtExtras)
library(tidyverse)

ohtani_2025 <- read_csv('ohtani.csv')

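# Roll per-game rates up into one line per group (day after pitching vs. not).
# Weighted means let games with more chances count for more.
# Note: K% and BB% are weighted by ab here, which is what the tables below
# reflect; they are per-PA rates, so pa would be the conventional weight.
ohtani_day_after_batting <- ohtani_2025 %>%
  group_by(is_day_after_pitching) %>%
  summarise(G = n(),
            PA = sum(pa),
            w_oba = weighted.mean(w_oba, pa),
            avg = weighted.mean(avg, ab),
            obp = weighted.mean(obp, pa),
            slg = weighted.mean(slg, ab),
            k_percent = weighted.mean(k_percent, ab),
            bb_percent = weighted.mean(bb_percent, ab)
  )

# Reshape so each metric is a row and each group is a column
ohtani_long <- ohtani_day_after_batting %>%
  mutate(is_day_after_pitching = ifelse(is_day_after_pitching == 1, 'pitched_prior_day', 'all_other_games')) %>%
  pivot_longer(-is_day_after_pitching, names_to = 'metric', values_to = 'value') %>%
  pivot_wider(names_from = is_day_after_pitching, values_from = value)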

ohtani_long %>%
  mutate(metric = case_when(metric == 'k_percent' ~ 'K%',
                            metric == 'bb_percent' ~ 'BB%',
                            metric == 'w_oba' ~ 'wOBA',
                            metric == 'avg' ~ 'AVG',
                            metric == 'obp' ~ 'OBP',
                            metric == 'slg' ~ 'SLG',
                            TRUE ~ metric)) %>%
  gt() %>%
  gt_theme_espn() %>%
  fmt_number(columns = c(all_other_games, pitched_prior_day),
             rows = metric %in% c('G', 'PA'),
             decimals = 0) %>%
  fmt_number(columns = c(all_other_games, pitched_prior_day),
             rows = metric %in% c('wOBA', 'AVG', 'OBP', 'SLG'),
             decimals = 3) %>%
  fmt_percent(columns = c(all_other_games, pitched_prior_day),
             rows = metric %in% c('K%', 'BB%'),
             decimals = 1) %>%
  cols_label(all_other_games = 'All Other Games',
             pitched_prior_day = 'Pitched Prior Day')
metric   All Other Games   Pitched Prior Day
G        150               8
PA       690               37
wOBA     0.431             0.249
AVG      0.289             0.147
OBP      0.402             0.216
SLG      0.636             0.382
K%       26.4%             32.9%
BB%      12.3%             3.5%

Well, he’s certainly underperformed the day after pitching. His power and on-base numbers drop, and his wOBA would be last among qualified hitters if he played that way regularly. Of course, he’s Shohei Ohtani, so we wouldn’t expect that of him regularly. And we note this is 8 games and 37 plate appearances, so it’s a very small sample. Small enough that we can look game by game.

Show the code
ohtani_2025 %>%
  filter(is_day_after_pitching == 1) %>%
  select(date, pa, w_oba, avg, obp, slg, k_percent, bb_percent) %>%
  gt() %>%
  gt_theme_espn() %>%
  fmt_number(columns = c(pa), decimals = 0) %>%
  fmt_number(columns = c(w_oba, avg, obp, slg), decimals = 3) %>%
  fmt_percent(columns = c(k_percent, bb_percent), decimals = 1) %>%
  cols_label(w_oba = 'wOBA',
             k_percent = 'K%',
             bb_percent = 'BB%')
date pa wOBA avg obp slg K% BB%
2025-06-17 5 0.144 0.000 0.200 0.000 80.0% 0.0%
2025-06-29 4 0.000 0.000 0.000 0.000 25.0% 0.0%
2025-07-06 4 0.000 0.000 0.000 0.000 25.0% 0.0%
2025-07-13 5 0.393 0.333 0.600 0.333 0.0% 40.0%
2025-07-22 5 0.407 0.200 0.200 0.800 40.0% 0.0%
2025-09-06 5 0.176 0.200 0.200 0.200 20.0% 0.0%
2025-09-17 4 0.509 0.250 0.250 1.000 25.0% 0.0%
2025-09-24 5 0.317 0.200 0.200 0.600 40.0% 0.0%

He did very little at the plate in his first three games after pitching, which makes sense - he was coming back from injury, and he is only human. He also had some good games later on as he got acclimated to pitching again.
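As a quick sanity check, the summary line can be rebuilt from the eight rows above. The sketch below hard-codes the pa, wOBA, and obp columns from the game log and takes PA-weighted means, the same calculation the summary code uses for those two metrics. The vector names are only for illustration.

```r
# Per-game values copied from the game log table above
pa    <- c(5, 4, 4, 5, 5, 5, 4, 5)
w_oba <- c(0.144, 0.000, 0.000, 0.393, 0.407, 0.176, 0.509, 0.317)
obp   <- c(0.200, 0.000, 0.000, 0.600, 0.200, 0.200, 0.250, 0.200)

# A PA-weighted mean of per-game rates is the rate over all PA combined
total_pa    <- sum(pa)
pooled_woba <- weighted.mean(w_oba, pa)
pooled_obp  <- weighted.mean(obp, pa)

print(total_pa)
print(round(pooled_woba, 3))
print(round(pooled_obp, 3))
```

These should line up with the 37 PA, .249 wOBA, and .216 OBP in the Pitched Prior Day column.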

This still leaves us with the question of how expected this is. We can’t compare him to other pitchers who hit regularly, because he’s entirely singular in the game. What we can do is compare him to himself. To do this, we’ll take his 8 games here and ask “how would this compare to any randomly selected set of 8 games?”

Why do this? Well, we’re only looking at these games because he pitched the day prior. We don’t know how it would compare to any other set of 8 games. So, we’ll select 8 games at random, calculate his stats for those games, then put those games back and pick another 8 at random. We’ll do this 10,000 times.
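To make the procedure concrete, here is a minimal base-R sketch on made-up data. The `toy_games` data frame, its values, and the `draw_once()` helper are assumptions for illustration, not Ohtani's actual log, and it runs far fewer repetitions than the real thing. Each draw here picks 8 distinct games, which is one reading of the description above.

```r
set.seed(1)

# Made-up game log: 158 games, each with a PA count and a per-game wOBA
toy_games <- data.frame(
  pa    = sample(3:5, 158, replace = TRUE),
  w_oba = round(runif(158, 0, 0.9), 3)
)

# One draw: pick 8 games at random and compute the PA-weighted wOBA
draw_once <- function(games, n_games = 8) {
  picked <- games[sample(nrow(games), n_games), ]
  weighted.mean(picked$w_oba, picked$pa)
}

# Repeat the draw many times to build a reference distribution
toy_sims <- replicate(1000, draw_once(toy_games))

print(length(toy_sims))
print(summary(toy_sims))
```

The spread of `toy_sims` is what an 8-game stretch can look like by chance alone, which is the yardstick the actual 8 games get measured against.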

As a rather important aside - we’re looking at whether his performance in these games was different from his normal range, not his talent. If it were the latter, we’d use a Bayesian posterior, i.e. adding 100 PA at .418 wOBA to his 37 PA and .249 wOBA to get a .372 wOBA, a number that would rank #19 in the game. That’s not our question, but it’s important to be clear up front what we’re trying to do.
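The arithmetic behind that .372 is just a PA-weighted blend of the prior and the observed sample. A small sketch, using only the figures quoted above (the variable names are made up for illustration):

```r
# Figures quoted in the paragraph above
prior_pa    <- 100    # pseudo-PA added as the prior
prior_woba  <- 0.418  # wOBA assigned to those prior PA
sample_pa   <- 37     # PA in games the day after pitching
sample_woba <- 0.249  # wOBA in those PA

# PA-weighted blend of the prior and the observed sample
blended_woba <- (prior_pa * prior_woba + sample_pa * sample_woba) /
  (prior_pa + sample_pa)

print(round(blended_woba, 3))
```

With 100 prior PA against 37 observed, the prior carries most of the weight, which is why the blend lands much closer to .418 than to .249.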

OK, back to business. Let’s look at the first 10 sims to see how it works.

Show the code
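set.seed(10312925)

# Draw 10,000 samples of 8 games each from the full game log.
# replace = TRUE applies within each draw, so a game can show up more than
# once in the same sample of 8.
ohtani_2025_permute_day_after <- bind_rows(replicate(10000,
                                                     sample_n(ohtani_2025, 8, replace = TRUE),
                                                     simplify = FALSE),
                                           .id = 'permutation_id')

# One stat line per sample, using the same weights as the summary table code
ohtani_2025_permute_day_after_stats <- ohtani_2025_permute_day_after %>%
  group_by(permutation_id) %>%
  summarise(G = n(),
            PA = sum(pa),
            w_oba = weighted.mean(w_oba, pa),
            avg = weighted.mean(avg, ab),
            obp = weighted.mean(obp, pa),
            slg = weighted.mean(slg, ab),
            k_percent = weighted.mean(k_percent, ab),
            bb_percent = weighted.mean(bb_percent, ab)) %>%
  # .id comes back as character, so convert before sorting
  mutate(permutation_id = as.integer(permutation_id)) %>%
  arrange(permutation_id)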

ohtani_2025_permute_day_after_stats %>%
  select(-G) %>%
  head(10) %>%
  gt() %>%
  gt_theme_espn() %>%
  fmt_number(columns = c(PA), decimals = 0) %>%
  fmt_number(columns = c(w_oba, avg, obp, slg), decimals = 3) %>%
  fmt_percent(columns = c(k_percent, bb_percent), decimals = 1) %>%
  cols_label(w_oba = 'wOBA',
             k_percent = 'K%',
             bb_percent = 'BB%')
permutation_id PA wOBA avg obp slg K% BB%
1 39 0.472 0.364 0.462 0.697 23.6% 13.2%
2 36 0.454 0.323 0.417 0.677 30.8% 11.9%
3 40 0.551 0.389 0.450 0.889 17.2% 9.0%
4 37 0.297 0.194 0.311 0.355 34.8% 8.4%
5 40 0.472 0.364 0.469 0.667 15.7% 10.9%
6 36 0.431 0.323 0.417 0.645 34.7% 12.7%
7 38 0.374 0.294 0.342 0.588 27.6% 5.9%
8 36 0.433 0.273 0.333 0.727 35.2% 4.8%
9 39 0.474 0.294 0.359 0.824 36.7% 7.1%
10 38 0.500 0.300 0.434 0.833 16.9% 15.6%

These all look pretty strong, though sim 4 is a mere .297 wOBA. Let’s graph our sims - the vertical line is his actual performance in games the day after pitching.

Show the code
# Long format: one row per sim per metric, with the actual day-after value attached
ohtani_permute_long <- ohtani_2025_permute_day_after_stats %>%
  select(-G, -PA) %>%
  pivot_longer(-permutation_id, names_to = 'metric', values_to = 'value') %>%
  left_join(ohtani_long %>% 
              select(-all_other_games) %>% 
              rename(actual = pitched_prior_day),
            by = 'metric')

ohtani_permute_long %>%
  mutate(metric = case_when(metric == 'k_percent' ~ 'K%',
                            metric == 'bb_percent' ~ 'BB%',
                            metric == 'w_oba' ~ 'wOBA',
                            metric == 'avg' ~ 'AVG',
                            metric == 'obp' ~ 'OBP',
                            metric == 'slg' ~ 'SLG',
                            TRUE ~ metric)) %>%
  ggplot(aes(x = value))+
  geom_density()+
  theme_ipsum()+
  facet_wrap(metric ~ ., scales = 'free')+
  geom_vline(aes(xintercept = actual))

First, let’s take a moment to appreciate what we have here. By running a whole bunch of sims, even with a small sample, we have approximately normal distributions. The actual performance lines are near the low end for everything except K%, which is high but still within what we’d expect.

The charts tell most of what we need, but let’s look at the numbers while we’re here. We’ll also include what percent of sims were at or below the actual numbers to get a sense for how likely they are.
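The two summaries in that table are simple to compute for a single metric. Here is a minimal sketch with ten made-up sim values and a made-up observed value (both `sim_values` and `observed` are hypothetical, chosen only to keep the arithmetic easy to follow):

```r
# Hypothetical simulated values and a hypothetical observed value
sim_values <- c(0.31, 0.45, 0.22, 0.50, 0.39, 0.41, 0.27, 0.48, 0.36, 0.44)
observed   <- 0.30

# Share of sims at or below the observed value
share_at_or_below <- mean(sim_values <= observed)

# Middle 95% of the simulated values
bounds <- quantile(sim_values, c(0.025, 0.975))

print(share_at_or_below)
print(bounds)
```

Two of the ten values (0.22 and 0.27) are at or below 0.30, so the share is 0.2. A small share means the observed value sits far out in the low tail of what random sets of games produce.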

Show the code
ohtani_permute_long %>%
  mutate(metric = case_when(metric == 'k_percent' ~ 'K%',
                            metric == 'bb_percent' ~ 'BB%',
                            metric == 'w_oba' ~ 'wOBA',
                            metric == 'avg' ~ 'AVG',
                            metric == 'obp' ~ 'OBP',
                            metric == 'slg' ~ 'SLG',
                            TRUE ~ metric)) %>%
  mutate(is_below_actual = ifelse(value <= actual, 1, 0)) %>%
  group_by(metric) %>%
  summarise(mean_value = mean(value),
            lower = quantile(value, .025),
            upper = quantile(value, .975),
            actual = mean(actual),
            perc_below_actual = mean(is_below_actual)) %>%
  gt() %>%
  gt_theme_espn() %>%
  fmt_number(rows = metric %in% c('wOBA', 'AVG', 'OBP', 'SLG'),
             decimals = 3) %>%
  fmt_percent(rows = metric %in% c('K%', 'BB%'),
              decimals = 1) %>%
  fmt_percent(columns = perc_below_actual, decimals = 1) %>%
  tab_spanner(label = 'Sim Values',
              columns = c(mean_value, lower, upper)) %>%
  cols_label(mean_value = 'Mean',
             lower = 'Lower (2.5%)',
             upper = 'Upper (97.5%)',
             perc_below_actual = 'Percent Below Actual')
metric   Sim Mean   Sim Lower (2.5%)   Sim Upper (97.5%)   actual   Percent Below Actual
AVG      0.281      0.138              0.433               0.147    3.3%
BB%      12.1%      3.5%               22.9%               3.5%     2.6%
K%       26.7%      13.6%              41.3%               32.9%    81.3%
OBP      0.390      0.235              0.543               0.216    1.6%
SLG      0.619      0.250              1.033               0.382    12.0%
wOBA     0.419      0.237              0.613               0.249    3.5%

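Basically everything here is on the low end. OBP is the only metric that falls outside the middle 95% of sims, but AVG, BB%, and wOBA all sit right at or just inside the lower bound. Strikeout rate is well within range, and SLG is within bounds but still clearly lower than usual. Combining this with our knowledge that pitching takes a toll on the body, it’s reasonable to conclude that pitching the day before did affect Ohtani’s hitting, with the caveat that 37 plate appearances is still a small sample. He’ll still win an MVP this year, so it’s not like it was all that detrimental.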
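For anyone who wants to check the table by eye, here is a small sketch that flags which metrics land outside the 2.5% to 97.5% band. The values are typed in from the summary table above at their displayed precision, so a metric tied with its bound after rounding (BB% here) is treated as inside. The `sim_summary` name is made up for illustration.

```r
# Values copied from the summary table above (percentages as proportions)
sim_summary <- data.frame(
  metric = c("AVG", "BB%", "K%", "OBP", "SLG", "wOBA"),
  lower  = c(0.138, 0.035, 0.136, 0.235, 0.250, 0.237),
  upper  = c(0.433, 0.229, 0.413, 0.543, 1.033, 0.613),
  actual = c(0.147, 0.035, 0.329, 0.216, 0.382, 0.249)
)

# Flag metrics whose actual value sits outside the middle 95% of sims
sim_summary$outside_band <- sim_summary$actual < sim_summary$lower |
  sim_summary$actual > sim_summary$upper

outside_metrics <- sim_summary$metric[sim_summary$outside_band]
print(outside_metrics)
```

Only OBP is flagged, with several other metrics close to the lower edge.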

We’ll also note he went 0 for 4 in Game 5 after pitching in Game 4. But Game 4 came after Game 3*, in which he had 2 home runs, 2 doubles, and 5 walks and which lasted 18 innings, and his back was no doubt sore after carrying the Dodgers’ offense for the series.

*This is the sort of hard-hitting analysis I love to provide.