Ohtani and Hypothesis Testing

I was listening to Joe Posnanski’s podcast the other day, and one of his guests made an offhand comment about Ohtani not hitting well the day after he pitches. I generally don’t follow west coast teams that closely, so I wasn’t sure if there was something to this or not*. Since getting game logs is pretty straightforward, this seemed simple enough to test.

*One of the hazards of being an analyst is constantly asking if statements like this are true or not. Makes us fun at parties.

First, we’ll compare the games Ohtani played the day after pitching with all of his other games. Then, if there’s a difference, we’ll do some stats to see if that difference means anything.


library(hrbrthemes)
library(janitor)
library(gt)
library(gtExtras)
library(tidyverse)

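# One row per game, with per-game pa, ab, and rate stats plus an
# is_day_after_pitching flag (1 = he pitched the previous day)
ohtani_2025 <- read_csv('ohtani.csv')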

ohtanti_day_after_batting <- ohtani_2025 %>%
  group_by(is_day_after_pitching) %>%
  summarise(G = n(),
            PA = sum(pa),
            w_oba = weighted.mean(w_oba, pa),
            avg = weighted.mean(avg, ab),
            obp = weighted.mean(obp, pa),
            slg = weighted.mean(slg, ab),
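            # K% and BB% are per-PA rates, so `pa` is the conventional weight;
            # the tables in this post were generated with `ab` as written here
            k_percent = weighted.mean(k_percent, ab),
            bb_percent = weighted.mean(bb_percent, ab)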
  )

ohtani_long <- ohtanti_day_after_batting %>%
  mutate(is_day_after_pitching = ifelse(is_day_after_pitching == 1, 'pitched_prior_day', 'all_other_games')) %>%
  pivot_longer(-is_day_after_pitching, names_to = 'metric', values_to = 'value') %>%
  pivot_wider(names_from = is_day_after_pitching, values_from = value) 

ohtani_long %>%
  mutate(metric = case_when(metric == 'k_percent' ~ 'K%',
                            metric == 'bb_percent' ~ 'BB%',
                            metric == 'w_oba' ~ 'wOBA',
                            metric == 'avg' ~ 'AVG',
                            metric == 'obp' ~ 'OBP',
                            metric == 'slg' ~ 'SLG',
                            TRUE ~ metric)) %>%
  gt() %>%
  gt_theme_espn() %>%
  fmt_number(columns = c(all_other_games, pitched_prior_day),
             rows = metric %in% c('G', 'PA'),
             decimals = 0) %>%
  fmt_number(columns = c(all_other_games, pitched_prior_day),
             rows = metric %in% c('wOBA', 'AVG', 'OBP', 'SLG'),
             decimals = 3) %>%
  fmt_percent(columns = c(all_other_games, pitched_prior_day),
             rows = metric %in% c('K%', 'BB%'),
             decimals = 1) %>%
  cols_label(all_other_games = 'All Other Games',
             pitched_prior_day = 'Pitched Prior Day')

metric	All Other Games	Pitched Prior Day
G	150	8
PA	690	37
wOBA	0.431	0.249
AVG	0.289	0.147
OBP	0.402	0.216
SLG	0.636	0.382
K%	26.4%	32.9%
BB%	12.3%	3.5%

Well, he’s certainly underperformed the day after pitching. His power and on-base numbers drop, and his wOBA would be last among qualified hitters if he played that way regularly. Of course, he’s Shohei Ohtani, so we wouldn’t expect that of him regularly. And we note this is 8 games and 37 plate appearances, so it’s a very small sample. Small enough that we can look game by game.


ohtani_2025 %>%
  filter(is_day_after_pitching == 1) %>%
  select(date, pa, w_oba, avg, obp, slg, k_percent, bb_percent) %>%
  gt() %>%
  gt_theme_espn() %>%
  fmt_number(columns = c(pa), decimals = 0) %>%
  fmt_number(columns = c(w_oba, avg, obp, slg), decimals = 3) %>%
  fmt_percent(columns = c(k_percent, bb_percent), decimals = 1) %>%
  cols_label(w_oba = 'wOBA',
             k_percent = 'K%',
             bb_percent = 'BB%')

date	pa	wOBA	avg	obp	slg	K%	BB%
2025-06-17	5	0.144	0.000	0.200	0.000	80.0%	0.0%
2025-06-29	4	0.000	0.000	0.000	0.000	25.0%	0.0%
2025-07-06	4	0.000	0.000	0.000	0.000	25.0%	0.0%
2025-07-13	5	0.393	0.333	0.600	0.333	0.0%	40.0%
2025-07-22	5	0.407	0.200	0.200	0.800	40.0%	0.0%
2025-09-06	5	0.176	0.200	0.200	0.200	20.0%	0.0%
2025-09-17	4	0.509	0.250	0.250	1.000	25.0%	0.0%
2025-09-24	5	0.317	0.200	0.200	0.600	40.0%	0.0%

We see that he didn’t do anything in the first three of these games, which makes sense - he was coming back from injury, and he is only human. He also had some good games later on as he got acclimated to pitching again.
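
As a quick sanity check, the pooled day-after line can be rebuilt from the eight game rows in the table. This is only a sketch: the values are typed in by hand from that table, and the `day_after` data frame is a name made up for the illustration. Weighting each game's rate by its denominator recovers the pooled rate, so the PA-weighted wOBA comes back to about .249. K% and BB% are per-PA rates, and weighting them by PA gives roughly 32.4% and 5.4%, a little different from the summary table, where the code weights those two columns by `ab`.

```r
# Per-game values copied by hand from the day-after table (hypothetical data frame name)
day_after <- data.frame(
  pa         = c(5, 4, 4, 5, 5, 5, 4, 5),
  w_oba      = c(0.144, 0.000, 0.000, 0.393, 0.407, 0.176, 0.509, 0.317),
  k_percent  = c(0.80, 0.25, 0.25, 0.00, 0.40, 0.20, 0.25, 0.40),
  bb_percent = c(0.00, 0.00, 0.00, 0.40, 0.00, 0.00, 0.00, 0.00)
)

# Weighting a per-game rate by its denominator recovers the pooled rate
pooled_w_oba <- weighted.mean(day_after$w_oba, day_after$pa)

# K% and BB% are per-PA rates, so PA is the natural weight here
pooled_k  <- weighted.mean(day_after$k_percent, day_after$pa)
pooled_bb <- weighted.mean(day_after$bb_percent, day_after$pa)

round(c(wOBA = pooled_w_oba, K = pooled_k, BB = pooled_bb), 3)
```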

This still leaves us with the question of how expected this is. We can’t compare him to other pitchers who hit regularly, because he’s entirely singular in the game. What we can do is compare him to himself. To do this, we’ll take his 8 games here and ask “how would this compare to any randomly selected set of 8 games?”

Why do this? Well, we’re only looking at these games because he pitched the day prior, and we don’t know how they would compare to any other set of 8 games. So, we’ll select 8 games at random from his full game log, calculate his stats for those games, then put those games back and pick another 8 at random. We’ll do this 10,000 times. (The code samples with replacement inside each draw as well, so the same game can occasionally show up twice in a set.)
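
To make the procedure concrete without needing the game log, here is a minimal base-R sketch on a made-up season. Everything in it is an assumption for illustration: the `games` data frame is simulated, `draw_once()` is a hypothetical helper, and the `observed` value is just a placeholder threshold. The draw uses `replace = TRUE` to mirror the `sample_n()` call in the post's code below.

```r
set.seed(1)

# Hypothetical game log: 158 games with made-up PA and per-game wOBA
n_games <- 158
games <- data.frame(
  pa    = sample(3:5, n_games, replace = TRUE),
  w_oba = pmax(rnorm(n_games, mean = 0.42, sd = 0.30), 0)
)

# One draw: pick 8 games with replacement and pool them
# into a single PA-weighted wOBA
draw_once <- function(log, n = 8) {
  picked <- log[sample(nrow(log), n, replace = TRUE), ]
  weighted.mean(picked$w_oba, picked$pa)
}

# Repeat the draw 10,000 times to get the range of "any 8 games"
sims <- replicate(10000, draw_once(games))

# Share of simulated 8-game stretches at or below a placeholder observed value
observed <- 0.249
share_below <- mean(sims <= observed)
share_below
```

The share at the end plays the same role as the "percent below actual" idea: the smaller it is, the more unusual the observed stretch is relative to a random set of 8 games.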

As a rather important aside - we’re looking at whether his performance in these games was different than his normal range, not his talent. If it was the latter, we’d use a Bayesian posterior, i.e. adding 100 PA at .418 wOBA to his 37 PA and .249 wOBA to get a .372 wOBA, a number that would rank #19 in the game. That’s not our question, but it’s important to be clear up front what we’re trying to do.
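
For reference, the arithmetic behind that .372 is just a PA-weighted average of the prior and the observed sample. The sketch below uses only the numbers quoted in the aside; the variable names are made up for the illustration.

```r
# Prior: 100 PA at a .418 wOBA; observed: 37 PA at a .249 wOBA
prior_pa    <- 100
prior_w_oba <- 0.418
obs_pa      <- 37
obs_w_oba   <- 0.249

# Pool the prior and the observed sample, weighting each by its PA
shrunk_w_oba <- (prior_pa * prior_w_oba + obs_pa * obs_w_oba) / (prior_pa + obs_pa)
round(shrunk_w_oba, 3)  # 0.372
```

With only 37 PA against a 100 PA prior, the estimate stays much closer to .418 than to .249, which is why a talent question and a "was this stretch unusual" question can give such different-looking answers.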

OK, back to business. Let’s look at the first 10 sims to see how it works.


set.seed(10312925)

# Build 10,000 simulated 8-game sets; replace = TRUE means a game can repeat within a set
ohtani_2025_permute_day_after <- bind_rows(replicate(10000,
                                                     sample_n(ohtani_2025, 8, replace = TRUE), 
                                                     simplify = FALSE),
                                           .id = 'permutation_id')

ohtani_2025_permute_day_after_stats <- ohtani_2025_permute_day_after %>%
  group_by(permutation_id) %>%
  summarise(G = n(),
            PA = sum(pa),
            w_oba = weighted.mean(w_oba, pa),
            avg = weighted.mean(avg, ab),
            obp = weighted.mean(obp, pa),
            slg = weighted.mean(slg, ab),
            k_percent = weighted.mean(k_percent, ab),
            bb_percent = weighted.mean(bb_percent, ab)) %>%
  mutate(permutation_id = as.integer(permutation_id)) %>%
  arrange(permutation_id) 

ohtani_2025_permute_day_after_stats %>%
  select(-G) %>%
  head(10) %>%
  gt() %>%
  gt_theme_espn() %>%
  fmt_number(columns = c(PA), decimals = 0) %>%
  fmt_number(columns = c(w_oba, avg, obp, slg), decimals = 3) %>%
  fmt_percent(columns = c(k_percent, bb_percent), decimals = 1) %>%
  cols_label(w_oba = 'wOBA',
             k_percent = 'K%',
             bb_percent = 'BB%')

permutation_id	PA	wOBA	avg	obp	slg	K%	BB%
1	39	0.472	0.364	0.462	0.697	23.6%	13.2%
2	36	0.454	0.323	0.417	0.677	30.8%	11.9%
3	40	0.551	0.389	0.450	0.889	17.2%	9.0%
4	37	0.297	0.194	0.311	0.355	34.8%	8.4%
5	40	0.472	0.364	0.469	0.667	15.7%	10.9%
6	36	0.431	0.323	0.417	0.645	34.7%	12.7%
7	38	0.374	0.294	0.342	0.588	27.6%	5.9%
8	36	0.433	0.273	0.333	0.727	35.2%	4.8%
9	39	0.474	0.294	0.359	0.824	36.7%	7.1%
10	38	0.500	0.300	0.434	0.833	16.9%	15.6%

These all look pretty strong, though sim 4 is a mere .297 wOBA. Let’s graph our sims - the vertical line is his actual performance on days after pitching.


ohtani_permute_long <- ohtani_2025_permute_day_after_stats %>%
  select(-G, -PA) %>%
  pivot_longer(-permutation_id, names_to = 'metric', values_to = 'value') %>%
  left_join(ohtani_long %>% 
              select(-all_other_games) %>% 
              rename(actual = pitched_prior_day),
            by = 'metric')

ohtani_permute_long %>%
  mutate(metric = case_when(metric == 'k_percent' ~ 'K%',
                            metric == 'bb_percent' ~ 'BB%',
                            metric == 'w_oba' ~ 'wOBA',
                            metric == 'avg' ~ 'AVG',
                            metric == 'obp' ~ 'OBP',
                            metric == 'slg' ~ 'SLG',
                            TRUE ~ metric)) %>%
  ggplot(aes(x = value))+
  geom_density()+
  theme_ipsum()+
  facet_wrap(metric ~ ., scales = 'free')+
  geom_vline(aes(xintercept = actual))

Firstly, let’s take a moment to appreciate what we have here. Even though each sim is only 8 games, running 10,000 of them gives us approximately normal distributions. The actual performance lines sit near the low end for everything except K%, which is high but still within what we’d expect.

The charts tell most of what we need, but let’s look at the numbers while we’re here. We’ll also include what percent of sims were below the actual numbers to get a sense for how likely they are.


ohtani_permute_long %>%
  mutate(metric = case_when(metric == 'k_percent' ~ 'K%',
                            metric == 'bb_percent' ~ 'BB%',
                            metric == 'w_oba' ~ 'wOBA',
                            metric == 'avg' ~ 'AVG',
                            metric == 'obp' ~ 'OBP',
                            metric == 'slg' ~ 'SLG',
                            TRUE ~ metric)) %>%
  mutate(is_below_actual = ifelse(value <= actual, 1, 0)) %>%
  group_by(metric) %>%
  summarise(mean_value = mean(value),
            lower = quantile(value, .025),
            upper = quantile(value, .975),
            actual = mean(actual),
            perc_below_actual = mean(is_below_actual)) %>%
  gt() %>%
  gt_theme_espn() %>%
  fmt_number(rows = metric %in% c('wOBA', 'AVG', 'OBP', 'SLG'),
             decimals = 3) %>%
  fmt_percent(rows = metric %in% c('K%', 'BB%'),
              decimals = 1) %>%
  fmt_percent(columns = perc_below_actual, decimals = 1) %>%
  tab_spanner(label = 'Sim Values',
              columns = c(mean_value, lower, upper)) %>%
  cols_label(mean_value = 'Mean',
             lower = 'Lower (2.5%)',
             upper = 'Upper (97.5%)',
             perc_below_actual = 'Percent Below Actual')

metric	Sim Mean	Sim Lower (2.5%)	Sim Upper (97.5%)	Actual	Percent Below Actual
AVG	0.281	0.138	0.433	0.147	3.3%
BB%	12.1%	3.5%	22.9%	3.5%	2.6%
K%	26.7%	13.6%	41.3%	32.9%	81.3%
OBP	0.390	0.235	0.543	0.216	1.6%
SLG	0.619	0.250	1.033	0.382	12.0%
wOBA	0.419	0.237	0.613	0.249	3.5%

Basically everything here is on the low end. Strikeout rate is within range, and SLG is within bounds too, though it’s still clearly lower than usual. Combining this with our knowledge that pitching takes a toll on the body, we can conclude that pitching the day before did affect Ohtani’s hitting. He’ll still win an MVP this year, so it’s not like it was all that detrimental.

We’ll also note he went 0 for 4 in game 5 after pitching in game 4. But game 4 came after game 3*, which lasted 18 innings and in which he had 2 home runs, 2 doubles, and 5 walks, so his back was no doubt sore after carrying the Dodgers’ offense for the series.

*This is the sort of hard-hitting analysis I love to provide.