set.seed(2000)
# 10,000 sets of 100 fair coin flips; each value is the number of heads in one set
simmed_flips <- rbinom(10000, 100, .5)
summary(simmed_flips)
Min. 1st Qu. Median Mean 3rd Qu. Max.
29.00 47.00 50.00 49.99 53.00 68.00
Mark Jurries II
June 27, 2025
We’re at the halfway point of the MLB season, with most teams having played 81 games. This is as good a time as any to evaluate the AL Central and see the odds of the Tigers winning the division. We could just use the trusty FanGraphs projections, but it’s fun to dive into binomial simulation.
R makes it fairly straightforward to simulate binomial trials. If I want to flip a coin 100 times and see how often it comes up heads, I can do so with one line of code. As a bonus, we can do 10,000 sets of 100 flips*, so we can see how typical our numbers are.
Roughly half of our flips are heads, which is what we’d expect. But a set of 100 flips can produce as few as 29 heads and as many as 68, so there’s still some variation in there. That’s the first thing we need to account for - even something as straightforward as a coin flip can show a lot of variation. We can use the same process to model a team’s chances of winning. It’s obviously more complex than a coin flip*, but the principle holds.
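To put a number on that variation, here is a minimal sketch comparing the simulated spread with the textbook binomial standard deviation, sqrt(n * p * (1 - p)). It reuses the seed and parameters from the coin-flip code; the variable names `n_flips`, `p_heads`, `theory_sd`, `sim_sd` and `share_within_2sd` are mine, introduced only for illustration.

```r
# Spread of heads in 100 fair coin flips: theory vs. simulation
set.seed(2000)                 # same seed as the coin-flip example
n_flips <- 100
p_heads <- 0.5

simmed_flips <- rbinom(10000, n_flips, p_heads)

# Theoretical standard deviation of a binomial count
theory_sd <- sqrt(n_flips * p_heads * (1 - p_heads))   # sqrt(25) = 5

# Standard deviation actually observed across the 10,000 sets
sim_sd <- sd(simmed_flips)

# Share of simulated sets that land within two standard deviations of 50
share_within_2sd <- mean(abs(simmed_flips - n_flips * p_heads) <= 2 * theory_sd)

print(c(theory_sd = theory_sd, sim_sd = sim_sd, share_within_2sd = share_within_2sd))
```

If the simulation is behaving, `sim_sd` should sit close to 5 and the large majority of sets should fall between 40 and 60 heads, with the extremes in the summary being the rare tails.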
Secondly, we need to know what win rate to feed our model. A team may have a high winning percentage, but given the previous example that may be a fluke. The same goes for a low winning percentage. To accommodate this, we’ll take a quasi-Bayesian approach and average a team’s actual winning percentage with .500*. Teams still get credit for what they’ve done, but we pull our expectations toward average as well.
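One way to read that averaging step (my interpretation, not something the post states) is that it pads a team's record with as many .500 "phantom" games as the team has already played. The sketch below checks that the two forms agree, using the Tigers' 51-31 record from the standings in this post; `phantom_form` is a hypothetical name used only here.

```r
# Averaging a win rate with .500 vs. padding the record with .500 "phantom" games
w <- 51                        # Tigers wins as of 6/27/25
l <- 31                        # Tigers losses
games_played <- w + l

win_perc <- w / games_played
weighted_win_perc <- (0.500 + win_perc) / 2    # the averaging used in this post

# Equivalent form: add games_played phantom games, half of them wins
phantom_form <- (w + 0.5 * games_played) / (2 * games_played)

print(round(c(win_perc = win_perc,
              weighted = weighted_win_perc,
              phantom  = phantom_form), 3))
```

Seen this way, the weight on .500 grows with games played instead of staying fixed, which is a design choice worth keeping in mind when reading the projections.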
Let’s note as well that we’re simulating each team individually. That is, we’re just asking how many wins we can reasonably expect a given team to end with - we’re not modeling games against other teams, adjusting for strength of schedule, or anything like that. We also don’t know what injuries may come nor who teams may trade for.
Before that, let’s check the AL Central as of the morning of 6/27/25.
library(gt)
library(gtExtras)
library(hrbrthemes)
library(tidyverse)
# Standings as of the morning of 6/27/25
al_central <- tribble(
  ~team,       ~w, ~l,
  'Tigers',    51, 31,
  'Guardians', 40, 39,
  'Twins',     39, 42,
  'Royals',    38, 43,
  'White Sox', 26, 55
) %>%
  mutate(
    win_perc = w / (w + l),
    games_remaining = 162 - w - l,
    # Pull each team toward .500 by averaging it with the actual win rate
    weighted_win_perc = (.500 + win_perc) / 2
  )

al_central %>%
  select(-weighted_win_perc) %>%
  gt() %>%
  gt_theme_espn() %>%
  fmt_number(columns = c('win_perc'), decimals = 3)
team | w | l | win_perc | games_remaining |
---|---|---|---|---|
Tigers | 51 | 31 | 0.622 | 80 |
Guardians | 40 | 39 | 0.506 | 83 |
Twins | 39 | 42 | 0.481 | 81 |
Royals | 38 | 43 | 0.469 | 81 |
White Sox | 26 | 55 | 0.321 | 81 |
OK, that’s our baseline - these games have been played and will be accounted for in our final numbers. Let’s now simulate the rest of the season.
al_central_simmed <- al_central %>%
  group_by(team) %>%
  mutate(
    # 10,000 simulated rest-of-season win totals per team
    simulations = list(rbinom(n = 10000, size = games_remaining, prob = weighted_win_perc))
  ) %>%
  unnest(simulations) %>% # Expand the list column into new rows
  ungroup() %>%
  rename(value = win_perc, t = simulations) %>%
  mutate(
    projected_wins = w + t,                    # banked wins plus simulated wins
    ros_projected_win_perc = t / games_remaining
  ) %>%
  group_by(team) %>%
  mutate(sim_id = row_number()) %>%            # number each team's sims 1 to 10,000
  group_by(sim_id) %>%
  # Rank teams within each sim; tied teams share the better rank
  mutate(ranking = rank(-projected_wins, ties.method = "min")) %>%
  ungroup()

al_central_simmed %>%
  ggplot(aes(x = projected_wins, color = team)) +
  geom_density() +
  theme_ipsum() +
  # Name the colors so they do not depend on alphabetical team order
  scale_color_manual(values = c('Guardians' = "#E31937", 'Royals' = "#004687",
                                'Tigers' = "#FA4616", 'Twins' = "#002B5C",
                                'White Sox' = "#27251F"))
As a Tigers fan, this is encouraging - the average number of wins is north of 90, and far enough away from the rest of the division that we’re relatively comfortable. (Although their surge last year from 0.6% to 100% is a reminder that unexpected things happen in baseball.) If you’re a White Sox fan, well, at least it’s better than last year.
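The "north of 90" figure can be sanity-checked without simulating anything, since a binomial count has mean n * p and standard deviation sqrt(n * p * (1 - p)). The sketch below rebuilds the standings in base R so it runs on its own; `expected_wins` and `sd_wins` are names I introduce for illustration and are not columns in the post's data.

```r
# Analytical center and spread of each team's projected win total
al_central <- data.frame(
  team = c("Tigers", "Guardians", "Twins", "Royals", "White Sox"),
  w    = c(51, 40, 39, 38, 26),
  l    = c(31, 39, 42, 43, 55)
)
al_central$games_remaining   <- 162 - al_central$w - al_central$l
al_central$weighted_win_perc <- (0.500 + al_central$w / (al_central$w + al_central$l)) / 2

# Mean: wins already banked plus n * p for the remaining games
al_central$expected_wins <- with(al_central, w + games_remaining * weighted_win_perc)

# Standard deviation: sqrt(n * p * (1 - p)); banked wins add no variance
al_central$sd_wins <- with(
  al_central,
  sqrt(games_remaining * weighted_win_perc * (1 - weighted_win_perc))
)

print(al_central[, c("team", "expected_wins", "sd_wins")], digits = 3)
```

Each team's spread comes out at a bit over four wins, so the Tigers' lead over the rest of the division amounts to several standard deviations under this model.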
One thing this doesn’t tell us is how often the Tigers win the division. A sim where the Tigers win only 85 games isn’t necessarily one where someone else wins 86, and a sim where Cleveland wins 93 isn’t necessarily one where Detroit wins fewer. What we’ll do here is number our simulations and see who has the most wins in each one. Let’s look at our first sim as an example.
team | w | l | games_remaining | ros_projected_win_perc | projected_wins |
---|---|---|---|---|---|
Tigers | 51 | 31 | 80 | 0.588 | 98 |
Guardians | 40 | 39 | 83 | 0.627 | 92 |
Twins | 39 | 42 | 81 | 0.556 | 84 |
Royals | 38 | 43 | 81 | 0.407 | 71 |
White Sox | 26 | 55 | 81 | 0.494 | 66 |
In this example, the Tigers play .588 ball in their remaining 80 games and finish with 98 wins, 6 up on the second-place Guardians, who do really well in this sim. This seems plausible, but a single sim can’t tell us how likely it is. It’s a good thing we did this 10,000 times so we can average our results.
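The post doesn't show the code that pulls out a single sim, so here is a self-contained base R sketch of the idea: simulate every team, then read across one row. It is not the author's code. The seed is arbitrary (an assumption on my part), so the numbers will differ from the table above, and `ros_wins`, `projected_wins` and `first_sim` are illustrative names.

```r
# Simulate rest-of-season wins for every team, then inspect simulation 1
set.seed(1)                    # arbitrary seed, not taken from the post
n_sims <- 10000
teams <- data.frame(
  team = c("Tigers", "Guardians", "Twins", "Royals", "White Sox"),
  w    = c(51, 40, 39, 38, 26),
  l    = c(31, 39, 42, 43, 55)
)
teams$games_remaining   <- 162 - teams$w - teams$l
teams$weighted_win_perc <- (0.500 + teams$w / (teams$w + teams$l)) / 2

# Matrix with one row per simulation and one column per team
ros_wins <- sapply(seq_len(nrow(teams)), function(i) {
  rbinom(n_sims, teams$games_remaining[i], teams$weighted_win_perc[i])
})
colnames(ros_wins) <- teams$team

# Add each team's banked wins to its column
projected_wins <- sweep(ros_wins, 2, teams$w, "+")

# Row 1 is "sim 1": one simulated outcome for all five teams at once
first_sim <- data.frame(
  team           = teams$team,
  ros_win_perc   = round(ros_wins[1, ] / teams$games_remaining, 3),
  projected_wins = projected_wins[1, ],
  row.names      = NULL
)
print(first_sim[order(-first_sim$projected_wins), ])
```

Because each row holds one simulated outcome for all five teams, finding the division winner in a given sim is just a matter of picking the largest value in that row.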
al_central_simmed %>%
  group_by(team) %>%
  # ranking == 1 marks the team(s) with the most wins in a sim
  mutate(won_division = ifelse(ranking == 1, 1, 0)) %>%
  summarise(
    avg_projected_wins = mean(projected_wins),
    projected_10th = quantile(projected_wins, .10),
    projected_90th = quantile(projected_wins, .90),
    won_division = mean(won_division)
  ) %>%
  arrange(desc(avg_projected_wins)) %>%
  gt() %>%
  gt_theme_espn() %>%
  fmt_number(columns = c('avg_projected_wins'), decimals = 1) %>%
  fmt_percent(columns = c('won_division'), decimals = 1)
team | avg_projected_wins | projected_10th | projected_90th | won_division |
---|---|---|---|---|
Tigers | 95.8 | 90 | 101 | 98.7% |
Guardians | 81.8 | 76 | 88 | 1.7% |
Twins | 78.8 | 73 | 84 | 0.3% |
Royals | 77.2 | 71 | 83 | 0.1% |
White Sox | 59.3 | 54 | 65 | 0.0% |
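One detail worth noticing in the table above: the `won_division` column adds up to about 100.8%, not 100%. The most likely explanation is ties. With `ties.method = "min"`, two teams that finish level on wins both receive rank 1, so both are credited with the division in that sim. The toy example below uses made-up win totals to show the behavior, along with one possible alternative that splits the credit (my suggestion, not the post's method).

```r
# How rank(..., ties.method = "min") treats a tie for first place
# Made-up win totals for a single hypothetical sim
projected_wins <- c(Tigers = 90, Guardians = 90, Twins = 84, Royals = 78, `White Sox` = 60)

ranking <- rank(-projected_wins, ties.method = "min")
print(ranking)

# Both tied teams get rank 1, so this one sim awards two division titles
won_division <- as.integer(ranking == 1)
print(sum(won_division))

# Alternative: split the credit so every sim awards exactly one title
shared_credit <- (ranking == 1) / sum(ranking == 1)
print(shared_credit)
```

Splitting the credit keeps the column summing to 100%, though it still glosses over how MLB would actually break the tie.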
This model has the Tigers averaging 96 wins and winning the Central in 98.7% of sims. Their 10th-percentile projection is 90 wins, which exceeds the 90th percentile for every other team. So for the Tigers to lose the division, some other team would have to play red-hot ball while the Tigers play poorly. For instance:
al_central_simmed %>%
  group_by(sim_id) %>%
  # Flag every row of a sim with whether the Tigers finished first in it
  mutate(tigers_won = max(ifelse(ranking == 1 & team == 'Tigers', 1, 0))) %>%
  filter(tigers_won == 0) %>%
  ungroup() %>%
  # Keep the earliest sim in which the Tigers missed out
  filter(sim_id == min(sim_id)) %>%
  select(team, w, l, games_remaining, ros_projected_win_perc, projected_wins) %>%
  arrange(desc(projected_wins)) %>%
  gt() %>%
  gt_theme_espn() %>%
  fmt_number(columns = c('ros_projected_win_perc'), decimals = 3)
team | w | l | games_remaining | ros_projected_win_perc | projected_wins |
---|---|---|---|---|---|
Twins | 39 | 42 | 81 | 0.679 | 94 |
Tigers | 51 | 31 | 80 | 0.512 | 92 |
Guardians | 40 | 39 | 83 | 0.530 | 84 |
Royals | 38 | 43 | 81 | 0.407 | 71 |
White Sox | 26 | 55 | 81 | 0.420 | 60 |
In this sim, the Twins play .679 ball the rest of the year while the Tigers fall to .512. Could it happen? Sure. But the Twins only won in 0.3% of sims, so while the model says it’s possible, it’s also pretty unlikely. Seeing unlikely things is part of the appeal of baseball. Still, the Tigers winning the Central is far and away the most likely outcome, which means it’s going to be a fun fall.
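To get a feel for how rare the Twins' half of that scenario is, the binomial tail probability can be computed directly with `pbinom`. A .679 finish over 81 games means 55 wins, so the sketch below asks how often the model gives the Twins 55 or more. It covers only the Twins' hot streak, not the Tigers' slump, and `p_hot_streak` is a name I made up for illustration.

```r
# Chance the Twins win 55 or more of their final 81 games under this post's model
w <- 39
l <- 42
games_remaining   <- 162 - w - l                    # 81
weighted_win_perc <- (0.500 + w / (w + l)) / 2      # the Twins' win rate averaged with .500

# P(X >= 55) is the upper tail beyond 54
p_hot_streak <- pbinom(54, size = games_remaining, prob = weighted_win_perc,
                       lower.tail = FALSE)
print(p_hot_streak)
```

The result is a small fraction of a percent, which fits with the Twins taking the division in only a handful of the 10,000 sims.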