AL Central Projections – Point Estimates

We’re at the halfway point of the MLB season, with most teams having played 81 games. This is as good a time as any to evaluate the AL Central and estimate the odds of the Tigers winning the division. We could just use the trusty FanGraphs projections, but it’s fun to dive into binomial simulation.

*I think it’s fun, anyway. Mileage may vary.

R makes it fairly straightforward to simulate binomial trials. If I want to flip a coin 100 times and see how often it comes up heads, I can do so with one line of code. As a bonus, we can do 10,000 sets of 100 flips*, so we can see how typical our numbers are.

*To validate this yourself, flip a coin one million times and record the results.

set.seed(2000)
simmed_flips <- rbinom(10000, 100, .5)
  
summary(simmed_flips)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  29.00   47.00   50.00   49.99   53.00   68.00

Roughly half of our flips are heads, which is what we’d expect. But a single set of 100 flips can produce as few as 29 heads or as many as 68, so there’s still a lot of variation in there. That’s the first thing we need to account for: even something as straightforward as a coin flip can vary widely. We can use the same process to model a team’s chances of winning. It’s obviously more complex than a coin flip*, but the principle holds.

*You are falling into your old error, Jeeves, of thinking that the Tigers are a penny.
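For a sense of how much variation is normal, the simulated spread can be compared against the textbook binomial mean (n × p) and standard deviation (√(n × p × (1 − p))). This is a minimal sketch in base R; the names `n_flips`, `p_heads`, and `flips` are my own labels for illustration.

```r
# Compare simulated coin flips with the theoretical binomial moments.
n_flips <- 100   # flips per set
p_heads <- 0.5   # fair coin

theory_mean <- n_flips * p_heads                        # n * p
theory_sd   <- sqrt(n_flips * p_heads * (1 - p_heads))  # sqrt(n * p * (1 - p))

set.seed(2000)
flips <- rbinom(10000, n_flips, p_heads)

# Simulated values should sit close to the theoretical ones
c(theory_mean = theory_mean, sim_mean = mean(flips))
c(theory_sd = theory_sd, sim_sd = sd(flips))

# Share of sets landing within two standard deviations of the mean
mean(abs(flips - theory_mean) <= 2 * theory_sd)
```

The theoretical standard deviation here is 5 heads, so the large majority of sets should land between 40 and 60, with the occasional set straying well outside that band.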

Secondly, we need to know what win rate to feed our model. A team may have a high winning percentage, but given the previous example, that may be a fluke. The same goes for a low winning percentage. To accommodate this, we’ll take a quasi-Bayesian approach and average a team’s actual winning percentage with .500*. That way they still get credit for what they’ve done, but we temper our expectations a bit as well.

*A better value would be preseason projections. .500 is a bit simplistic; it’s fine for understanding how the model works, but if I were to build something out for all of MLB I’d get much more sophisticated. I’d also weight by games played and games remaining, but since we’re ~50% through the season a straight average is fine.
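One way to read the straight average: it is the same as adding one game of .500 ball for every game a team has actually played. The sketch below makes that explicit and shows how a games-weighted version might look. The function `shrink_win_pct` and its `prior` and `prior_games` arguments are hypothetical names for illustration, and the choice of 81 prior games is an arbitrary assumption rather than a tuned value.

```r
# Shrink an observed record toward a prior win rate by adding "pseudo-games".
# prior_games controls how strongly we pull toward the prior (an assumption to tune).
shrink_win_pct <- function(w, l, prior = 0.500, prior_games = w + l) {
  (w + prior * prior_games) / (w + l + prior_games)
}

# With the default prior_games, this reproduces the straight average.
tigers_straight <- (0.500 + 51 / (51 + 31)) / 2
tigers_shrunk   <- shrink_win_pct(51, 31)
all.equal(tigers_straight, tigers_shrunk)

# A fixed prior weight (81 is an arbitrary illustration) lets the actual
# record count for more as the games played pile up.
shrink_win_pct(51, 31, prior_games = 81)
shrink_win_pct(26, 55, prior_games = 81)
```

Swapping `prior` for a preseason projection would be the natural next step, but that needs projection data this post does not use.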

Let’s note as well that we’re simulating each team individually. That is, we’re just asking how many wins we can reasonably expect a given team to end with - we’re not modeling games against other teams, adjusting for strength of schedule, or anything like that. We also don’t know what injuries may come nor who teams may trade for.

Before that, let’s check the AL Central as of the morning of 6/27/25.

library(gt)
library(gtExtras)
library(hrbrthemes)
library(tidyverse)

al_central <- tribble(~'team', ~'w', ~'l',
                      'Tigers', 51, 31,
                      'Guardians', 40, 39,
                      'Twins', 39, 42,
                      'Royals', 38, 43,
                      'White Sox', 26, 55) %>%
  mutate(win_perc = w / (w + l),
         games_remaining = 162 - w - l,
         weighted_win_perc = (.500 + win_perc) / 2)

al_central %>%
  select(-weighted_win_perc) %>%
  gt() %>%
  gt_theme_espn() %>%
  fmt_number(columns = c('win_perc'), decimals = 3)

team	w	l	win_perc	games_remaining
Tigers	51	31	0.622	80
Guardians	40	39	0.506	83
Twins	39	42	0.481	81
Royals	38	43	0.469	81
White Sox	26	55	0.321	81

OK, that’s our baseline - these games have been played and will be accounted for in our final numbers. Let’s now simulate the rest of the season.

al_central_simmed <- al_central %>%
  group_by(team) %>%
  mutate(
    # 10,000 simulated rest-of-season win totals per team
    simulations = list(rbinom(n = 10000, size = games_remaining, prob = weighted_win_perc))
  ) %>%
  unnest(simulations) %>% # Expand the list column into new rows
  ungroup() %>%
  rename(value = win_perc, ros_wins = simulations) %>%
  mutate(projected_wins = w + ros_wins, # Wins already banked plus simulated wins
         ros_projected_win_perc = ros_wins / games_remaining) %>%
  group_by(team) %>%
  mutate(sim_id = row_number()) %>% # The nth draw for each team belongs to sim n
  group_by(sim_id) %>%
  mutate(ranking = rank(-projected_wins, ties.method = "min")) # Rank within each sim; tied teams share a rank
 
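al_central_simmed %>%
  ggplot(aes(x = projected_wins, color = team)) +
  geom_density() +
  theme_ipsum() +
  # Name each color so the mapping doesn't depend on alphabetical team order
  scale_color_manual(values = c("Guardians" = "#E31937", "Royals" = "#004687", "Tigers" = "#FA4616",
                                "Twins" = "#002B5C", "White Sox" = "#27251F"))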

As a Tigers fan, this is encouraging - the average number of wins is north of 90, and far enough away from the rest of the division that we’re relatively comfortable. (Although their surge last year from 0.6% playoff odds to 100% is a reminder that unexpected things happen in baseball.) If you’re a White Sox fan, well, at least it’s better than last year.
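The centers of these curves can also be checked by hand, since a binomial with n games and win probability p has mean n × p and standard deviation √(n × p × (1 − p)). The sketch below re-enters the standings in a base R data frame so it runs on its own; `standings`, `p`, `expected_wins`, and `sd_wins` are names I picked for illustration.

```r
# Analytic check on the simulated distributions: expected final wins and spread.
standings <- data.frame(
  team = c("Tigers", "Guardians", "Twins", "Royals", "White Sox"),
  w    = c(51, 40, 39, 38, 26),
  l    = c(31, 39, 42, 43, 55)
)

standings$games_remaining <- 162 - standings$w - standings$l
# Same straight average of actual win percentage and .500 used above
standings$p <- (0.500 + standings$w / (standings$w + standings$l)) / 2

# Binomial mean and standard deviation for the rest of the season
standings$expected_wins <- standings$w + standings$games_remaining * standings$p
standings$sd_wins <- sqrt(standings$games_remaining * standings$p * (1 - standings$p))

standings[, c("team", "expected_wins", "sd_wins")]
```

Every team’s standard deviation comes out to roughly four to five wins, which is why the curves have similar widths and differ mainly in where they are centered.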

One thing this doesn’t tell us is how often the Tigers win the division, because the curves show each team on its own. A sim where the Tigers win only 85 games isn’t necessarily one where someone else wins 86, and a sim where Cleveland wins 93 isn’t necessarily one where Detroit wins fewer. What we’ll do here is number our simulations and see who has the most wins in each one. Let’s look at our first sim as an example.

al_central_simmed %>%
  ungroup() %>%
  filter(sim_id == 1) %>%
  select(team, w, l, games_remaining, ros_projected_win_perc, projected_wins) %>%
  arrange(desc(projected_wins)) %>%
  gt() %>%
  gt_theme_espn() %>%
  fmt_number(columns = c('ros_projected_win_perc'), decimals = 3)

team	w	l	games_remaining	ros_projected_win_perc	projected_wins
Tigers	51	31	80	0.588	98
Guardians	40	39	83	0.627	92
Twins	39	42	81	0.556	84
Royals	38	43	81	0.407	71
White Sox	26	55	81	0.494	66

In this example, the Tigers play .588 ball in their remaining 80 games and finish with 98 wins, 6 up on the second-place Guardians, who do really well in this sim. This seems plausible, but a single sim can’t tell us how likely it is. It’s a good thing we did this 10,000 times so we can average our results.

al_central_simmed %>%
  group_by(team) %>%
  mutate(won_division = ifelse(ranking == 1, 1, 0)) %>%
  summarise(avg_projected_wins = mean(projected_wins),
            projected_10th = quantile(projected_wins, .10),
            projected_90th = quantile(projected_wins, .90),
            won_division = mean(won_division)) %>%
  arrange(desc(avg_projected_wins)) %>%
  gt() %>%
  gt_theme_espn() %>%
  fmt_number(columns = c('avg_projected_wins'), decimals = 1) %>%
  fmt_percent(columns = c('won_division'), decimals = 1)

team	avg_projected_wins	projected_10th	projected_90th	won_division
Tigers	95.8	90	101	98.7%
Guardians	81.8	76	88	1.7%
Twins	78.8	73	84	0.3%
Royals	77.2	71	83	0.1%
White Sox	59.3	54	65	0.0%
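A note on the won_division column: the percentages add up to 100.8% rather than 100%. That follows from `ties.method = "min"` in the ranking step, which gives every team tied for the most wins a rank of 1, so a tied sim counts as a division title for each tied team. The sketch below uses made-up win totals (`tied_sim` is a hypothetical example, not one of the actual sims) to show the behavior, along with one possible alternative that this post does not use.

```r
# Two teams tied for the most wins both receive rank 1 under ties.method = "min".
# These totals are made up for illustration.
tied_sim <- c(Tigers = 92, Guardians = 92, Twins = 84, Royals = 71, `White Sox` = 60)
min_ranks <- rank(-tied_sim, ties.method = "min")
min_ranks

# One possible alternative (an assumption, not what this post does):
# break ties at random so exactly one team is ranked first.
set.seed(1)
random_ranks <- rank(-tied_sim, ties.method = "random")
random_ranks
```

A real tie would be settled by MLB’s tiebreaker rules rather than a coin flip, so either choice is a simplification.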

This model has the Tigers winning an average of 96 games, winning the Central in 98.7% of sims. Their 10th percentile projected wins is 90, which exceeds the 90th percentile for every other team. So, it would require a combination of some other team playing red-hot ball and the Tigers playing poorly for them to not win the division. For instance:

al_central_simmed %>%
  group_by(sim_id) %>%
  mutate(tigers_won = max(ifelse(ranking == 1 & team == 'Tigers', 1, 0))) %>%
  filter(tigers_won == 0) %>%
  ungroup() %>%
  filter(sim_id == min(sim_id)) %>%
  select(team, w, l, games_remaining, ros_projected_win_perc, projected_wins) %>%
  arrange(desc(projected_wins)) %>%
  gt() %>%
  gt_theme_espn() %>%
  fmt_number(columns = c('ros_projected_win_perc'), decimals = 3)

team	w	l	games_remaining	ros_projected_win_perc	projected_wins
Twins	39	42	81	0.679	94
Tigers	51	31	80	0.512	92
Guardians	40	39	83	0.530	84
Royals	38	43	81	0.407	71
White Sox	26	55	81	0.420	60

In this sim, the Twins play .679 ball the rest of the year while the Tigers fall to .512. Could it happen? Sure. But the Twins only won in 0.3% of sims, so while the model says it’s not impossible, it’s pretty unlikely. Seeing unlikely things is part of the appeal of baseball. Still, the Tigers winning the Central is far and away the most likely outcome, which means it’s going to be a fun fall.