The 2025 MLB regular season has ended. Alas, the Tigers went from 15.5 games up to scratching out a wild card win, with a hugely disappointing second half. The final standings show them only 1 game behind, but that doesn’t capture the season super well. We’ll take a quick look at the season through several different lenses to check the stories that emerge.
First, let’s review the final standings. The only thing I added here that you won’t typically see is the barcode plot showing wins. I considered using the typical run differential charts, but I like seeing a simple blue bar for a win. This helps long winning streaks stand out, as well as losing streaks, alas for Detroit. We’ll get the data from baseball-reference*, using the baseballr package.
*I usually check Fangraphs for standings, but BR is easier to scrape. The playoff odds charts on FG are fun to watch.
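For anyone curious how a barcode plot like that can be put together, here is a minimal sketch. It uses made-up game logs rather than the scraped data, and the data frame `games` with its `Gm` and `win` columns is a hypothetical stand-in, not the post's actual code. Each game becomes a thin tile, filled blue for a win and left white for a loss.

```r
library(ggplot2)

# Toy game log (NOT real 2025 results): one row per team-game,
# `win` is 1 for a win and 0 for a loss. Team codes are placeholders.
set.seed(162)
games <- data.frame(
  Tm  = rep(c("AAA", "BBB"), each = 162),
  Gm  = rep(1:162, times = 2),
  win = rbinom(324, 1, 0.5)
)

# One tile per game; wins are blue, losses blend into the background.
barcode <- ggplot(games, aes(x = Gm, y = Tm, fill = factor(win))) +
  geom_tile(height = 0.8) +
  scale_fill_manual(values = c("0" = "white", "1" = "steelblue"), guide = "none") +
  labs(x = "Game number", y = NULL) +
  theme_minimal()

barcode
```

Streaks show up as solid blue blocks or long white gaps, which is why this reads more easily than a run differential line when all you care about is wins and losses.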
We see some pretty good stories there - teams like the Royals, Twins, and A’s who were hot early but cooled off, the Guardians’ and Mariners’ hot finishes, and the Brewers’ crazy mid-season win streak. Out of kindness, we’ll refrain from commenting on the Rockies. While I like this view, it’s also difficult to take in so much disaggregated data when looking at the season. Let’s split it in half and see which teams saw the biggest changes from the first 81 games to the last 81. The start of the line is the first half win percentage, the arrow is the second half, and the dot is the season total.
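The split itself is a small bit of dplyr. This is a sketch on toy data, assuming a game log with one row per team-game; the names `games`, `Gm`, and `win` are hypothetical, and the real data would come from the scrape above.

```r
library(dplyr)

# Toy game log (NOT real 2025 results); team codes are placeholders.
set.seed(2025)
games <- data.frame(
  Tm  = rep(c("AAA", "BBB"), each = 162),
  Gm  = rep(1:162, times = 2),
  win = rbinom(324, 1, 0.5)
)

halves <- games %>%
  # Games 1-81 are the first half, 82-162 the second.
  mutate(half = if_else(Gm <= 81, "first", "second")) %>%
  group_by(Tm) %>%
  summarise(
    first_half  = mean(win[half == "first"]),   # start of the line
    second_half = mean(win[half == "second"]),  # tip of the arrow
    season      = mean(win),                    # the dot
    change      = second_half - first_half,
    .groups = "drop"
  ) %>%
  arrange(change)  # biggest collapses first

halves
```

Since both halves are 81 games, the season win percentage lands exactly halfway between the two half values, which is why the dot always sits in the middle of the arrow.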
It’s mildly comforting to know that the Rays had a larger second-half collapse than the Tigers, though it doesn’t really help anything.
*The better option here would have been to create a function for the chart we just made. That’s two separate charts put together in patchwork, and there’s enough code behind it that it’s easier at this point to just make a table in gt. We could also have created a scatterplot showing the differences in runs scored and allowed, though we’d then lose the win percent data. Charting is all about tradeoffs.
We can also look at how runs per game and runs allowed per game have changed. We’ll switch back to a table for this. There are different ways we could chart it, but a table should suffice here. We’ll sort by change in win percentage from first half to second.
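The numbers behind a table like this can be sketched as follows. It assumes a game log with runs scored (`R`) and runs allowed (`RA`) per game; the data are simulated and every name here is hypothetical rather than taken from the post's code.

```r
library(dplyr)

# Toy game log with runs scored and allowed (NOT real 2025 results).
set.seed(81)
games <- data.frame(
  Tm = rep(c("AAA", "BBB"), each = 162),
  Gm = rep(1:162, times = 2),
  R  = rpois(324, 4.4),
  RA = rpois(324, 4.4)
)

run_changes <- games %>%
  mutate(half = if_else(Gm <= 81, "first", "second")) %>%
  # Runs per game and runs allowed per game within each half.
  group_by(Tm, half) %>%
  summarise(rpg = mean(R), rapg = mean(RA), .groups = "drop") %>%
  # Second half minus first half, one row per team.
  group_by(Tm) %>%
  summarise(
    rpg_change  = rpg[half == "second"] - rpg[half == "first"],
    rapg_change = rapg[half == "second"] - rapg[half == "first"],
    .groups = "drop"
  )

run_changes
```

A positive `rpg_change` means the offense picked up, while a positive `rapg_change` means the pitching and defense got worse.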
Nothing shocking here, really - if you score fewer runs and give up more, you win fewer games. The Mets offense actually improved in the second half, but they gave up more runs. The Astros offense stayed flat, but their pitching degraded, while the Dodgers saw less offense but better pitching. The A’s saw large improvements in both run scoring and prevention, and while 4.25 runs per game was fairly pedestrian, the improvement was enough to get them to a .543 win percent.
We can go beyond halves as well. I forgot where I first saw the idea of splitting the season into 9 18-game innings, but it’s a nice way to break the season up and show change over time.
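Bucketing games into nine "innings" comes down to integer division on the game number. Here is a sketch on toy data; the output names `season_innings`, `Tm`, and `win_perc` match the ones used later in this post, but how they were actually built isn't shown, so treat the construction and the input columns as assumptions.

```r
library(dplyr)

# Toy game log (NOT real 2025 results); team codes are placeholders.
set.seed(18)
games <- data.frame(
  Tm  = rep(c("AAA", "BBB"), each = 162),
  Gm  = rep(1:162, times = 2),
  win = rbinom(324, 1, 0.5)
)

season_innings <- games %>%
  # Games 1-18 -> inning 1, 19-36 -> inning 2, ..., 145-162 -> inning 9.
  mutate(inning = (Gm - 1) %/% 18 + 1) %>%
  group_by(Tm, inning) %>%
  summarise(n_games = n(), win_perc = mean(win), .groups = "drop")

season_innings
```

With 162 games, every team gets exactly nine stretches of 18, so each stretch carries equal weight in any later averaging.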
This tells a more detailed story - we see the Tigers had two brutal stretches while the Guardians were on fire at the end of the year. The Cubs were the picture of consistency, but the Brewers had an incredible run mid-season. Out of kindness, we won’t comment on the Rockies.
We can do better than just eyeballing the data. We’ll take the average and standard deviation of win percent for all 9 of our 18-game stretches and plot them against each other.
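library(dplyr); library(ggplot2)
library(mlbplotR)    # geom_mlb_logos()
library(hrbrthemes)  # theme_ipsum()

# Mean and SD of win percent across each team's nine 18-game stretches,
# plotted against each other with team logos as the points
season_innings %>%
  group_by(Tm, lg_div) %>%
  summarise(
    mean_win_percent = mean(win_perc),
    sd_win_percent = sd(win_perc),
    .groups = "drop"
  ) %>%
  ggplot(aes(x = mean_win_percent, y = sd_win_percent)) +
  geom_mlb_logos(aes(team_abbr = Tm), width = 0.075) +
  theme_ipsum() +
  xlab('Average Win Percent per 18-game stretch') +
  ylab('Standard Deviation of Win Percent per 18-game stretch')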
The Cubs do indeed come in with the lowest standard deviation, along with a high win percentage. It feels like the Red Sox have had more of a roller-coaster season, so seeing them come in with a low SD was somewhat surprising. The Tigers and Guardians each have high SDs: the former had a strong first half and a disastrous second half, while the latter had the inverse. So, like so many things in the world of data, summary numbers are helpful but don’t show the whole story.
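A tiny example makes the point concrete. The two vectors below are invented win percentages for nine 18-game stretches, not the Tigers' or Guardians' actual numbers: one team fades, the other is its mirror image. Mean and SD can't tell them apart, while something as simple as a fitted slope can.

```r
# Hypothetical win percentages per 18-game stretch (NOT real 2025 data).
fades  <- c(0.72, 0.67, 0.61, 0.56, 0.50, 0.44, 0.39, 0.33, 0.28)
surges <- rev(fades)  # same values, opposite order

# Identical summaries for two very different seasons.
c(mean = mean(fades),  sd = sd(fades))
c(mean = mean(surges), sd = sd(surges))

# A linear trend across the nine stretches recovers the direction.
stretch <- 1:9
slope_fades  <- coef(lm(fades ~ stretch))[["stretch"]]
slope_surges <- coef(lm(surges ~ stretch))[["stretch"]]
c(fades = slope_fades, surges = slope_surges)
```

The slope is only one of many ways to bring the ordering back in; the chart of all nine stretches does the same job visually.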
And that’s really the point of this exercise. If all we cared about were the final results, then a look at the standings would suffice. But we want more than that: we want an idea of what the season looked like. Aggregating at lower grains gives us a clearer picture, but also requires more work to digest. When presenting your data, there’s a tension between giving your audience enough detail to stay informed and overwhelming them. That tension will always exist, and where you land will vary based on who your stakeholders are.
It’s been a fun season, though I’d rather the Tigers’ second half looked better. Things can change rapidly in the postseason, since the best teams can lose to the worst in a 5-game series. Let’s hope for some chaos this October.