MLB has allowed teams to challenge calls for several years now, and thanks to Baseball Savant, we can also see details on every replay. Scraping the data isn’t particularly difficult, so let’s look at 2025 and see which teams performed best. We’ll ignore replays initiated by umpires, as well as the All-Star Game and all postseason games. This leaves us with 1,154 total replays.
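The filtering step described above can be sketched as follows. This is a minimal sketch, not the code used for this post, and it assumes each scraped replay has already been parsed into a record with hypothetical fields `initiated_by`, `game_type`, and `overturned`. The real Baseball Savant column names and codes may differ; in particular, using "R" to mean regular season is an assumption here.

```python
from dataclasses import dataclass

@dataclass
class Replay:
    team: str          # challenging team (hypothetical field name)
    initiated_by: str  # "team" or "umpire" (hypothetical encoding)
    game_type: str     # "R" = regular season (assumed code)
    overturned: bool

def keep_team_challenges(replays):
    """Keep only team-initiated challenges from regular-season games.

    Restricting to regular season drops both the All-Star Game and the
    postseason in one condition.
    """
    return [
        r for r in replays
        if r.initiated_by == "team" and r.game_type == "R"
    ]

# Toy records for illustration only, not real 2025 data
sample = [
    Replay("DET", "team", "R", True),
    Replay("DET", "umpire", "R", False),  # umpire review: dropped
    Replay("PHI", "team", "F", True),     # postseason: dropped
    Replay("HOU", "team", "A", False),    # All-Star Game: dropped
]
print(len(keep_team_challenges(sample)))  # → 1
```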
The Astros, Brewers, and Phillies all did well with their replays, each above or around 70%. The Rangers, Rays, and Orioles were all pretty bad, each under 43% (with my Tigers only at 44%). Only six teams had a losing record on their replays.
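Turning the filtered challenges into each team's record and overturn rate is a simple tally. The sketch below assumes challenges are available as hypothetical `(team, overturned)` pairs and treats an overturned call as a "win"; the toy data is made up and does not reflect actual 2025 results.

```python
from collections import defaultdict

def team_records(challenges):
    """Tally challenge outcomes per team.

    challenges: iterable of (team, overturned) pairs.
    Returns {team: (wins, losses, overturn_rate)}, where a win is an
    overturned call and a loss is a call that stood.
    """
    wins = defaultdict(int)
    losses = defaultdict(int)
    for team, overturned in challenges:
        if overturned:
            wins[team] += 1
        else:
            losses[team] += 1
    records = {}
    for team in set(wins) | set(losses):
        total = wins[team] + losses[team]
        records[team] = (wins[team], losses[team], wins[team] / total)
    return records

# Toy data for illustration only
toy = [("HOU", True), ("HOU", True), ("HOU", False),
       ("TEX", False), ("TEX", True), ("TEX", False)]
records = team_records(toy)
losing = sorted(t for t, (w, l, _) in records.items() if w < l)
print(losing)  # → ['TEX']
```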
There are different types of calls; let’s take a look at those.
Tag plays being roughly breakeven checks out, as do close plays at first generally being overturned. I was a bit surprised to see hit-by-pitch rank so high, though 90 challenges over a season isn’t a lot.
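One way to put a number on "isn’t a lot" (not something this post actually does) is a Wilson score interval around a call type’s overturn rate. The sketch below uses z = 1.96 for roughly 95% coverage. The 45-of-90 input is hypothetical; only the 90-challenge total comes from the data above.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# Hypothetical: 45 of 90 challenges overturned
lo, hi = wilson_interval(45, 90)
print(f"{lo:.2f} to {hi:.2f}")
```

With only 90 challenges, the interval spans about 20 percentage points, so a single season of one call type can’t pin down its true overturn rate very precisely.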
Here’s another intuitive finding. Since teams get one challenge per game and lose it if the call isn’t overturned, early in the game, when the stakes aren’t as high, they tend to challenge only when they feel confident they’ll keep it. As the game goes on, calls are less likely to be overturned, most likely because teams become more willing to risk losing a challenge if it could help them win. It’s also possible that the replay booth becomes more conservative about overturning calls late in games, since that could affect the outcome.
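The inning breakdown can be reproduced by grouping challenges into inning buckets and computing an overturn rate for each. This is a sketch under assumptions: the post doesn’t say how innings were grouped, so the early/middle/late boundaries below (with extra innings counted as late) are hypothetical, as is the `(inning, overturned)` input shape.

```python
from collections import defaultdict

def inning_bucket(inning):
    # Assumed grouping; the original analysis may bin innings differently
    if inning <= 3:
        return "early (1-3)"
    if inning <= 6:
        return "middle (4-6)"
    return "late (7+)"  # includes extra innings

def rate_by_bucket(challenges):
    """challenges: iterable of (inning, overturned). Returns {bucket: overturn rate}."""
    counts = defaultdict(lambda: [0, 0])  # bucket -> [overturned, total]
    for inning, overturned in challenges:
        c = counts[inning_bucket(inning)]
        c[0] += int(overturned)
        c[1] += 1
    return {b: ov / tot for b, (ov, tot) in counts.items()}

# Toy data for illustration only
toy = [(1, True), (2, True), (3, False), (8, False), (9, True), (10, False)]
rates = rate_by_bucket(toy)
print(rates)
```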
With the general lay of the land established, we can now build a simple model to see which teams got the most out of the challenges they made. The model considers the replay type and the inning. We’ll then use it to predict each team’s expected overturn rate and compare that to their actual rate.
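The post doesn’t specify the model, so the following is only one plausible simple version: a cell-average model in which each challenge’s predicted probability is the league-wide overturn rate for its (replay type, inning bucket) cell, and a team’s predicted rate is the mean over its challenges. The inning buckets and the `(team, replay_type, inning, overturned)` tuple layout are assumptions. Note that each team’s own challenges feed into the league rates here; a leave-one-team-out fit would avoid that.

```python
from collections import defaultdict

def bucket(inning):
    # Assumed inning grouping: 0 = innings 1-3, 1 = 4-6, 2 = 7+ (incl. extras)
    return 0 if inning <= 3 else 1 if inning <= 6 else 2

def fit_cell_rates(challenges):
    """League-wide overturn rate per (replay_type, inning bucket) cell.

    challenges: iterable of (team, replay_type, inning, overturned).
    """
    cells = defaultdict(lambda: [0, 0])  # cell -> [overturned, total]
    for _, rtype, inning, overturned in challenges:
        c = cells[(rtype, bucket(inning))]
        c[0] += int(overturned)
        c[1] += 1
    return {cell: ov / tot for cell, (ov, tot) in cells.items()}

def expected_vs_actual(challenges):
    """Per team: (predicted overturn rate, actual overturn rate, n challenges)."""
    rates = fit_cell_rates(challenges)
    by_team = defaultdict(list)
    for team, rtype, inning, overturned in challenges:
        by_team[team].append((rates[(rtype, bucket(inning))], overturned))
    out = {}
    for team, rows in by_team.items():
        n = len(rows)
        predicted = sum(p for p, _ in rows) / n
        actual = sum(int(o) for _, o in rows) / n
        out[team] = (predicted, actual, n)
    return out

# Toy data for illustration only
toy = [
    ("ARI", "tag play", 2, True),
    ("ARI", "close play at 1st", 8, True),
    ("PHI", "tag play", 2, False),
    ("PHI", "close play at 1st", 8, False),
]
print(expected_vs_actual(toy))
```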
Our model (which, we should caveat, is very simple*) predicted the Diamondbacks to overturn only 53% of their challenges. They instead got 66%, a 13-point increase. That’s about 6.5 calls, which depending on the game state could be very impactful. (Arizona finished under .500, so it didn’t help them that much in the end.) The Phillies beat their prediction by a wider margin, but Arizona made more challenges, so they had a higher calls-over-expected tally. That said, given the sample sizes, it’s hard to read too much into this. It’s also worth noting that the Dodgers are middle of the pack in both raw and predicted terms, yet are on their way to another World Series win.
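Calls over expected is just the rate gap scaled by the number of challenges, which is why a team with a smaller gap but more challenges can finish ahead. In the sketch below, the 50-challenge count for Arizona is inferred from the 13-point, 6.5-call figures rather than taken from the data, and the second team’s inputs are purely hypothetical.

```python
def calls_over_expected(n, actual_rate, predicted_rate):
    """Extra overturns relative to the model: n * (actual - predicted)."""
    return n * (actual_rate - predicted_rate)

# ~50 challenges is implied by 13 points being about 6.5 calls
ari = calls_over_expected(50, 0.66, 0.53)
# Hypothetical team: a larger 15-point gap, but only 40 challenges
other = calls_over_expected(40, 0.75, 0.60)
print(round(ari, 1), round(other, 1))  # → 6.5 6.0
```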
*Additional information, such as the score, which umpire made the call, and whether the challenging team was on offense or defense, would add a lot of value.
It’s always helpful to investigate the data a little bit more, so let’s look at the Tigers.
The Tigers did well on close plays at first but were pretty bad on tag plays. Their success on close plays at first is the only thing keeping them from being completely abysmal.
Wrapping up, there are a few things we can note. Firstly, a more robust dataset would help us better identify teams that do well with replay. We can also use Statcast to watch video of every challenge, which would be worthwhile if we really want to dig deep. Finally, note that this dataset only includes calls that were challenged, not calls that should have been challenged. For instance, a runner may have been out at second but called safe, and the defensive team didn’t challenge. That should count as a debit on their replay ability, but since we have no way to track these situations, we lose sight of them.
The biggest thing to add would be win probability change. A successful challenge in a game that’s tied 2-2 in the 8th is very different from the same challenge when you’re up 10-0 in the 8th. That might make for a fun phase 2 of this at some point.
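If win probability data were joined in, one simple way to weight challenges would be to sum the win-probability swing of each successful one. This is a hypothetical sketch: the scraped data used here has no win probability fields, so `wp_before` (the challenging team’s win probability if the call stood) and `wp_after` (after the overturn) are assumed inputs, and the values below are made up to contrast a tight game with a blowout.

```python
def leverage_weighted_value(challenges):
    """Total win-probability gained from successful challenges.

    challenges: iterable of (overturned, wp_before, wp_after), where the wp
    values are the challenging team's win probability with the original call
    and with the overturned call (hypothetical inputs).
    """
    return sum(
        wp_after - wp_before
        for overturned, wp_before, wp_after in challenges
        if overturned
    )

# Made-up values: a tied game in the 8th vs. a 10-0 blowout in the 8th
tight_game = (True, 0.50, 0.62)
blowout = (True, 0.99, 0.995)
failed = (False, 0.40, 0.55)  # call stood, so it adds nothing here
print(leverage_weighted_value([tight_game, blowout, failed]))
```

Under this weighting, the tight-game overturn counts roughly 24 times as much as the blowout one, which captures the intuition above. A fuller version might also charge teams for the lost challenge on failed reviews.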