Team wRC+ Balance – Point Estimates

We’re almost 60 games into the MLB season. A lot can happen in 60 games that may not happen in the next 100, but it still gives us a fairly decent sample. One of the fun* questions to ask is “how balanced is a team’s lineup?”

*You may have a more interesting definition of fun than I do.

To answer this, I downloaded batting leader data from FanGraphs, then aggregated by team. I chose wRC+ as my metric of interest, since it adjusts for ballpark and league. Since we want to know how balanced a team is, we’ll take their overall wRC+ (the PA-weighted mean), as well as the PA-weighted standard deviation of wRC+. This means, for instance, that we can still include Wenceel Pérez and his frankly crazy 257 wRC+ (i.e. he’s 157% better than league average) for the Tigers, but since he only has 11 PA, it won’t count for much in either the team total or the standard deviation.
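To make the weighting concrete, here is a minimal sketch with a made-up four-hitter lineup (the numbers are hypothetical, apart from borrowing the 257 wRC+ in 11 PA idea). It treats PA as frequency weights, which I believe matches the defaults of `Hmisc::wtd.var`, but take that as an assumption rather than gospel.

```r
# Hypothetical toy lineup: wRC+ and plate appearances for four hitters.
# The last hitter mimics a hot small-sample bat (257 wRC+ in 11 PA).
wrc_plus <- c(120, 105, 90, 257)
pa       <- c(230, 210, 190, 11)

# PA-weighted mean: each hitter counts in proportion to his plate appearances.
w_mean <- weighted.mean(wrc_plus, pa)

# PA-weighted variance, treating PA as frequency weights
# (so the divisor is sum(pa) - 1), then the weighted SD.
w_var <- sum(pa * (wrc_plus - w_mean)^2) / (sum(pa) - 1)
w_sd  <- sqrt(w_var)

# Unweighted versions for comparison: the 11-PA hitter dominates these.
print(c(weighted_mean = w_mean, unweighted_mean = mean(wrc_plus)))
print(c(weighted_sd = w_sd, unweighted_sd = sd(wrc_plus)))
```

In this toy case the weighted mean lands near 109 while the unweighted mean is 143, which is exactly the kind of small-sample distortion the weighting is meant to avoid.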

Speaking of standard deviation - an SD of 10 is very different for a team with a mean of 110 than it is for a team with a mean of 90. To account for this, we’ll use the coefficient of variation (COV), which is the standard deviation divided by the mean*.

*I hope you were sitting down when you read this, because this is exciting stuff.
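As a quick check on the arithmetic, writing $\bar{x}$ for a team's mean wRC+ and $s$ for its standard deviation (notation of my choosing, not anything official), the two example teams above work out to:

```latex
\mathrm{COV} = \frac{s}{\bar{x}}, \qquad
\frac{10}{110} \approx 0.091, \qquad
\frac{10}{90} \approx 0.111
```

The same 10-point spread is proportionally larger for the weaker team, which is the difference the COV is meant to capture.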

A wRC+ of 100 means exactly league average, considering both on-base percentage and power (using linear weights). A low COV means a team’s hitters tend to cluster around the team mean, while a high COV means they’re spread out. This measures performance, not talent - so if a player has been unlucky for 60 games and is due for a bounceback, that won’t show here. Without further ado, a chart:


library(gt)
library(gtExtras)
library(Hmisc)
library(hrbrthemes)
library(janitor)
library(mlbplotR)
library(tidyverse)

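# Player-level batting leaderboard exported from FanGraphs;
# clean_names() turns the "wRC+" column into w_rc.
fg_stats <- read_csv('fangraphs-leaderboards.csv') %>%
  clean_names()

# Aggregate to team level, weighting each hitter by plate appearances (pa).
team_stats <- fg_stats %>%
  group_by(team) %>%
  summarise(r = sum(r),
            avg_wrc_plus = weighted.mean(w_rc, pa),
            sd_wrc_plus = sqrt(wtd.var(w_rc, pa))) %>%
  # Coefficient of variation: weighted SD relative to the weighted mean.
  mutate(cov_wrc_plus = sd_wrc_plus / avg_wrc_plus) %>%
  # Standardize abbreviations so the logos match; map ATH to OAK.
  mutate(team = clean_team_abbrs(team)) %>%
  mutate(team = ifelse(team == 'ATH', 'OAK', team))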

team_stats %>%
  ggplot(aes(x = avg_wrc_plus, y = cov_wrc_plus))+
  geom_mlb_logos(aes(team_abbr = team), width = 0.055, alpha = 0.7)+
  theme_ipsum()+
  xlab('team_wrc+')+
  ylab('COV')

A few things jump out here: first, the Rockies are just horrible. A team wRC+ of 64 is abysmal, and while the lineup is highly variable, only one player with significant playing time, Jordan Beck, has a wRC+ over 100. Catcher Jacob Stallings has a wRC+ of 9 - 9! - in 88 PA, with a .152/.230/.190 line.

Next, the Dodgers and Yankees are clearly having good years, and while their variation is higher, that’s easily explainable. Aaron Judge, who may very well be a robot, has a wRC+ of 240. That dwarfs Paul Goldschmidt’s 156, which is an excellent number any player would be happy to have. The Dodgers have Shohei Ohtani (187) and Freddie Freeman (192), with Will Smith trailing with “only” 177. Max Muncy is at 96, which is a little disappointing given his past performance, although his xwOBA of .339 suggests better days ahead.

Finally, my two favorite teams. The Tigers have a wRC+ of 110, but it’s one of the more balanced lineups out there. Justyn-Henry Malloy (94) and Trey Sweeney (75) are the only players with more than 100 PA and a wRC+ under 100. The Cubs, meanwhile, have no player with more than 100 PA and a wRC+ under 100, making their lineup a tough one to contend with.

Of course, we really care about runs in the end. We can run a very quick linear model to see if team total wRC+ or consistency is more important.


lm_runs <- lm(r ~ avg_wrc_plus + cov_wrc_plus, data = team_stats)
summary(lm_runs)


Call:
lm(formula = r ~ avg_wrc_plus + cov_wrc_plus, data = team_stats)

Residuals:
    Min      1Q  Median      3Q     Max 
-27.570 -13.542  -4.804  11.701  38.552 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -35.1722    50.5960  -0.695    0.493    
avg_wrc_plus   2.6394     0.3415   7.729  2.6e-08 ***
cov_wrc_plus  47.4837    54.5826   0.870    0.392    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 18.62 on 27 degrees of freedom
Multiple R-squared:  0.7846,    Adjusted R-squared:  0.7687 
F-statistic: 49.18 on 2 and 27 DF,  p-value: 9.961e-10

So the total carries more weight than the distribution. Team wRC+ has a significant relationship with runs scored (p < 0.001), while the COV coefficient (p = 0.39) is not statistically significant. This is a very simple model, though, so it may be missing some factors. Speed and baserunning, for instance, would be pretty important factors that we’re not considering here.
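One caveat when comparing the two coefficients: wRC+ is measured in points and COV in ratio units, so the raw estimates aren't on the same scale. A rough way to compare them is to standardize both predictors first. This is a sketch of that idea, not the model fit above; it rebuilds the data from the rounded values in the summary table at the end of this post, so the numbers will differ slightly from a full-precision fit. The names `team_tbl`, `avg_z`, `cov_z` and `lm_runs_z` are my own.

```r
# Team values copied from the summary table in this post (already rounded).
team_tbl <- data.frame(
  team = c("NYY", "LAD", "CHC", "AZ", "SEA", "DET", "NYM", "OAK", "PHI", "STL",
           "HOU", "BOS", "TOR", "SD", "TB", "WSH", "ATL", "BAL", "CIN", "MIN",
           "SF", "CLE", "MIA", "LAA", "MIL", "KC", "TEX", "CWS", "PIT", "COL"),
  r = c(310, 322, 332, 286, 256, 295, 247, 242, 272, 270,
        235, 280, 232, 236, 240, 253, 230, 215, 270, 228,
        244, 224, 229, 228, 252, 193, 196, 197, 185, 179),
  avg_wrc_plus = c(129, 125, 120, 114, 112, 110, 109, 108, 107, 106,
                   105, 105, 104, 101, 100, 98, 98, 97, 97, 96,
                   94, 94, 94, 92, 88, 83, 82, 81, 78, 64),
  cov_wrc_plus = c(0.39, 0.36, 0.26, 0.31, 0.35, 0.34, 0.26, 0.28, 0.27, 0.28,
                   0.22, 0.40, 0.28, 0.43, 0.39, 0.35, 0.39, 0.45, 0.39, 0.44,
                   0.35, 0.46, 0.33, 0.40, 0.41, 0.47, 0.44, 0.48, 0.42, 0.66)
)

# Put both predictors on a common scale (mean 0, SD 1) so each coefficient
# reads as "runs per one-standard-deviation change" in that predictor.
team_tbl$avg_z <- as.numeric(scale(team_tbl$avg_wrc_plus))
team_tbl$cov_z <- as.numeric(scale(team_tbl$cov_wrc_plus))

lm_runs_z <- lm(r ~ avg_z + cov_z, data = team_tbl)
print(coef(summary(lm_runs_z)))
```

Standardizing doesn't change the t-values or p-values, only the units of the estimates, so it's a presentation choice rather than a different model.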

Still, it’s not totally without its uses. It helps us see which teams are more reliant on certain players to carry the offense and which may be more resilient to injuries.


team_stats %>%
  arrange(desc(avg_wrc_plus)) %>%
  gt() %>%
  gt_theme_espn() %>%
  fmt_number(c(avg_wrc_plus), decimals = 0) %>%
  fmt_number(c(sd_wrc_plus, cov_wrc_plus), decimals = 2)

team	r	avg_wrc_plus	sd_wrc_plus	cov_wrc_plus
NYY	310	129	50.00	0.39
LAD	322	125	45.34	0.36
CHC	332	120	31.39	0.26
AZ	286	114	35.91	0.31
SEA	256	112	39.72	0.35
DET	295	110	37.99	0.34
NYM	247	109	28.48	0.26
OAK	242	108	30.13	0.28
PHI	272	107	29.30	0.27
STL	270	106	29.77	0.28
HOU	235	105	23.50	0.22
BOS	280	105	41.89	0.40
TOR	232	104	29.47	0.28
SD	236	101	43.36	0.43
TB	240	100	38.91	0.39
WSH	253	98	34.64	0.35
ATL	230	98	38.21	0.39
BAL	215	97	43.79	0.45
CIN	270	97	38.27	0.39
MIN	228	96	42.48	0.44
SF	244	94	32.95	0.35
CLE	224	94	42.95	0.46
MIA	229	94	30.54	0.33
LAA	228	92	36.47	0.40
MIL	252	88	35.99	0.41
KC	193	83	39.09	0.47
TEX	196	82	35.77	0.44
CWS	197	81	38.55	0.48
PIT	185	78	32.64	0.42
COL	179	64	42.02	0.66