MLB Stint Duration Over Time

statistics
r
baseball
Author

Mark Jurries II

Published

December 7, 2022

It’s currently the MLB offseason, which means a lot of players are changing teams, either through free agency or trade. It sometimes feels like players spend less time with their teams than they used to, and thankfully the data to check this is readily available.

We’ll be measuring player “stints” - that is, consecutive years with one team. As an example, we’ll look at Curtis Granderson:

*More precisely, I’m not interested. Sorry!
library(baseballr)
library(hrbrthemes)
library(gt)
library(Lahman)
library(plotly)
library(tidymodels)
library(tidyverse)
library(tidyquant)
library(zoo)

teams <- Lahman::Teams %>%
  as_tibble()

batting <- Lahman::Batting %>% 
  as_tibble() %>%
  left_join(teams %>% select(yearID, teamID, franchID), by = c("yearID", "teamID")) %>%
  # Plate appearances, computed column-wise; missing components count as 0
  mutate(PA = rowSums(across(c(AB, BB, HBP, SH, SF)), na.rm = TRUE)) %>%
  select(playerID, yearID, stint, franchID, G, PA) %>%
  rename(bg = G)

pitching <- Lahman::Pitching %>% 
  as_tibble %>%
  left_join(teams %>% select(yearID, teamID, franchID)) %>%
  select(playerID, yearID, stint, franchID, G, IPouts) %>%
  rename(pg = G)

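players_all <- batting %>%
  # bind_rows() stacks the two tables and fills columns missing from either with NA;
  # recent dplyr versions require union_all() inputs to have matching columns
  bind_rows(pitching) %>%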
  group_by(playerID, yearID, stint, franchID) %>%
  summarise_all(~ sum(., na.rm = TRUE)) %>%
  mutate(type = case_when(PA > IPouts ~ 'Batter', TRUE ~ 'Pitcher')) %>%
  mutate(G = case_when(type == 'Batter' ~ bg, TRUE ~ pg),
         wgt = case_when(type == 'Batter' ~ PA, TRUE ~ IPouts))

player_info <- Lahman::People %>%
  as_tibble() %>%
  select(playerID, nameFirst, nameLast) %>%
  unite(name, nameFirst, nameLast, sep = " ")

player_stint_data <- players_all %>%
  #filter(playerID == 'grandcu01') %>%
  # Sort explicitly so lag() compares each row with the player's previous team-season
  arrange(playerID, yearID, stint) %>%
  group_by(playerID) %>%
  mutate(prior_franchID = lag(franchID),
         prior_yearID = lag(yearID),
         is_same_team = (franchID == prior_franchID) & (yearID == prior_yearID+1),
         stint_start = case_when(is.na(is_same_team) ~ 0, 
                                   is_same_team == FALSE ~ 0,
                                   TRUE ~ 1),
         stint_start2 = replace(stint_start, stint_start == 0, NA)
         ) %>%
  mutate(player_stint_id = cumsum(is.na(stint_start2))) %>% 
  group_by(playerID, player_stint_id) %>%
  mutate(stint_seasons_played = cumsum(stint_start) + 1) %>% 
  select(-prior_franchID, -prior_yearID, -is_same_team, -stint_start, -stint_start2) %>%
  mutate(stint_seaons_left = max(stint_seasons_played) - (stint_seasons_played - 1)) %>%
  left_join(player_info)

#to be used later for player info
player_stint_summarised <- player_stint_data %>%
  group_by(playerID, name, franchID, player_stint_id, type) %>%
  summarise(start_year = min(yearID),
         end_year = max(yearID),
         length = max(stint_seasons_played)) %>%
  arrange(desc(length))

#to be used later for team info
active_frachises <- teams %>%
  filter(yearID == 2021) %>%
  mutate(lgdivID = paste(lgID, divID, teamID)) %>%
  select(franchID, lgID, teamID, divID, lgdivID) %>%
  arrange(lgdivID)
player_stint_data %>%
  filter(playerID == 'grandcu01') %>%
  ungroup() %>%
  select(player_stint_id, name, yearID, franchID, stint_seasons_played, PA) %>%
  gt() %>%
  tab_style(
    locations = cells_column_labels(columns = everything()),
    style = list(
      cell_text(weight = "bold")
      )
    )
player_stint_id  name               yearID  franchID  stint_seasons_played  PA
1                Curtis Granderson  2004    DET       1                     28
1                Curtis Granderson  2005    DET       2                     174
1                Curtis Granderson  2006    DET       3                     679
1                Curtis Granderson  2007    DET       4                     676
1                Curtis Granderson  2008    DET       5                     629
1                Curtis Granderson  2009    DET       6                     710
2                Curtis Granderson  2010    NYY       1                     528
2                Curtis Granderson  2011    NYY       2                     691
2                Curtis Granderson  2012    NYY       3                     684
2                Curtis Granderson  2013    NYY       4                     245
3                Curtis Granderson  2014    NYM       1                     654
3                Curtis Granderson  2015    NYM       2                     682
3                Curtis Granderson  2016    NYM       3                     633
3                Curtis Granderson  2017    NYM       4                     395
4                Curtis Granderson  2017    LAD       1                     132
5                Curtis Granderson  2018    TOR       1                     349
6                Curtis Granderson  2018    MIL       1                     54
7                Curtis Granderson  2019    FLA       1                     363
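
The stint numbering in that table boils down to one trick: flag every row that does not continue the previous row's stint, then take a running count of those flags. Here is a minimal sketch of that idea on a made-up player. The player ID, the franchise codes "AAA" and "BBB", and the helper column `continues` are all hypothetical, and this is a simplified take on the logic rather than the exact pipeline above.

```r
library(dplyr)

# Hypothetical player: three seasons with "AAA", a mid-2017 move to "BBB",
# then a full 2018 season with "BBB"
toy <- tibble(
  playerID = "example01",
  yearID   = c(2015, 2016, 2017, 2017, 2018),
  stint    = c(1, 1, 1, 2, 1),
  franchID = c("AAA", "AAA", "AAA", "BBB", "BBB")
)

toy_stints <- toy %>%
  arrange(playerID, yearID, stint) %>%
  group_by(playerID) %>%
  mutate(
    # TRUE when the row is the same franchise one season after the previous row
    continues = franchID == lag(franchID) & yearID == lag(yearID) + 1,
    # A player's first row has no previous row, so it always starts a stint
    continues = coalesce(continues, FALSE),
    # Every row that starts a stint bumps the running count
    player_stint_id = cumsum(!continues)
  ) %>%
  group_by(playerID, player_stint_id) %>%
  mutate(stint_seasons_played = row_number()) %>%
  ungroup()

toy_stints
```

The mid-season move in 2017 opens stint 2, and the 2018 row with the same franchise continues it, which matches how Granderson's 2017 Dodgers row gets its own stint in the table.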

Grandy had 7 stints in total - 6 years with the Tigers, 4 with the Yankees, 4 with the Mets, and 1 year each with the Dodgers, Blue Jays, Brewers, and Marlins. He split time between teams in two seasons (2017 and 2018), so for both league and team summaries we’ll weight the average stint duration by plate appearances for hitters and by outs recorded (innings pitched times three) for pitchers.
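
To see why the weighting matters, take Granderson's two 2017 rows from the table above: his fourth season with the Mets (395 PA) and his first with the Dodgers (132 PA). A quick sketch using base R's `weighted.mean()`, with the numbers typed in by hand rather than pulled from Lahman:

```r
# Granderson's 2017 rows, copied from the table above
stint_seasons_played <- c(4, 1)      # NYM, LAD
pa                   <- c(395, 132)  # plate appearances with each team

plain_avg    <- mean(stint_seasons_played)
weighted_avg <- weighted.mean(stint_seasons_played, w = pa)

plain_avg     # 2.5
weighted_avg  # about 3.25, pulled toward the Mets row where most of his PA came
```

The unweighted mean treats a 132-PA cameo the same as most of a season, while the weighted version leans toward wherever the player actually spent his playing time.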

league_plot <- player_stint_data %>%
  group_by(yearID, type) %>%
  summarise(mean_stint_seaons_left = mean(stint_seaons_left),
            mean_stint_seasons_played = mean(stint_seasons_played),
            w_mean_stint_seaons_left = weighted.mean(stint_seaons_left, wgt),
            w_mean_stint_seaons_played = weighted.mean(stint_seasons_played, wgt)
  ) %>%
  ggplot(aes(x = yearID, y = w_mean_stint_seaons_played, color = type, 
             text = paste("<br>Year:", yearID,
                          "<br>Player Type:", type,
                          "<br>Weighted Stint Duration:", round(w_mean_stint_seaons_played,3))))+
  # Group by type so batters and pitchers are each drawn as their own line
  geom_line(aes(group = type))+
  theme_ipsum()+
  ylab('Weighted Average Stint Duration')+
  xlab('Season')+
  scale_color_manual(values = c('#002D72', '#D50032'))

ggplotly(league_plot, tooltip = 'text')

Some interesting stories emerge here. First, it’s reassuring to see the low numbers in the 1880s - if those had been higher my formulas would be broken, so this passes the sniff test. Second, we see the effect of WWII in the 1940s, when many players went overseas to fight. The average peaks in the 60s, topping out in 1968, when Mickey Mantle’s 18-year run with the Yankees came to an end, along with Roy Face’s 14 years with the Pirates and Vada Pinson’s 11 with the Reds.

Since the 90s, batters have tended to stay with a team for about 3.3 years, while pitchers are around 2.8. However, the gap has grown somewhat as teams have turned more to the bullpen, where pitchers tend to have shorter tenures.

The story is even more interesting if we break it down by team. We’ll look at only the 30 active teams to keep things clean.

team_plot <- player_stint_data %>%
  inner_join(active_frachises) %>%
  group_by(yearID, type, lgdivID, franchID, lgID, divID) %>%
  summarise(mean_stint_seaons_left = mean(stint_seaons_left),
            mean_stint_seasons_played = mean(stint_seasons_played),
            w_mean_stint_seaons_left = weighted.mean(stint_seaons_left, wgt),
            w_mean_stint_seaons_played = weighted.mean(stint_seasons_played, wgt)
  ) %>%
  ggplot(aes(x = yearID, y = w_mean_stint_seaons_played, color = type,
             text = paste("Team:", franchID,
                          "<br>Current League/Division", lgID, divID,
                          "<br>Year:", yearID,
                          "<br>Player Type", type,
                          "<br>Weighted Stint Duration:", round(w_mean_stint_seaons_played,3))))+
  # Group by type so batters and pitchers are each drawn as their own line
  geom_line(aes(group = type))+
  theme_ipsum()+
  ylab('Weighted Average Stint Duration')+
  xlab('Season')+
  scale_color_manual(values = c('#002D72', '#D50032'))+
  facet_wrap(lgdivID ~ ., ncol = 5)+
  theme(legend.position = "none")

ggplotly(team_plot, tooltip = 'text')

The 70s Tigers had some guys stick around - Al Kaline ended a 22-year career, while Norm Cash, Mickey Stanley and Willie Horton each finished 15-year stints. We see several Yankee dynasties, while Cincinnati’s Big Red Machine jumps out in the 70s.

For kicks and giggles, I took a look at a team’s winning percentage against average stint duration.

against_win_perc <- player_stint_data %>%
  group_by(yearID, franchID) %>%
  summarise(mean_stint_seaons_left = mean(stint_seaons_left),
            mean_stint_seasons_played = mean(stint_seasons_played),
            w_mean_stint_seaons_left = weighted.mean(stint_seaons_left, wgt),
            w_mean_stint_seaons_played = weighted.mean(stint_seasons_played, wgt)) %>%
  filter(yearID >= 1920) %>%
  inner_join(teams %>%
               mutate(win_perc = W / (W + L)) %>%
               select(yearID, franchID, win_perc, LgWin, WSWin))

team_corr <- against_win_perc %>%
  ggplot(aes(x = win_perc, y = w_mean_stint_seaons_played))+
  geom_point(alpha = 0.3, 
             color = '#40bf40',
             aes(text = paste("Team:", franchID,
                              "<br>Year:", yearID,
                              "<br>Win Percentage:", round(win_perc, 3),
                              "<br>Weighted Stint Duration:", round(w_mean_stint_seaons_played,3))))+
  geom_smooth(method = 'lm', se = FALSE, color = 'grey', linetype = 'dashed')+
  theme_ipsum()+
  ylab('Weighted Average Stint Duration')+
  xlab('Win Percentage')

ggplotly(team_corr, tooltip = 'text', height = 650, width = 650)
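
If you want a number to go with the scatterplot, one option is a plain Pearson correlation plus the slope of the same kind of linear fit that `geom_smooth(method = 'lm')` draws. The sketch below runs on a handful of made-up team-seasons purely to show the calls. The data frame `toy` and its values are hypothetical and say nothing about the real relationship. Pointing the same calls at the win percentage and weighted stint duration columns of `against_win_perc` should work the same way, though that is an assumption I have not run here.

```r
# Made-up team-seasons for illustration only (not Lahman data)
toy <- data.frame(
  win_perc = c(0.40, 0.45, 0.50, 0.55, 0.60),
  w_mean_stint_seasons_played = c(2.6, 3.1, 2.9, 3.4, 3.3)
)

# Pearson correlation between win percentage and weighted stint duration
r <- cor(toy$win_perc, toy$w_mean_stint_seasons_played)

# Ordinary least squares line, the same kind of fit as the dashed trend line
fit   <- lm(w_mean_stint_seasons_played ~ win_perc, data = toy)
slope <- coef(fit)[["win_perc"]]

r
slope
```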