Age, Position, and Injury Risk in Baseball

Sabermetrics Takes on Aging

An ongoing debate in sabermetrics concerns the way that baseball players age.  Long-treasured wisdom and lore within baseball holds that players at certain positions, namely the most athletically demanding ones, decline more rapidly than other players.  It is therefore widely believed among old-school players and coaches that even talented second basemen (2Bs) become ineffective in their early 30s, while first basemen (1Bs) can go on indefinitely; in fact, a stereotype exists for each position.  While there has been some evidence that this lore is correct (see Nate Silver’s study), it also represents exactly the type of myth which sabermetricians love to obliterate; as a result, there is also a significant backlash (see here), which claims that players of all positions age at exactly the same clocklike rate.

The question of how baseball players age is undoubtedly complex, and too great a mystery for one blog post to solve.  Age, body type, and skills all plausibly contribute; what’s more, players often shift roles as they age, for instance from catcher to first baseman, or from regular to bench player.  All of these factors, along with the problem of small sample size, contribute to the mystery.

I’ll take a look at how players age through one particular lens: injury data.  In baseball, when players are injured badly enough that they can no longer perform effectively, they are sent to the “Disabled List” (DL), usually for rehabilitation.  (Throughout the post, I’ll use “injured” as shorthand for players who took a trip, however short, to the Disabled List.)  Thanks to the efforts of various excellent people, trips to the DL have been catalogued over the last few years and are stored here.  Using these data (the 2012 data, in particular) as my starting point, I’ll ask how two factors, namely age and position, affect the probability of injury.  While many players maintain some effectiveness even in years in which they are injured, injuries are undoubtedly a crucial aspect of player performance, for the straightforward reason that players can’t accrue positive value when they are on the DL and thus not playing.

Age and Injury Risk

It’s an obvious (and depressing) certainty of human physiology that injury risk increases with age.  This fact is accentuated for professional athletes, so we can expect that age plays a significant causal role in injury risk.  And it does:

[Figure: injuryrisk_byage]

This pair of boxplots represents the distribution of ages for those players in 2012 who didn’t get injured (blue), and those who did (red).  As you can see, injured players tended to be older, by about 2 years (30 vs. 28, rounding).  The black dotted line represents the fit of a logistic regression model, which is a fancy way of estimating how the risk of injury (axis on the right side) increases with age.  We see that injury risk increases approximately linearly through the whole observed age range, with a slope of about .02 per year; which is to say, for every year a player ages, their probability of making a trip to the DL rises by about 2 percentage points.  The risk of injury is quite substantial in a player’s late 30s: almost 50%.  The graph ought to convince you, but if it doesn’t, the logistic regression model is absurdly certain that age is correlated with probability of injury (p < 10^-10).
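
For readers curious what such a fit involves, here is a minimal sketch in plain Python. It is not the code behind the figure: the data are simulated, and the "true" coefficients (-4.0 and 0.1) are assumptions picked only so that the simulated risks land roughly near the numbers quoted above. The function names are hypothetical.

```python
import math
import random


def fit_logistic(ages, injured, iterations=25):
    """Fit P(injured) = 1 / (1 + exp(-(b0 + b1 * age))) by Newton's method."""
    mean_age = sum(ages) / len(ages)
    xs = [a - mean_age for a in ages]  # centering keeps the Newton steps stable
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, injured):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w               # entries of the (negated) Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g by hand
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0 - b1 * mean_age, b1  # undo the centering


def risk(b0, b1, age):
    """Predicted probability of a DL trip at a given age."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * age)))


# Simulated player-seasons (NOT the 2012 DL data): ages 21-40, with injuries
# drawn from an assumed logistic curve.
rng = random.Random(2012)
TRUE_B0, TRUE_B1 = -4.0, 0.1  # assumed values, for illustration only
ages = [rng.randint(21, 40) for _ in range(5000)]
injured = [1 if rng.random() < risk(TRUE_B0, TRUE_B1, a) else 0 for a in ages]

b0, b1 = fit_logistic(ages, injured)
for age in (24, 30, 38):
    print(age, round(risk(b0, b1, age), 3))
```

The logistic curve is S-shaped in general, but over a narrow range of ages and moderate risks it can look close to a straight line, which is why a single "slope per year" is a reasonable summary in that range.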

Position and Injury Risk

To the question which prompted the post, however: what’s the probability of injury as a function of position?

[Figure: injuryrisk_bypos]

There are some intuitive patterns here: shortstops (SS) and center fielders (CF) are near the top, whereas left and right fielders (corner outfielders [cOF]) are towards the bottom.  But there’s also some oddness; why are 3Bs the most injured players?

The problem with this graph is that we didn’t adjust for age, which, as we’ve already seen, is a significant covariate.  For example, if pitchers are on average younger (which they are), they might not get injured as often in an absolute sense, yet they may get injured more often than expected given their age, which is precisely what we’d like to know.  Reframing the question this way, we arrive at the following graph, charting age-conditioned risk of injury for a selection of positions (check out all positions here):

[Figure: injuryrisk_byboth]

This graph comports more with the conventional wisdom: SS, 2B, and P all rise in injury risk rapidly, while corner outfielders (cOF) are more or less flat over time (corner outfield being a not very demanding position).  There are also some weird patterns, though: why does 1B increase much more rapidly than even the SS/2B/P combination?  Why does catcher (C) injury risk decline over time?

Part of the problem, I suspect, is that players’ positions are not static over time.  Often, players who are injured at catcher are switched to first base (Joe Mauer being a recent example), a move that can be viewed as exporting aging catchers’ injury risk into the 1B category.  This “position effect” could result in all the older catchers who stick around at C being hardened, un-injurable veterans; if they weren’t, they would have switched positions.
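
To make "more often than expected, given their age" concrete, here is a small sketch of one simple way to adjust for age: compare each position's observed injury count with the count expected from an age-only risk curve. This is a swapped-in, simpler technique (an observed/expected ratio), not the per-position curves shown in the graph, and the toy players and the linear risk function are invented for illustration.

```python
from collections import defaultdict


def age_adjusted_ratio(players, risk_by_age):
    """Return observed / expected injury counts for each position.

    players: iterable of (position, age, injured) tuples, injured being 0 or 1
    risk_by_age: function mapping age -> league-wide injury probability
    """
    observed = defaultdict(float)
    expected = defaultdict(float)
    for position, age, injured in players:
        observed[position] += injured
        expected[position] += risk_by_age(age)
    return {pos: observed[pos] / expected[pos] for pos in observed}


def toy_risk(age):
    # Hypothetical age-only risk: 25% at age 28, moving 2 points per year.
    return 0.25 + 0.02 * (age - 28)


# Invented player-seasons: both groups have the same raw injury rate (2 of 4),
# but the pitchers are much younger.
toy_players = [
    ("P", 24, 1), ("P", 25, 0), ("P", 26, 1), ("P", 27, 0),
    ("cOF", 31, 0), ("cOF", 33, 1), ("cOF", 34, 0), ("cOF", 36, 1),
]

ratios = age_adjusted_ratio(toy_players, toy_risk)
for pos, ratio in sorted(ratios.items()):
    print(pos, round(ratio, 2))
```

A ratio above 1 means a position is injured more often than its ages alone would predict. In the toy data the two raw rates are identical, yet the young pitchers come out well above the older corner outfielders once age is accounted for.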

Position effect is one instance of a larger class of effects which I’ll term survivor bias.  Consider shortstops: notice that there’s missing data already at age 32 or so, simply because few shortstops are still playing at that age!  What’s more, the SS who remain playing into their mid-30s are unlikely to represent a random sample of all shortstops.  To say it differently, older players have very different characteristics from younger players, because players who didn’t have those characteristics were removed from the sample before they got old.  Statistically, this type of bias, called “Missing Not at Random,” is among the most pernicious and troublesome issues.
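
A toy simulation shows how this kind of selection can bend an age curve by itself. Everything here is an assumption made for illustration: each simulated player has a fixed injury probability that never changes with age, and an injured player leaves the position (or the league) half the time. None of these numbers come from the DL data.

```python
import random


def simulate_observed_rates(n_players=20000, first_age=25, last_age=36,
                            exit_prob=0.5, seed=0):
    """Observed injury rate by age among players still in the sample."""
    rng = random.Random(seed)
    # Each player's injury probability is fixed for life: no true aging effect.
    fragility = [rng.uniform(0.1, 0.5) for _ in range(n_players)]
    active = list(range(n_players))
    rates = {}
    for age in range(first_age, last_age + 1):
        injured_count = 0
        survivors = []
        for i in active:
            injured = rng.random() < fragility[i]
            injured_count += injured
            # An injured player may switch positions or retire, leaving the sample.
            if injured and rng.random() < exit_prob:
                continue
            survivors.append(i)
        rates[age] = injured_count / len(active)
        active = survivors
    return rates


rates = simulate_observed_rates()
for age in sorted(rates):
    print(age, round(rates[age], 3))
```

No individual in this simulation becomes sturdier with age, yet the observed injury rate falls year after year because the fragile players keep dropping out. A declining curve like the one for catchers could arise this way without any real change in the players themselves.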

Conclusion(s)

So I am twice-confounded: once by age, and then again by survivor bias (and perhaps thirdly, by not having enough data).  Such is the complexity of baseball, where almost all the variables are correlated with all the others, and strategic considerations can make for unpredictable trends like catchers aging more gracefully than expected.  Still, I think there is evidence for some patterns: the athletically demanding middle infield positions (SS/2B) do show injury risk rising more steeply with age, whereas corner outfielders have a pronounced flatness to their line.  First basemen start out being fairly hardy, but I suspect that as they take on rejects from other positions, the aggregate risk of injury increases substantially.  In the future, I’ll look to incorporate survivor bias and the resulting position-shifts into this analysis, hopefully nailing 1B down.

But perhaps the most obvious takeaway from the post: injury risk increases strongly with age.  The 2014 Philadelphia Phillies, who seem hell-bent on becoming the oldest team in MLB history, are probably screwed.

_______________________________

EDIT: since the Yankees went on their annual spending spree, they now threaten to be the oldest team in MLB history, leaving the Phillies in the merely “very old” category.  See this recent article.

7 comments

  1. Darius A

    Nice article – have been thinking about this myself for a little while. Would be interesting to see the stats over a number of seasons. Did you just use the positional designations as listed on the baseballheatmaps spreadsheets? How did you account for those who have multiple positions? I wonder about how difficult this is to pin down for a lot of players; the likes of Ben Zobrist, for example, who have played a lot of career games at both infield and outfield spots and therefore would have less demands placed on their body than someone who plays exclusively middle infield for 150+ games a year, but more than pure corner outfielders.

    • rk

      For positional designations, I wasn’t sure how the baseballheatmaps guys decided on them, so I calculated them myself based on what position a player occupied the most (according to fielding data). I neglected to look specifically at utility types such as Zobrist who may have spent significant amounts of time at multiple positions, although it would be interesting to consider whether these players’ injury risk is based on a mix of the positions they played–e.g. for some utility player, if he spent 50 games at SS and 50 in LF, does he get injured approximately halfway between the LFs and the SSs? Or, is it possible that switching positions itself makes a player more injury prone, since they never get used to playing a single spot? Have to think about that.

      Generally though, eyeing the data, Zobrist is an unusual case. More than 75% of the players spent >80% of their games at a single position. Still worth consideration, perhaps in a future post.

      • Darius A

        Yeah, you’re right that Zobrist is a rather extreme case – it’s pretty rare for full-time players to be used like he is. I would guess that corner outfield/first base, or even catcher/first base is a more common combination than Zobrist’s collection of positions.

        And thanks! Saw the article on Jeff’s MASH Report at Fangraphs and thought it sounded interesting. I’m hoping to do some analysis of DL data myself at some point this offseason so I’m always looking for other people doing good research into the area.

