Sabermetrics Takes on Aging
An ongoing debate in sabermetrics concerns the way that baseball players age. Long-treasured wisdom and lore within baseball holds that players at certain positions, namely the most athletically demanding ones, decline more rapidly than other players. It is therefore widely believed among old-school players and coaches that even talented second basemen (2Bs) become ineffective in their early 30s, while first basemen (1Bs) can go on indefinitely; a stereotype exists for each position, in fact. While there has been some evidence that this lore is correct (see Nate Silver’s study), it also represents exactly the type of myth which sabermetricians love to obliterate; as a result, there is also a significant backlash (see here), which claims that players at all positions age at exactly the same clocklike rate. The question of how baseball players age is undoubtedly complex, and too great a mystery for one blog post to solve. Age, body type, and skills all plausibly contribute; what’s more, players often shift roles as they age, for instance from catcher to first baseman, or from regular to bench player. All of these factors, together with the problem of small sample sizes, contribute to the mystery.
I’ll take a look at how players age through one particular lens: injury data. In baseball, when players are too injured to perform effectively, they are sent to the “Disabled List” (DL), usually for rehabilitation. (Throughout the post, I’ll use “injured” as shorthand for players who took a trip, however short, to the Disabled List.) Thanks to the efforts of various excellent people, trips to the DL have been catalogued over the last few years and are stored here. Using these data (the 2012 data, in particular) as my starting point, I’ll ask how two factors, namely age and position, affect the probability of injury. While many players maintain some effectiveness even in years in which they are injured, injuries are undoubtedly a crucial aspect of player performance, for the straightforward reason that players can’t accrue positive value when they are on the DL and thus not playing.
Age and Injury Risk
It’s an obvious (and depressing) certainty of human physiology that injury risk increases with age. This fact is accentuated for professional athletes, so we can expect that age plays a significant causal role in injury risk. And it does: This pair of boxplots represents the distribution of ages for those players in 2012 who didn’t get injured (blue), and those who did (red). As you can see, injured players tended to be older, by about 2 years (30 vs. 28, rounding). The black dotted line represents the fit of a logistic regression model, which is a fancy way of estimating how the risk of injury (axis on the right side) increases with age. We see that injury risk increases approximately linearly through the whole age range, with a slope of about .02 per year; which is to say, every year a player ages, they are about 2 percentage points more likely to make a trip to the DL. The risk of injury is quite substantial in a player’s late 30s: almost 50%. The graph ought to convince you, but if it doesn’t, the logistic regression model is absurdly certain that age is strongly correlated with probability of injury (p < 10^-10).
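For readers who want to see what a fit like this involves, here is a minimal sketch of a one-variable logistic regression in plain Python. It is not the code behind the graph: the data are synthetic, generated from made-up coefficients (-0.85 and 0.09) that I picked only so the curve lands near the numbers quoted above, and the function names (`fit_logistic`, `risk`, `slope_per_year`) are my own. It also shows why the curve looks nearly linear: the slope in probability terms is b1 * p * (1 - p), and p * (1 - p) changes little while p sits between roughly 0.2 and 0.5.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(ages, injured, iters=25):
    """Fit P(injured) = sigmoid(b0 + b1 * (age - mean_age)) by Newton's method."""
    mean_age = sum(ages) / len(ages)
    xs = [a - mean_age for a in ages]  # centering keeps the 2x2 solve well conditioned
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, injured):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p          # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w             # entries of X^T W X
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: (b0, b1) += inverse(X^T W X) @ gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1, mean_age

def risk(model, age):
    """Fitted probability of a DL trip at a given age."""
    b0, b1, mean_age = model
    return sigmoid(b0 + b1 * (age - mean_age))

def slope_per_year(model, age):
    """Change in injury probability per extra year of age, evaluated at `age`."""
    p = risk(model, age)
    return model[1] * p * (1.0 - p)

# Synthetic stand-in for the 2012 DL data (assumed coefficients, NOT the real numbers).
rng = random.Random(2012)
ages = [rng.randint(21, 40) for _ in range(5000)]
injured = [int(rng.random() < sigmoid(-0.85 + 0.09 * (a - 29))) for a in ages]

model = fit_logistic(ages, injured)
for age in (24, 29, 34, 38):
    print(age, round(risk(model, age), 3), round(slope_per_year(model, age), 4))
```

In practice a statistics package would do the fitting and also report the standard errors and p-value; the hand-rolled Newton loop is only here to keep the example self-contained.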
Position and Injury Risk
To the question which prompted the post, however: what’s the probability of injury as a function of position? There are some intuitive patterns here: shortstops (SS) and center fielders (CF) are near the top, whereas left and right fielders (corner outfielders [cOF]) are towards the bottom. But there’s also some oddness; why are 3Bs the most injured players? The problem with this graph is that we didn’t adjust for age, which, as we’ve already seen, is a significant covariate. For example, if pitchers are on average younger (which they are), they might not get injured as often in an absolute sense, yet they may get injured more often than expected given their age, which is precisely what we’d like to know. Reframing the question thus, we arrive at the following graph, charting age-conditioned risk of injury as a function of some positions (check out all positions here): This graph comports more with the conventional wisdom: SS, 2B, and P all rise in injury risk rapidly, while corner outfielders (cOF) are more or less flat over time (corner outfield being a not very demanding position). There are also some weird patterns, though: why does 1B increase much more rapidly than even the SS/2B/P combination? Why does catcher (C) injury risk decline over time? Part of the problem, I suspect, is that players’ positions are not static over time. Often, players who are injured at catcher are switched to first base (Joe Mauer being a recent example), a movement which can be viewed as exporting aged catchers’ injury risk into the 1B category. This “position effect” could result in all the older catchers who stick around at C being hardened, un-injurable veterans; if they weren’t, they would have switched positions.
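The age adjustment described above can be illustrated with a toy observed-versus-expected calculation. This is a simplified sketch, not the method used for the graph (which conditions on age within each position): it takes the pooled injury rate at each age, adds it up over a position's players to get the number of injuries you would expect from age alone, and compares that with what actually happened. The eight rows of counts are invented so that pitchers are younger and have a lower raw injury rate, yet are injured more often than their ages predict; `observed_vs_expected` and `make` are hypothetical helper names.

```python
from collections import defaultdict

def observed_vs_expected(players):
    """players: iterable of (position, age, injured) with injured in {0, 1}.

    Returns {position: (observed_injuries, expected_injuries)}, where the
    expectation uses the pooled injury rate at each player's age.
    """
    players = list(players)

    # Pooled injury rate at each age, ignoring position.
    n_by_age = defaultdict(int)
    inj_by_age = defaultdict(int)
    for _, age, injured in players:
        n_by_age[age] += 1
        inj_by_age[age] += injured
    rate_by_age = {age: inj_by_age[age] / n_by_age[age] for age in n_by_age}

    observed = defaultdict(int)
    expected = defaultdict(float)
    for pos, age, injured in players:
        observed[pos] += injured
        expected[pos] += rate_by_age[age]
    return {pos: (observed[pos], expected[pos]) for pos in observed}

def make(pos, age, n, n_injured):
    """Build n player rows at one position and age, n_injured of them hurt."""
    return [(pos, age, 1 if i < n_injured else 0) for i in range(n)]

# Invented counts (NOT real DL data): pitchers skew young, corner outfielders old.
players = (
    make("P", 25, 10, 3) + make("P", 35, 2, 1)
    + make("cOF", 25, 2, 0) + make("cOF", 35, 10, 5)
)

result = observed_vs_expected(players)
for pos, (obs, exp) in sorted(result.items()):
    n = sum(1 for p, _, _ in players if p == pos)
    print(pos, "raw rate", round(obs / n, 3), "observed/expected", round(obs / exp, 3))
```

With these made-up counts, pitchers have the lower raw rate (4 of 12 against 5 of 12) but an observed-to-expected ratio above 1, while corner outfielders fall below 1. With real data, single-year age cells would be sparse, so you would want wider age bins or a fitted age curve instead.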
Position effect is a subset of a larger class of effects which I’ll term survivor bias. Consider shortstops: notice that there’s missing data already at age 32 or so, simply because few shortstops play when they are that old! What’s more, the SS who remain playing into their mid-30s are unlikely to represent a random sample of all shortstops. To say it differently, older players have very different characteristics from younger players, because players who didn’t have those characteristics were removed from the sample before they got old. Statistically, this type of bias, called “Missing Not at Random,” is among the most pernicious and troublesome issues.
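A small simulation makes the survivor-bias mechanism concrete. Everything in it is an assumption chosen for illustration, not an estimate from the DL data: each simulated player has a fixed baseline injury probability ("fragility") drawn uniformly between 0.05 and 0.45, every player's risk rises by 0.02 per year, and a player who gets injured leaves the sample (retires or changes position) with probability 0.4.

```python
import random

def simulate(n_players=20000, start_age=24, end_age=36,
             aging_per_year=0.02, exit_if_injured=0.4, seed=0):
    """Return per-age stats for the players still active at each age."""
    rng = random.Random(seed)
    # Each player's fixed baseline annual injury probability (assumed range).
    active = [rng.uniform(0.05, 0.45) for _ in range(n_players)]
    rows = {}
    for age in range(start_age, end_age + 1):
        if not active:
            break
        bump = aging_per_year * (age - start_age)  # the same aging effect for everyone
        injuries = 0
        survivors = []
        for fragility in active:
            hurt = rng.random() < min(1.0, fragility + bump)
            injuries += hurt
            # Injured players sometimes drop out of the sample for good.
            if not (hurt and rng.random() < exit_if_injured):
                survivors.append(fragility)
        rows[age] = {
            "n": len(active),
            "injury_rate": injuries / len(active),
            "mean_fragility": sum(active) / len(active),
        }
        active = survivors
    return rows

rows = simulate()
for age in (24, 28, 32, 36):
    r = rows[age]
    print(age, r["n"], round(r["injury_rate"], 3), round(r["mean_fragility"], 3))
```

Under these assumptions every individual's risk climbs by 0.24 between ages 24 and 36, but the injury rate observed among the players still around at 36 climbs by less, because the fragile players have been filtered out along the way. The same filtering, applied position by position, is one way a curve like the catchers' could come out flat or even declining.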
So I am twice-confounded: once by age, and then again by survivor bias (and perhaps thirdly, by not having enough data). Such is the complexity of baseball, where almost all the variables are correlated with all the others, and strategic considerations can make for unpredictable trends like catchers aging more gracefully than expected. Still, I think there is evidence for some patterns: the athletically demanding middle infield positions (SS/2B) do show injury risk rising faster with age, whereas corner outfielders have a pronounced flatness to their line. First basemen start out being fairly hardy, but I suspect that as they take on rejects from other positions, the aggregate risk of injury increases substantially. In the future, I’ll look to incorporate survivor bias and the resulting position shifts into this analysis, hopefully nailing 1B down. But perhaps the most obvious conclusion from the post: injury risk increases strongly with age. The 2014 Philadelphia Phillies, who seem hell-bent on becoming the oldest team in MLB history, are probably screwed.
EDIT: Since the Yankees went on their annual spending spree, they now threaten to be the oldest team in MLB history, leaving the Phillies in the merely “very old” category. See this recent article.