Ellsbury as Example
Jacoby Ellsbury just signed with the Yankees. As part of the analysis of his contract, there was much consternation from the internet concerning two factors: his age and his injury history. No one doubts that Ellsbury could be a productive player over the length of his contract, but there is a significant concern that due to his position, skills (speed and defense), age, and injury history, he will fail to meet his projections and end up a massive overpay.
Because I had Jeff Zimmerman’s injury data on hand from my recent post on aging and injury probability, I decided to take up the question of how much a player’s prior history predicts his future probability of injury. In Ellsbury’s case, there is a further complicating factor: his prior injuries occurred in non-standard (“freak”) ways, primarily via collision with other players. Because these prior injuries didn’t occur in the typical course of playing baseball, the thought is that they will be less predictive for his future injury probability (or so the argument goes). I can’t speak directly to the question of injury weirdness and how it affects recurrence probability–sadly, there’s no “freak” variable in Jeff Zimmerman’s dataset–but I will look at how different types of injury can be more or less predictive of future injury.
Like last time, I analyze injury risk with a set of logistic regression models in which the response variable is whether or not a player spent time on the DL in a given year, and the predictors are factors such as age and injury history. For starters, I know from my last post that injury probability is strongly influenced by age, to the tune of about a 2% higher injury risk per year. I incorporate injury history by asking whether a player’s injury status in 2011 is predictive of his injury status in 2012.
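For readers who like to see the model written out, one plausible form of the regression described above is sketched here. The symbols are my own shorthand rather than anything taken from the original analysis: DL is a 0/1 indicator for time spent on the disabled list, i indexes players, and the betas are the fitted coefficients. The exact specification used in the post may differ.

```latex
\operatorname{logit} P(\mathrm{DL}_{i,2012} = 1)
  = \beta_0 + \beta_1\,\mathrm{age}_i + \beta_2\,\mathrm{DL}_{i,2011},
\qquad
\operatorname{logit}(p) = \ln\frac{p}{1-p}
```

Under this form, a positive beta_2 would mean that a player who was on the DL in 2011 has higher odds of landing there again in 2012, holding age fixed.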
Lo and behold, it is: the model’s accuracy increases substantially once injury history is included (for those worried about overfitting, this holds even after model selection using AIC). In fact, injury history is more predictive than age in the combined model. It is worth noting that while age and injury history have significant effects, they explain relatively little of the variation in injury occurrence, I suspect because there is a large stochastic (luck-based) component. However, the point remains: past injury strongly predicts future injury.
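To make the AIC comparison concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the data are simulated rather than drawn from Jeff Zimmerman’s dataset, the coefficients used to generate them are invented, and the hand-rolled Newton-Raphson fit stands in for whatever statistical software was actually used. It only shows the mechanics of fitting an age-only model and an age-plus-history model and comparing their AIC values.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_logistic(rows, y, iterations=25):
    """Maximum-likelihood fit by Newton-Raphson; each row starts with 1.0 (intercept)."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            p = sigmoid(sum(b * xi for b, xi in zip(beta, x)))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    hess[i][j] += w * x[i] * x[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

def log_likelihood(beta, rows, y):
    total = 0.0
    for x, yi in zip(rows, y):
        p = sigmoid(sum(b * xi for b, xi in zip(beta, x)))
        total += math.log(p) if yi else math.log(1.0 - p)
    return total

def aic(beta, rows, y):
    # AIC = 2k - 2 ln(L): each extra parameter must earn its keep.
    return 2 * len(beta) - 2 * log_likelihood(beta, rows, y)

# Simulated players (hypothetical): age is centered at 30 to keep the fit stable,
# and the generating coefficients below are invented, not estimates from real data.
random.seed(0)
rows_age, rows_full, injured_2012 = [], [], []
for _ in range(2000):
    age_centered = random.randint(22, 38) - 30
    injured_2011 = 1.0 if random.random() < 0.3 else 0.0
    p = sigmoid(-1.0 + 0.1 * age_centered + 1.2 * injured_2011)
    injured_2012.append(1 if random.random() < p else 0)
    rows_age.append([1.0, float(age_centered)])
    rows_full.append([1.0, float(age_centered), injured_2011])

beta_age = fit_logistic(rows_age, injured_2012)
beta_full = fit_logistic(rows_full, injured_2012)
aic_age_only = aic(beta_age, rows_age, injured_2012)
aic_with_history = aic(beta_full, rows_full, injured_2012)
print("AIC, age only:", round(aic_age_only, 1))
print("AIC, age + 2011 injury:", round(aic_with_history, 1))
```

A lower AIC for the second model is the kind of result described above: the extra predictor improves the fit by more than the penalty for adding it. Because a strong history effect was built into the simulated data, this sketch says nothing about real players.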
As I mentioned above, the significant mitigating factor in Ellsbury’s particular history is that his mishaps looked like freak accidents, not routine plays. I can’t directly parse how the “freakness” of an injury predicts future injuries, but as a proxy, I can look at whether injury type is useful for predicting the future probability of injury. I consider here two ‘type’ variables from the data: injury location and injury description. Injury location is quite simply what part of the body was injured; as you might expect, this has categories such as ribs, legs, foot, shoulder, abdominal, etc. Injury description is a more nebulous category, but it basically explains what sort of injury occurred: strain, break, bruise and so on (also including sleep disorder, oddly enough).
Rerunning the model with injury location included resulted in a slightly worse model fit (not pictured). Including injury description instead made the model only slightly more accurate, as shown by the third bar in the above graph, and no more accurate than you would expect from adding another variable. For both of these injury-type variables, then, I see no evidence that they predict future injury probability.
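One reason a categorical variable like injury description has a hard time earning its place is that each category costs a parameter. The sketch below shows one common way to code such a variable (dummy, or one-hot, coding against a baseline) and the AIC penalty that comes with it. The six records and the "none" baseline are hypothetical, and I am assuming this coding purely for illustration; the original analysis may have handled the variable differently.

```python
def one_hot(values, baseline):
    """Dummy-code a categorical variable, dropping the baseline level."""
    levels = sorted(set(values) - {baseline})
    rows = [[1.0 if v == level else 0.0 for level in levels] for v in values]
    return levels, rows

# Hypothetical injury descriptions for six player-seasons ("none" = no DL stint).
descriptions = ["none", "strain", "break", "bruise", "strain", "none"]
levels, dummies = one_hot(descriptions, baseline="none")

# Each non-baseline level adds one coefficient, and AIC charges 2 per coefficient.
extra_aic_penalty = 2 * len(levels)

print(levels)
print(dummies[1])
print(extra_aic_penalty)
```

With many sparse categories, the penalty grows while each category has few players behind it, which is consistent with a model that gets only slightly more accurate.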
Caveats and Summary
Caveats are in order here. While I am fairly confident that past injury predicts future injury, the opposite conclusion cannot be drawn about injury type. Absence of evidence is not the same as evidence of absence, and for that reason I am unwilling to definitively state that injury type is inconsequential, only that it is not so consequential as to be obvious in the data. Indeed, looking more carefully into injury description, one finds some nearly significant patterns: surgeries and tendinitis increase future injury risk quite a bit, whereas sprains, spasms, and infections seem not to increase injury risk at all. Medically (but I am not a doctor), these patterns make sense to me. Surgeries are invasive and can result in complications, while infections are easily curable and unimportant after they’re gone. So a more cautious conclusion might be that we have insufficient statistical power at present to determine whether injury type influences future injury risk.
For this reason, I can’t directly predict whether Ellsbury’s past injuries make it more or less likely that he will be injured in the future. Personally, however, I am skeptical of the freak injury hypothesis (namely, that nonstandard ways of getting injured shouldn’t increase future injury risk). Having watched no small sample of baseball in the past, I have seen players regularly perform stunning acrobatic feats; the resulting falls would break, I am quite sure, most of the bones in my body.
Ballplayers are tough. They have to be, because they are constantly running and jumping around at 10-20 mph, colliding with walls and the ground and each other. Most of the time, despite that, they remain uninjured. I wonder whether what we consider “freak injuries” aren’t really just the metaphorical last straw of, say, a tibia stretched to its absolute physiological limits. I wonder whether there isn’t a strong observation bias, whereby we remember all the ballplayers who somehow came back from these injuries and forget all the guys whose now-weakened bones gave in at some later time in more spectacular fashion.
As before, definite conclusions are hard to come by in sabermetrics. What is clear is that Ellsbury is probably a risky acquisition, if not because of his injury history then simply because of his age and the position he plays. Whether it pays off for the Yankees is now in the hands of the luck dragons.