Imputing Statcast’s missing data

MLB’s new Statcast system is a fantastic way to study baseball. The hybrid camera-plus-radar system tracks the location and movement of every object on the field, from the exit velocity of the ball to the position of the umpire. But Statcast is also a work in progress: as I detailed in a recent article, the system loses tracking on about 10% of all batted balls. In this post, I describe a way to ameliorate the system’s missing data problems using another source of information, the Gameday stringer coordinates.

Long before Statcast existed, there were stringers sitting in stadiums, manually recording various characteristics of each batted ball, including, most importantly, the location at which it was fielded. As a data collection method, stringers are less accurate than radar or cameras, and prone to park effects and other biases. But they have the advantage of completeness: every single batted ball is recorded by a stringer, while only about 90% are tracked by Statcast.

I had the idea of combining these two sources of information to produce estimates of batted ball velocity that are more accurate and complete than either system could provide alone. Each data source fixes the weakness of the other: the stringer data is complete but inaccurate, while Statcast is accurate but incomplete. And because MLB records the stringer coordinates in the same files as the batted ball data, the idea is exceptionally easy to execute.

I’ve come to rely on two main variables in my use of Statcast data: exit velocity, or the speed of the ball off the bat; and launch angle, or the vertical direction of the batted ball. I regressed both of these variables on the stringer coordinates using a Random Forest model, also including the outcome of each play and a park effect to further improve the accuracy. (For the statistically initiated: I fit the model on 20,000 batted balls and predicted the remaining ~100,000 as a form of out-of-sample validation.)
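To make the setup concrete, here’s a minimal sketch of what that model could look like in Python with scikit-learn. The file and column names (`hc_x`, `hc_y`, `event`, `park`, `launch_speed`) are assumptions for illustration; the original model wasn’t published, so treat this as a sketch of the approach rather than the implementation.

```python
# Minimal sketch of the imputation described above. File and column
# names are hypothetical; the approach is: predict exit velocity from
# the stringer coordinates, the play outcome, and a park effect.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv("batted_balls.csv")  # Gameday + Statcast, merged by play

features = pd.get_dummies(
    df[["hc_x", "hc_y", "event", "park"]],  # stringer coords, outcome, park
    columns=["event", "park"],              # one-hot the categoricals
)

tracked = df["launch_speed"].notna()        # the ~90% Statcast picked up

# Fit on a 20,000-ball sample of tracked balls, as in the post.
train = df[tracked].sample(20_000, random_state=1).index
model = RandomForestRegressor(n_estimators=500, n_jobs=-1, random_state=1)
model.fit(features.loc[train], df.loc[train, "launch_speed"])

# Predict for all batted balls: held-out tracked balls serve as
# out-of-sample validation, untracked balls are the imputation target.
df["imputed_ev"] = model.predict(features)
```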

Exit Velocity

Here we’re predicting exit velocity from the stringer coordinates and the result of the play (single, double, lineout, and so on). The results are strong: the predicted values correlate with the actual numbers at r=.57. The median absolute error is only 8.4 mph, suggesting that the Gameday coordinates are at least capable of distinguishing hard-hit balls from soft ones. The RMSE is a bit higher (10.9 mph), because a few outliers have unusual exit velocities given their other characteristics, such as deflections. Manually inspecting some of these outliers convinced me that there are also cases where the Statcast data itself is inaccurate. For example, there are line drive singles in the data with improbably low exit velocities (30-50 mph). In those cases, the imputed exit velocities may be more accurate than the measured ones.
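For the curious, the validation numbers above could be computed on the held-out, Statcast-tracked balls along these lines (continuing the sketch from before; all names are the assumed ones):

```python
# Validation on the held-out tracked balls: correlation, median absolute
# error, and RMSE between measured and imputed exit velocity.
import numpy as np

held_out = tracked & ~df.index.isin(train)
actual = df.loc[held_out, "launch_speed"]
pred = df.loc[held_out, "imputed_ev"]

r = np.corrcoef(actual, pred)[0, 1]            # reported as r=.57
med_ae = np.median(np.abs(actual - pred))      # ~8.4 mph
rmse = np.sqrt(np.mean((actual - pred) ** 2))  # ~10.9 mph
```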

Launch Angle

The imputation works even better with launch angle. (You’ll notice a kind of banding pattern in the imputed launch angles. I believe it comes from the model leaning on the recorded batted ball types (line drive, groundball, etc.) and treating the coordinates as a secondary factor.) The correlation between predicted and actual is even higher, at r=.9. And while the error statistics are about the same (RMSE=10.9, MAE=8.0), the range of launch angles is about three times larger than the range of exit velocities, so the relative prediction error is substantially smaller than for exit velocity.
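That “same error, much larger range” point can be made concrete by scaling each RMSE by the observed range of its variable. Continuing the sketch, and assuming a second model fit the same way with launch angle as the target (producing a hypothetical `imputed_la` column):

```python
# Same metrics for launch angle, plus range-normalized RMSEs to make the
# relative-error comparison concrete. `imputed_la` is assumed to come
# from a second Random Forest fit as above, targeting launch_angle.
la_actual = df.loc[held_out, "launch_angle"]
la_pred = df.loc[held_out, "imputed_la"]

rmse_la = np.sqrt(np.mean((la_actual - la_pred) ** 2))  # ~10.9 degrees

# Launch angle spans roughly three times the range, so its RMSE is a
# much smaller fraction of that range than exit velocity's is.
nrmse_ev = rmse / (actual.max() - actual.min())
nrmse_la = rmse_la / (la_actual.max() - la_actual.min())
```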

The results for exit velocity and launch angle suggest that we can impute both quite accurately using the Gameday stringer coordinates. To further verify that these imputed numbers are an improvement on raw Statcast, I calculated the average imputed exit velocity for each hitter and compared that to the wOBA (weighted on-base average, a rate measure of offensive production) values for the same hitters.

Unsurprisingly, the raw exit velocities correlate slightly worse with wOBA (r=.55) than the imputed exit velocities do (r=.6). Interestingly, that holds true even if you focus only on the 90% of batted balls that Statcast successfully tracked (r=.55 for the imputed, r=.51 for the raw), which suggests that the stringer coordinates smooth out some of the measurement error in Statcast even when no data is missing (see the example above concerning 40 mph line drives).
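That hitter-level check is simple to reproduce in the same sketch. The `batter` column and the season wOBA file here are assumptions for illustration:

```python
# Hitter-level check: average each batter's imputed exit velocity and
# correlate it with his season wOBA (hypothetical file and columns).
by_hitter = df.groupby("batter")["imputed_ev"].mean()
woba = pd.read_csv("season_woba.csv", index_col="batter")["woba"]

common = by_hitter.index.intersection(woba.index)
r_imputed = np.corrcoef(by_hitter[common], woba[common])[0, 1]  # ~.6
```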

These are pretty encouraging results. They suggest that it’s possible to accurately impute the missing Statcast data, thus overcoming the system’s tracking problems. Even better, the imputation tends to improve the reliability of the underlying data. In hindsight, it’s not surprising that fusing two sources of data produces a more accurate set of numbers. And there are certainly ways to improve the imputation procedure further, for example by accounting for plays involving a deflection. But for now, it’s good to know that Statcast’s missing data problem is easily solvable by integrating the Gameday stringer coordinates, which should improve a lot of downstream work that depends on Statcast.


Link roundup for 9/23/2016

No matter how complete an article feels at the time of publication, there are always a handful of interesting details that slip through the cracks or don’t fit under the word limit. On top of that, I tend to receive a ton of feedback post-publication, some of which is even worth addressing.

Twitter isn’t the ideal medium for responding to feedback or providing those additional details. So I wanted to experiment with a kind of weekly, link-roundup-style blog post, summarizing the articles I’ve written in the past week and highlighting a few pieces from other authors that are worth reading as well. This will be a trial run for now; your comments and criticisms are welcome.

My Articles

At FiveThirtyEight, I wrote a follow-up to last week’s piece on optimal bullpen management with Rian Watt. We extended our metric, which measured the extent to which managers used their best relievers in the highest-leverage spots, to grade individual skippers. Better still, we established a run value for the skill, allowing us to say how many additional wins optimal bullpen management was worth.

The metric itself, which we called weighted Reliever Management+ (wRM+), is best thought of as a retrospective yardstick of a manager’s decisions. It is limited in that it does not factor in fatigue (both day-to-day and cumulative effects), matchups, or how bullpens change over the year. Much of the criticism of the piece focused on the fact that we didn’t account for these issues.

All of that criticism is, of course, fair. But given how difficult it is to judge bullpen management in any sort of rigorous, quantitative way, I think this piece was a significant step forward.

The optimal metric would probably appraise bullpen decisions in a dynamic way, that is to say, on an inning-by-inning basis according to what the manager knows at the time of the decision. (The distinction between retrospective and dynamic measurements was suggested to me by BP writer and all-around good guy Rob Mains.)

So, for example, rather than aggregating season-level statistics as we did, you could build a system to grade every individual call to the bullpen according to which relievers were available in that game, their statistics to date in the season, their projections, the matchup (who they’d be facing), and so on. In this way, you could say whether a manager made the optimal decision based on the information he had at the time, and price in the effects of fatigue and availability.
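To give a flavor of the idea, here is a purely hypothetical skeleton of what grading a single bullpen call might look like. Nothing in it comes from wRM+ or the actual piece; every name is an illustration.

```python
# Hypothetical skeleton of the dynamic grading idea (not part of wRM+).
# For each call to the bullpen, compare the reliever the manager chose
# with the best reliever available, using only information he had then.

def grade_bullpen_call(chosen, available, expected_runs, leverage):
    """Leverage-weighted runs lost versus the optimal available reliever.

    chosen:        the reliever the manager brought in
    available:     relievers rested and ready in that game
    expected_runs: callable mapping a reliever to projected runs allowed
                   in this spot (stats to date, projections, the matchup)
    leverage:      importance of the situation (late and close counts more)
    """
    best = min(available, key=expected_runs)
    return (expected_runs(chosen) - expected_runs(best)) * leverage

# Toy usage, with projections as a simple dict lookup:
proj = {"closer": 0.35, "setup_man": 0.42, "long_man": 0.55}
penalty = grade_bullpen_call("setup_man", proj, proj.get, leverage=2.1)
# (0.42 - 0.35) * 2.1 expected runs left on the table in this spot
```

Summing those penalties over a season would give a dynamic counterpart to wRM+, with fatigue and availability priced in through whatever feeds the projection function.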

Such a system would be exceedingly difficult to create, however. You’d need game-by-game information, and you’d have to make a lot of assumptions about when relievers were tired and how much weight to give matchups. With that said, I have full faith that someone will eventually build this kind of dynamic scoring system. It’s going to be awesome, and probably more accurate and insightful than wRM+ (although by how much, I don’t know). In the meantime, I think of Rian’s and my metric as a step in the right direction, an approximation that works better over longer managerial careers, where factors like bullpen quality tend to even out.

Three Key Variables that Affect the Cubs’ Playoff Hopes

For The Athletic, I wrote about some of the ways October baseball differs from the regular season, and how those factors may affect the Cubs, the overwhelming postseason favorite.

It’s striking how distinct playoff baseball is from the rest of the year. On top of the weather and the better caliber of opponent, you have very different patterns of pitcher usage. As managers get more sabermetrically savvy, I think October is going to get even weirder and more tactically separated from the rest of the year. Ned Yost pioneered a new style of deploying his relievers to fuller effect in the postseason, and that heavier usage will only grow more pronounced. The increase in pitching quality (both higher-caliber starting pitchers and more bullpen action) is probably the single biggest factor separating October from the rest of the year.

Long-term, I think that means there will be a premium on hitters who can maintain their performance against the highest-quality opposition. That is, if such hitters really exist; so far, sabermetrics hasn’t found much evidence of a kind of hitter who is less susceptible to the quality of the opposing pitcher. (Of course, that doesn’t mean front offices can’t identify those hitters better than public analysts can.)

Other links

The Most Extraordinary Team Statistic

Looking at some of the team-level records being broken this year. More on the Cubs BABIP here:
It’s probably the biggest deviation from the league average of all time (at least for BABIP). So what is it? Defense? A new kind of positioning or shifting? Pitchers who can suppress batted ball velocity?
You’ll never guess the luckiest team in baseball this year.
3% of American adults own half of the guns in the United States. Think about that for a minute. The article is worth a full read.
From R.J. Anderson, on how the Oakland front office has failed to navigate the modern age of sabermetric equality.
A distillation of the righteous anger many feel when thinking about a Trump voter. I think I’m more insulated from Trump voters than most people; only one person on my Facebook feed ever posts pro-Trump propaganda. As a result, I’m more bewildered and confused than angry.

How I got my job

I get asked one question more than any other: how did you get your job at FiveThirtyEight? It’s a reasonable thing to ask. Data journalism is still so new and rapidly developing that I don’t think there’s any standard path into a position like mine. To whatever extent such a path exists, it probably mirrors the route into traditional media jobs, either via J-school or by moving from outlet to outlet.

That’s not the path I took. I started by getting my PhD in evolutionary genetics. I had a long-term ambition (since I was a kid) to get my PhD in something, and I felt passionate about understanding evolution in particular. I had the idea (along with many other people) to combine genomics/systems biology methods with evolutionary questions, and so went about finding an opportunity to do that.

I loved the first half of grad school. The first two years of most PhD programs are focused on learning the skills and theory within your discipline, before applying them later on to a research question. My program was an incredible intellectual environment, and I was able to test out ideas among a varied, brilliant group of students and professors.

At the same time, science can be stifling. Setting aside matters of intellectual curiosity, graduate school is also about getting you a job and launching you into (most frequently) an academic career. To that end, a great deal of it is devoted to the messy everyday business of doing science: publishing papers, applying for grants, going to conferences, making the appropriate contacts, and so on. Much of that everyday work isn’t about science at all. I know that academia isn’t unique in this: as in many careers, you have to grit your teeth and accomplish certain goals in a prescribed manner, sometimes (for me, often) to the detriment of your broader intellectual mission.

About halfway through graduate school, I became increasingly frustrated with that side of my job. I started looking for a more creative outlet, one where I could ask interesting, data-centric questions without needing the payoff of a full-fledged academic paper to justify my efforts. I started a blog (this blog) and forced myself to write roughly one piece every two weeks on any topic that interested me. (That pace was calculated to be difficult and uncomfortable, but achievable.)

Perhaps the hardest thing about any regular, frequent writing assignment is finding enough material to sustain it. In search of topics for my blog, I turned to baseball, which has an abundance of available and well-curated data. I mixed a few baseball topics into my rotation, typically doing fairly simple modeling work.

I had neither expectation nor plan that this writing would lead to anything, but about a year into writing my blog, I received an email from Ben Lindbergh, who was then the Editor in Chief of Baseball Prospectus. He asked if I’d like to write for them. I said yes, reasoning that I’d be doing largely the same thing but getting paid a small amount for it.

I wanted to keep the same biweekly-to-weekly schedule, but again, I had no particular ambition to make a career out of my baseball writing. Frankly, I figured I’d be a spectacular failure, which is how I enter a lot of situations. Imagining that I’d be belly-flopping anyway, I decided to be bold about it and take on topics too big or complex for others to attempt. I asked Ben if I could name my column Moonshot, partially as a sarcastic joke on myself and partially in reference to the word’s other, baseball meaning.

I found myself loving the work at Baseball Prospectus. Instead of largely speaking into the echo chamber of my blog (or the broader, but still depressingly empty, world of science), I was getting feedback, not only from Ben and the other wonderful writer-researchers at BP, but from the internet at large. (A paradox of internet writing: the fewer the people reading your work, the greater the share of the feedback that is positive.)

After a handful of articles, I was contacted by an MLB team about doing consulting, an opportunity at which I jumped. That seemed to legitimize my efforts, and for the first time I started considering careers in baseball writing instead of just the default academic science path. That said, I explored the team-related opportunities and found them wanting. And with no immediate prospects at other media outlets, I tabled the idea.

One year into my work at Baseball Prospectus, Nate Silver contacted me about the open baseball writer position at FiveThirtyEight. Apparently Ben, who had recently joined Grantland, had recommended me. I hadn’t applied; I didn’t consider myself good enough to have a shot. But after a few phone conversations, I had a contract offer from ESPN for part-time work, about the same time commitment as what I managed at BP.

At this point, I started to consider the idea of a career as a writer more seriously. The contract offer came in the last year of my PhD, as I was finishing it. Those who have survived graduate school will recognize this chapter as the toughest stretch. (The difficulty was further compounded by working two jobs, as well as some changes in my personal life.) For me, as for most people, this period was mostly about the part of graduate school I liked least: finishing papers, pleasing faculty members, and lining up some kind of post-graduate opportunity to show that you are ready to receive your PhD.

Partially as a result of the misery of finishing my PhD, and partially because it was the natural continuation of a longer arc in my life, I started scheming toward a career in journalism instead of science. I secured a temporary postdoctoral fellowship while I made up my mind and surveyed my options. By the time I had to decide about moving on to another fellowship, I was completely certain and ready for a change. I renewed my contract with FiveThirtyEight, quit my postdoc, and set about freelancing to build a fuller journalistic résumé.

I don’t know that there are any major lessons to take from my career path except: be extremely lucky. I certainly was: I owe my whole career to Ben Lindbergh finding my site on a list of Google results. What are the odds?

To the extent that there is a lesson, I’d say it’s that you should start a blog. This is the advice, puny as it is, that I give to nearly everyone. When I was still blogging anonymously, I thought of each piece as a lottery ticket. As with the lottery, the expected payoff on any given ticket is negative, but you can maximize your chances by holding more tickets. Most of my blog posts went basically unread, or were seen by perhaps a handful of Facebook friends. Once I made the front page of Reddit, but it never went further than that.

The winning ticket, so to speak, was a random piece on baseball player injuries that happened to be posted around the same time Ben was researching one of his articles on the same subject. I doubt it was the best piece on the blog at the time; I’m ashamed of its quality now. But it was Good Enough to lead to the next opportunity, which led to the one after that, and so on. I can’t claim much credit for the progression, except insofar as I was doggedly persistent in writing on a regular schedule. The rest was good fortune.