The core of baseball is the duel between pitcher and batter. The pitcher’s job is to throw a baseball past the batter, or otherwise induce the batter to make an out (via a weakly hit ball, for instance). The batter’s job is the opposite: to make solid contact with the ball, transferring enough force to either cause the batted ball to be difficult to field (e.g. a line drive) or impossible to field (a home run).

An interesting aspect of this battle between pitcher and batter is the fact that the batter can, in theory, hit any pitch within the strike zone (at the major league level). It is straightforward physics to reason that, given enough information, a batter can make good contact with any strike, and indeed, MLB hitters can get hits from even 100+ mph fastballs (albeit rarely).

The fastest pitches still take about four-tenths of a second to go from hand to plate, which is more than enough time to swing. I would argue that the crucial weapon in the pitcher’s arsenal isn’t speed but uncertainty. If a pitcher can cause a batter to adjust his swing mid-cut, or not swing at all, that is a vastly more powerful advantage than the milliseconds that can be chopped off by simply *throwing harder*.

For the remainder of this post, I’ll investigate how pitchers utilize uncertainty in their pitch sequences. MGL has hypothesized (or perhaps has data to show?) that pitchers optimize their sequencing of pitches such that the identity of the first pitch in a sequence gives no information as to the identity of the next pitch. That is to say, conditional on the first pitch being, say, a fastball, the batter has no more idea of what the next pitch will be than he did before the first pitch.

# (Re-)Introducing Entropy

While unpredictability in pitch type is an important weapon in the pitcher’s arsenal, it’s not clear how to quantify that uncertainty. For the duration of this post, I’ll borrow a metric from information theory, entropy (for more details on its calculation, see the appendix). Entropy is a measure of uncertainty, and it can be thought of as the average number of yes/no questions required to determine the identity of an unknown quantity. In baseball terms, a pitcher whose arsenal is employed with great entropy prevents the batter from guessing the next pitch effectively. Conversely, a pitcher who throws less entropically allows the batter to predict the identity of the next pitch more easily.

Using PITCHf/x data from Baseball Savant, I investigated patterns of pitch type entropy in two pitchers (an important caveat to note is that I’m trusting PITCHf/x’s pitch type classifications). Let’s start with one Clayton Kershaw, he of the recently signed $215 million extension.

Kershaw employs four pitches.

Pitch Type | Fastball | Slider | Curve | Changeup |
---|---|---|---|---|
Number of pitches | 2237 | 920 | 464 | 85 |
Frequency | 0.60361576 | 0.24824609 | 0.12520237 | 0.02293578 |

Generally, Kershaw uses his fastball most of the time, which is just fine because he averages a solid 92.6 mph on it. When he’s not using his fastball, however, he has excellent breaking balls, most especially his trademark curve. Thinking about it from the batter’s perspective, the most important question might be: fastball or breaking ball? If a fastball, the batter has precious little time to react, but knows that there won’t be too much motion. If a breaking ball, the batter will have slightly more time, but has to factor in the horizontal break on the ball in deciding when and where to swing.

Doing the math, one finds that the total entropy of Kershaw’s repertoire is almost exactly 1 (0.997) when calculated with the natural logarithm, which makes the unit the nat rather than the bit; in base 2, the same distribution works out to about 1.44 bits. Remembering that entropy in bits can be thought of as the average number of yes-or-no questions required to elucidate the type of pitch, we can sort of imagine the first of those questions being “Is it a fastball?”, with a bit under half a question left over, on average, to sort out the remaining pitches. (I’m sweeping some of the mathematical details under the carpet here, but if you’re interested there’s more explanation below.)

Because the maximal entropy possible with 4 pitches is exactly 2 bits, or about 1.386 nats (which would be achieved if each pitch were used exactly 25% of the time), we can see that Kershaw is roughly 72% as entropic as he could be. That doesn’t necessarily disprove MGL’s theory, because not all pitches are equally effective. To figure out how often to use each pitch, Kershaw must also consider the fact that his fastball is lethal, and his curve devastating, while his changeup is merely adequate. In more mathematical terms, Kershaw is really optimizing along two dimensions: the first is pitch quality, and the second is overall entropy. The maximal entropy distribution need not be the best one for getting hitters out.
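The arithmetic can be checked directly from the pitch counts in the table above. What follows is a minimal sketch in Python, not the code used for the original analysis; the function name `entropy` and the variable names are placeholders chosen for illustration.

```python
import math

def entropy(counts, base=math.e):
    """Shannon entropy of a distribution estimated from raw counts."""
    total = sum(counts)
    # Zero counts are skipped: 0 * log(0) is treated as 0 by convention.
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in probs)

# Kershaw's pitch counts from the table: fastball, slider, curve, changeup
kershaw = [2237, 920, 464, 85]

h_nats = entropy(kershaw)              # natural logarithm -> nats
h_bits = entropy(kershaw, base=2)      # base-2 logarithm -> bits
h_max_bits = math.log2(len(kershaw))   # uniform use of 4 pitches -> 2 bits

print(f"entropy: {h_nats:.3f} nats = {h_bits:.3f} bits")
print(f"share of maximum: {h_bits / h_max_bits:.0%}")
```

Run on these counts, the sketch gives roughly 0.997 nats, or about 1.439 bits, which is around 72% of the 2-bit ceiling for a four-pitch arsenal.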

# The Entropy of Sequences

Let’s now engage with sequences of pitches. As I noted above, under conditions of optimal pitch sequencing, the first pitch ought to provide no information at all concerning the next pitch. With conditional entropy, we can rigorously quantify this possibility. Conditional entropy is as it sounds: the entropy of the next pitch, *conditional* upon knowing the first pitch in a sequence. If Kershaw manages his sequencing optimally, the conditional entropy ought to be nearly equal to the unconditional entropy.
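For readers who prefer symbols, one standard way to write this down is sketched here. The notation is an assumption added for illustration: $X$ stands for the first pitch in a sequence and $Y$ for the next one.

$$H(Y \mid X = x) = -\sum_{y} p(y \mid x)\,\log p(y \mid x), \qquad H(Y \mid X) = \sum_{x} p(x)\,H(Y \mid X = x) \le H(Y)$$

Equality on the right holds exactly when the next pitch is statistically independent of the first, which is the “no information” scenario. Note that the inequality applies to the weighted average over first pitches; the entropy after any one particular first pitch can land on either side of $H(Y)$.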

I pulled out all 2-pitch sequences in Kershaw’s PITCHf/x data. All 16 possible combinations occurred (too many to display well); I reproduce here the set beginning with fastballs.

Second Pitch Type | Fastball | Slider | Curveball | Changeup |
---|---|---|---|---|
Number of pitches (after a fastball) | 504 | 251 | 177 | 15 |

First, I must note that limiting the data to two-pitch sequences changes the pitch counts, and thus Kershaw’s entropy, slightly (one-pitch at-bats and such are eliminated); it becomes 1.087193. But more interestingly, calculating the conditional entropy for each possible beginning pitch results in this:

Pitch Sequence Starts With (Conditioned On) | Conditional Entropy |
---|---|
Fastball | 1.066745 |
Slider | 1.099957 |
Curve | 1.068736 |
Change | 1.103984 |

Remembering that more entropy = less predictability, we see that each of Kershaw’s conditional entropies sits within about 0.02 of his unconditional entropy of 1.087193 (two slightly below, two slightly above), which is hardly a meaningful difference.
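As a sanity check, the fastball entry can be recomputed from the counts of pitches that followed a fastball, given earlier. This is a rough Python sketch rather than the original analysis code; `entropy_nats` is a hypothetical helper name, and the natural logarithm is assumed because it is the base that reproduces the reported value.

```python
import math

def entropy_nats(counts):
    """Entropy (natural log) of the distribution implied by raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Second-pitch counts after a first-pitch fastball:
# fastball, slider, curveball, changeup
after_fastball = [504, 251, 177, 15]
h_given_fastball = entropy_nats(after_fastball)

# Unconditional entropy of the two-pitch sample, as reported in the text
unconditional = 1.087193

print(f"entropy after a fastball: {h_given_fastball:.6f}")
print(f"gap to unconditional:     {unconditional - h_given_fastball:.4f}")
```

The first figure comes out at about 1.0667, in line with the 1.066745 reported for sequences starting with a fastball, and the gap to the unconditional value is only about 0.02.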

The takeaway here is that MGL appears to be right, at least for Kershaw: he sequences his pitches in such a manner that the batter, knowing the first pitch, has essentially no more information about the next pitch.

# Kershaw Isn’t Unique

The initial objection that may come to mind is that Kershaw is special. After all, he’s the reigning Cy Young winner and arguably the best pitcher in baseball, or at least one of the best. Perhaps lesser pitchers are worse at managing their entropy, resulting in the batter being able to predict the next pitch and capitalize on that knowledge.

To go to the opposite extreme, I selected one Joe Saunders, a decidedly lesser pitcher with a similarly large number of pitches (sorry, Joe…). I ran him through the same set of analyses. Mr. Saunders is a rather more entropic pitcher at 1.443 (on the same scale as Kershaw’s 0.997), which may have to do with the fact that he has worse “stuff” than Kershaw (an idea to investigate further in future posts). Nevertheless, Saunders doesn’t come close to maximizing the entropy he could possibly achieve with his 5-pitch arsenal.

And yet, despite his greater entropy and worse results, his conditional entropy nearly matches his unconditional entropy, just as we saw before in Kershaw’s case.

Thus we see that Kershaw, one of the very best pitchers in MLB, is similar to Saunders with respect to the entropy of his pitch sequencing, even though Saunders is decidedly… ahem… *not* one of the best pitchers in MLB.

# Entropy as Tool

I’ll wind up with a couple of conclusions. The first is that hurlers don’t come close to maximizing the entropy of their pitch types. Given how widely a thrower’s pitches may vary in quality, this pattern is perhaps unsurprising. The second is that sequences of two pitches, at least for this limited sample of pitchers, are effectively as entropic as possible. This pattern confirms that for a batter, knowing the first pitch helps not one bit in guessing the next pitch, and effectively reaffirms MGL’s excellent prediction.

Which is all fine, well, and good; but for my part, I’m more excited about the prospects of using entropy in future studies. There’s a whole host of questions still to be answered, for which entropy seems an ideal tool: from the mysteries of longer sequences (3-pitch sequences already begin to show interesting trends) to the differences between pitchers, entropy is a promising step towards understanding how pitchers vary speed, spin, and location to disrupt hitters.

## Appendix: A Brief Calculation of Entropy

Entropy is one of the most important and mysterious quantities in science, and plays pivotal roles in both computer science and physics (not to mention statistics). It can be thought of in the following way. Imagine a pitcher with two pitches: fastball and changeup. Imagine that the two pitches are equally effective. The formula for said hurler’s entropy would be:

Entropy of Imaginary Pitcher = -1 × (probability of fastball × logarithm(probability of fastball) + probability of changeup × logarithm(probability of changeup))

It is straightforward to show that the entropy is maximized when the pitcher divides his pitch frequency evenly, so that he throws exactly 50% fastballs and 50% changeups. In the case of two pitches, the maximum entropy is 1 bit (taking the logarithm in base 2).
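To sketch why, write $p$ for the probability of a fastball (so $1 - p$ is the probability of a changeup) and assume base-2 logarithms, which is what measuring in bits implies. The symbols here are shorthand introduced for this illustration.

$$H(p) = -p\log_2 p - (1-p)\log_2(1-p), \qquad \frac{dH}{dp} = \log_2\frac{1-p}{p}$$

The derivative is positive for $p < 1/2$, negative for $p > 1/2$, and zero only at $p = 1/2$, where $H(1/2) = 1$ bit. By comparison, a pitcher who throws 90% fastballs and 10% changeups has an entropy of only about 0.47 bits.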

The more general formula is:

$$H(X) = -\sum_{i=1}^{n} p(x_i)\,\log p(x_i)$$

Let’s break this down into words, working left to right ($H(X)$ is simply a way to denote entropy). First, the squiggly sign stands for the sum, which just means that we repeat the calculation for each of the categories in our set. For pitch types, we can think of these as being the different varieties of pitches, where $n$ is the total number of different types. Going right once more, we see $p(x_i)$, which is the probability of the *i*th pitch type. We then multiply it by the log of that same probability. For my purposes, not knowing the true probability of e.g. a fastball, I use the observed number of fastballs in the sample divided by the total number of pitches as an estimate of that probability. We then repeat this calculation for each of the pitch types, sum the results, and multiply by -1; voilà, entropy. The base of the logarithm sets the unit: base 2 gives bits, while the natural logarithm gives nats. As above, the maximal entropy is achieved when each of $n$ pitches is thrown with probability $1/n$.

Entropy’s pretty complex, so for more thorough (and eloquent) treatments, I’d recommend these links.