My writings about baseball, with a strong statistical & machine learning slant.

Sunday, December 27, 2009

Pitcher repertoire depth: a measure of "how many pitches" he throws.

In my K9 estimates, I use two statistics that I call "rep_depth" and "rep_offerings." Let me explain what this is all about.

Leading up to this year's NL Cy Young voting, I was reading some discussions about whether Tim Lincecum or Adam Wainwright has a deeper arsenal of pitches. The implied assumption seemed to be that throwing only two pitches regularly is a bad thing. Those taking part in the discussion threw around nuggets like "Lincecum throws 4 pitches at least 5% of the time" or other such ad hoc stats using arbitrary cutoffs. I thought I'd try to come up with something more systematic.

FWIW, Lincecum's and Wainwright's pitch frequency breakdowns can be seen on FanGraphs. It is clear to me that Wainwright has a deeper arsenal (although not by much). It's not clear how much this matters as far as effectiveness goes, but the point here is to apply numbers to a concept that people use, not to evaluate the utility of the concept itself.

Most pitchers throw anywhere from 1 to 6 pitches (with non-trivial frequency) according to FanGraphs pitch data. Someone who throws two pitches 50-50 is a two-pitch pitcher. Easy enough. However, I think that most people would say that someone who throws two pitches at a 90-10 ratio is a one-pitch pitcher. Similarly for an 80-10-10 pitcher. However, someone who throws three pitches in a 60-20-20 ratio is a three-pitch pitcher. Also, someone who throws two pitches at a 70-30 ratio is a two-pitch pitcher, but still has a less balanced repertoire than the hypothetical 50-50 pitcher.

This brings to mind something involving the harmonic mean. The harmonic mean of 50 and 50 is 50. The harmonic mean of 70 and 30 is 42. The harmonic mean of 90 and 10 is 18. Of the three means (arithmetic, geometric and harmonic), the harmonic mean is never larger than the other two, and it drops fastest as the values become unequal. It thus seems like a good candidate for "rewarding" pitchers with a highly balanced repertoire.
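The numbers above are easy to check. Here is a minimal sketch using Python's standard `statistics` module; the helper name `three_means` is just something I made up for illustration.

```python
from statistics import geometric_mean, harmonic_mean, mean


def three_means(split):
    """Return (arithmetic, geometric, harmonic) means of a pitch split."""
    return mean(split), geometric_mean(split), harmonic_mean(split)


# The three hypothetical two-pitch splits discussed in the text.
for split in [(50, 50), (70, 30), (90, 10)]:
    a, g, h = three_means(split)
    print(f"{split}: arithmetic={a:.1f} geometric={g:.1f} harmonic={h:.1f}")
```

The arithmetic mean is 50 for every one of these splits, so it says nothing about balance, whereas the harmonic mean falls from 50 to 42 to 18.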

Conceptually, if a pitcher throws one pitch much more often than his other offerings, the hitter need only look for that one pitch. However, as a pitcher spreads his offerings more widely, a hitter has to look for several pitches, even those only thrown 10-15% of the time. Or at least, that's the idea.

Considering n pitches, the harmonic mean of their frequencies is expressed as:

$$H_n = \frac{n}{\sum_{m=1}^{n} \frac{1}{p_m}}$$

where p_m is the frequency of pitch number m, expressed as a probability (between 0 and 1), with the pitches ordered from most to least frequently thrown. Therefore, the largest possible value for p_2 would be 0.5, the largest possible value for p_3 would be 0.33, and so forth.

In order to convert the harmonic mean back to a number between 1 and 6, we multiply the harmonic mean by n^2. The first factor of n gives a number that is at most 1.0. The second factor of n gives a number that is at most n. Taking the best value over all n, thus:

$$\text{rep\_depth} = \max_{n} \; n^2 \cdot \frac{n}{\sum_{m=1}^{n} \frac{1}{p_m}} = \max_{n} \frac{n^3}{\sum_{m=1}^{n} \frac{1}{p_m}}$$

That's it. I also record the value of n that gives this best result and call it the "repertoire offerings" (rep_offerings), as the measure of how many pitches are in the pitcher's repertoire (i.e. how many pitches a hitter has to consider), expressed as a whole number.
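As a quick check of the arithmetic (my own worked numbers, using the hypothetical 60-20-20 and 70-30 pitchers from earlier rather than real data), n^2 times the harmonic mean, which is n^3 divided by the sum of 1/p_m, works out to:

$$n = 3,\ (0.6,\ 0.2,\ 0.2): \quad \frac{3^3}{\frac{1}{0.6} + \frac{1}{0.2} + \frac{1}{0.2}} = \frac{27}{11.67} \approx 2.31$$

$$n = 2,\ (0.7,\ 0.3): \quad \frac{2^3}{\frac{1}{0.7} + \frac{1}{0.3}} = \frac{8}{4.76} \approx 1.68$$

So the 60-20-20 pitcher scores well above 2, while the 70-30 pitcher falls short of the 2.0 that a 50-50 pitcher would get.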

For example, in 2009, Tim Lincecum ends up with a rep_depth of 2.34 off of 3 offerings. Adam Wainwright ends up with a rep_depth of 2.55 off of 3 offerings. Among well-known starting pitchers in 2009, Dan Haren had the highest rep_depth at 3.24 off of 4 offerings. As seen here, Haren threw four pitches at least 13% of the time each (fastball, cutter, curve and splitter). However, that kind of depth is unusual. (Actually I just noticed that James Shields has an even higher rep_depth at 3.41. But you can argue that he is less famous than Dan Haren?)

I have included a full list of pitcher seasons (2002-2009) and repertoire depths here. The cutoff is for 60IP+, and a small amount of data from 2002 is missing, but otherwise this is a complete list.

Before I go on, I must mention that I made a small modification to the formula shown above. If we want to compute the rep_depth at 1 (i.e. just consider one pitch), it makes no sense to derive an answer other than 1.0. Also, if a pitcher throws his pitches at the rate of 45-45-5-5, he should end up with a rep_depth of 1.0 at 1 offering, and with a rep_depth of 2.0 at 2 offerings. Therefore, I normalize the frequencies of the n pitches under consideration, so that they sum to 1, before computing the rep_depth. Thus, rep_depth has a floor of 1.0 (at 1 offering) for everyone (for example, Mariano Rivera of the last few years).

Also, there were some nasty cases where a pitcher might end up with 1.002 rep_depth off 5 offerings. I think my formula is flawed in such cases, so I revert to 1.0 off of 1 offering if my formula yields a rep_depth below 1.1. This is a hack, but it rarely comes into play, and I think it makes sense.
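Putting the pieces together, here is a minimal sketch of how the whole calculation could look in Python. This is a reconstruction from the description in this post, not the code I actually ran: the function name `rep_depth`, the `floor` parameter, and the tie-breaking rule (prefer the smaller n) are assumptions made for illustration.

```python
def rep_depth(freqs, floor=1.1):
    """Return (rep_depth, rep_offerings) for a list of pitch frequencies.

    Frequencies can be percentages or probabilities; only ratios matter.
    """
    # Order pitches from most to least used, ignoring pitches never thrown.
    ordered = sorted((f for f in freqs if f > 0), reverse=True)
    if not ordered:
        raise ValueError("need at least one pitch with non-zero frequency")

    # One offering always scores exactly 1.0.
    best_depth, best_n = 1.0, 1
    for n in range(2, len(ordered) + 1):
        top = ordered[:n]
        total = sum(top)
        # Normalize the top n pitches so that they sum to 1.
        probs = [f / total for f in top]
        # n^2 times the harmonic mean, i.e. n^3 / sum(1 / p_m).
        depth = n ** 3 / sum(1.0 / p for p in probs)
        # Strict comparison: on a tie, keep the smaller n (an assumption).
        if depth > best_depth:
            best_depth, best_n = depth, n

    # The hack: scores below the floor revert to 1.0 off of 1 offering.
    if best_depth < floor:
        return 1.0, 1
    return best_depth, best_n


for split in [(50, 50), (90, 10), (70, 30), (60, 20, 20), (45, 45, 5, 5)]:
    depth, offerings = rep_depth(split)
    print(f"{split}: rep_depth={depth:.2f} off of {offerings} offerings")
```

On the hypothetical splits from earlier, this gives 2.0 off of 2 offerings for both the 50-50 and the 45-45-5-5 pitcher, and 1.0 off of 1 offering for the 90-10 pitcher.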

It is not immediately clear whether having a high repertoire depth is always a good thing. There are some pretty good pitchers who have low repertoire depths, and some mediocre pitchers (Adam Eaton of 2008, anyone?) who threw a lot of different pitches, and yet didn't do too well with any of them.

If we list only pitchers with 20+ VORP, there is no obvious pattern among the top performers, in regard to repertoire depth. Then again, the point here is to summarize pitch data, rather than to draw immediate conclusions about predictable performance.

The average rep_depth is right around 2.0, with average rep_offerings right around 3.0. So we can confidently say that your typical pitcher throws three pitches, but not with an even distribution. There is no inherent advantage to throwing more pitches, although there might be an advantage to throwing those same three pitches with a more even distribution. Power pitchers with great fastballs can often get away with throwing only two pitches (the other usually being a breaking pitch). Pitchers with lesser fastballs usually need a third offering, be that a cutter, splitter or changeup.

Going back to the K9 projections I wrote about previously, my formula punishes (in terms of an expected strikeout rate) high values for both rep_depth and rep_offerings. However, the system awards points for throwing particular pitches a high percentage of the time, namely fastballs and breaking pitches for power pitchers, and slow changeups for pitchers with slower fastballs. Makes sense to me. Universally, the system expects pitchers who throw lots of different pitches to have low strikeout rates, all other things being equal. This is a bit surprising, but not illogical. A pitcher with a great fastball (or cutter, in the case of Mariano Rivera) need not throw much else. These kinds of pitchers can be highly effective, and they record high strikeout rates. Pitchers with great primary weapons don't need more than one secondary offering. Then again, as the linked chart shows, many pitchers (especially older, more experienced pitchers) have had great seasons throwing a variety of pitches. So the tendency to have lower strikeout rates among high rep_depth pitchers might be a case of reverse causality. I'm not really sure.

Among pitcher seasons 2002-2009 (20IP cutoff), the rep_depth and rep_offerings can be bucketed as follows:


3181 elements, 10 buckets --> 318 target average
bucket 1 [1.000000, 1.321785] for 318 elements (1.121944 average)
bucket 2 [1.323485, 1.542642] for 318 elements (1.449034 average)
bucket 3 [1.543158, 1.688595] for 318 elements (1.621933 average)
bucket 4 [1.688927, 1.799741] for 318 elements (1.745801 average)
bucket 5 [1.800337, 1.906526] for 318 elements (1.855347 average)
bucket 6 [1.906688, 2.001738] for 318 elements (1.956757 average)
bucket 7 [2.001881, 2.180508] for 318 elements (2.092596 average)
bucket 8 [2.180602, 2.370741] for 318 elements (2.274430 average)
bucket 9 [2.371444, 2.650319] for 318 elements (2.501306 average)
bucket 10 [2.651505, 4.090890] for 319 elements (2.952992 average)

Created 10 buckets
6226.893148 / 3181 = 1.957527


bucket 1 [1.000000, 2.000000] for 979 elements (1.848825 average)
bucket 2 [3.000000, 3.000000] for 1132 elements (3.000000 average)
bucket 3 [4.000000, 4.000000] for 888 elements (4.000000 average)
bucket 4 [5.000000, 6.000000] for 182 elements (5.065934 average)

Created 4 buckets
9680.000000 / 3181 = 3.043068
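For anyone who wants to produce a similar rep_depth listing from their own data, here is a rough sketch of equal-count bucketing in Python. It is not the tool that produced the output above; the function name `equal_count_buckets` is hypothetical, and the choice to put leftover elements in the last bucket is an assumption based on the 318/319 split shown. It makes no attempt to keep tied values together, which the rep_offerings listing appears to do.

```python
def equal_count_buckets(values, n_buckets):
    """Split values into n_buckets groups of (nearly) equal size.

    Returns a list of (low, high, count, average) tuples, one per bucket.
    """
    ordered = sorted(values)
    if n_buckets < 1 or len(ordered) < n_buckets:
        raise ValueError("need at least one value per bucket")

    target = len(ordered) // n_buckets  # e.g. 3181 // 10 = 318
    buckets = []
    for i in range(n_buckets):
        start = i * target
        # The last bucket absorbs any leftover elements.
        end = (i + 1) * target if i < n_buckets - 1 else len(ordered)
        chunk = ordered[start:end]
        buckets.append((chunk[0], chunk[-1], len(chunk), sum(chunk) / len(chunk)))
    return buckets


# Toy data only (not real pitcher seasons): 21 values into 10 buckets.
for i, (low, high, count, avg) in enumerate(equal_count_buckets(range(1, 22), 10), 1):
    print(f"bucket {i} [{low}, {high}] for {count} elements ({avg:.2f} average)")
```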

Now, at least, it is possible to estimate the depth of a pitcher's repertoire using two numbers, which can be simply computed.
