In my last post, I
implored Brian Cashman to stop signing high strikeout pitchers from the NL.
Then I showed a couple of graphs to argue my point, with a big "small
sample size" disclaimer. Now, I'm going to be a bit more rigorous.
Let me review the problem.
- NL starters strike out more batters than AL starters, but not by much
- NL starters who lead the league in strikeouts have higher totals
than AL leaders, by a larger margin
- NL starters who move to the AL tend to have large drops in their
strikeout rates
There
is no paradox here, but the statements don't seem to mesh very well. Only the
first of these facts is reflected in league-based adjustments to
ERA approximations like FIP, xFIP and QERA. So by any of these
measures, high strikeout starters in the NL are more valuable than high
strikeout starters in the AL.
Fastballs and Strikeout Rates
When I wrote about predicting strikeout rates from pitch data, I showed that, all other pitch data
being equal, switching leagues (AL to NL) is worth between +0.4 K/9 and +0.8 K/9, depending on a starting
pitcher's fastball velocity. Hard-throwing starters benefit from a move to the
NL more than soft tossers benefit.
For
all starting pitcher seasons (100+ IP), let's map average fastball velocity to
strikeout rates. Since the relationship is not linear, I fit both leagues' data
to quadratic functions:
As you can see, the projected NL strikeout rate is always higher than the projected AL strikeout rate for each pitcher, with the differences ranging from +0.4 K/9 to +0.8 K/9 or so.
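For readers who want to try this kind of fit themselves, here is a minimal sketch of a least-squares quadratic fit in plain Python. It is not the code used for this post: the function name `fit_quadratic`, the centering at 90 mph, and the toy data points are all assumptions made for illustration. You would run it once per league on the real (velocity, SO9) pairs.

```python
def fit_quadratic(xs, ys, center=90.0):
    """Least-squares fit of y = a*u**2 + b*u + c, where u = x - center.

    Centering at 90 mph (an arbitrary choice) keeps the numbers small
    and the normal equations well conditioned.
    """
    rows = [[(x - center) ** 2, x - center, 1.0] for x in xs]
    # Normal equations: (A^T A) beta = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    # Augmented 3x4 matrix, solved by Gaussian elimination with pivoting.
    m = [ata[i] + [aty[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, 3):
            factor = m[k][col] / m[col][col]
            for c in range(col, 4):
                m[k][c] -= factor * m[col][c]
    # Back substitution.
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(m[i][j] * beta[j] for j in range(i + 1, 3))
        beta[i] = (m[i][3] - tail) / m[i][i]
    return tuple(beta)


# Made-up points lying exactly on y = 0.05*u**2 + 0.3*u + 6.0 (u = mph - 90).
# They only check that the fit recovers a known curve; not real pitcher data.
toy_vel = [86.0, 88.0, 90.0, 92.0, 94.0, 96.0]
toy_k9 = [0.05 * (v - 90.0) ** 2 + 0.3 * (v - 90.0) + 6.0 for v in toy_vel]
a, b, c = fit_quadratic(toy_vel, toy_k9)
```

With one set of coefficients per league, the projected NL-minus-AL gap at a given velocity is just the difference between the two fitted curves evaluated at that velocity.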
The differences in distributions for strikeout rates and fastball velocity can be summarized as such:
| | mean | median | 25th percentile | 75th percentile |
| --- | --- | --- | --- | --- |
| SO9 NL | 6.51 | 6.27 | 5.31 | 7.62 |
| SO9 AL | 6.12 | 5.93 | 4.95 | 7.13 |
| FB vel NL (mph) | 89.7 | 89.8 | 88.1 | 91.5 |
| FB vel AL (mph) | 89.9 | 90.2 | 88.5 | 91.9 |
At this point, we could quit with the following explanation:
- Strikeouts are harder to get in the AL:
  - therefore NL starters have higher strikeout rates
  - and therefore AL starters throw a little bit harder, on average
However, the simple analysis ignores another crucial factor in strikeout rates: being left-handed.
I had a friend ask me: how could lefties have higher strikeout rates than
righties? Don't most batters bat right-handed, and thus prefer to see lefties? I
can't answer that question directly, but there is no question that lefty
starters strike out more batters than righties, given the same fastball
velocity. This is why about 30% of all starting pitchers are left-handed, even
though the population is only about 10% left-handed.
To
show how huge this advantage is, I trained separate piecewise-linear models for
strikeout rates in the AL and NL, based only on FB_vel and handedness. You can
see the full text representation of the models here. As in the more detailed models that I trained earlier, there are 2-3 linear rules, with the
rules split by fastball velocity. The NL model has two rules, but the AL model
has a third rule for pitchers with average fastballs under 88.5 mph. Let's
ignore that low-end rule, and compare the other rules in a chart.
| | FB range (mph) | FB_vel feature | THROWS=L | base case (90.0 mph righty) |
| --- | --- | --- | --- | --- |
| NL rule 1 | [91.1, +∞) | +0.43 | +1.71 | 6.46 |
| NL rule 2 | (-∞, 91.0] | +0.21 | +0.48 | 6.16 |
| AL rule 1 | [90.6, +∞) | +0.45 | +1.32 | 5.60 |
| AL rule 2 | [88.5, 90.5] | +0.00 | +0.47 | 5.64 |
As
you can see, being a hard-throwing lefty increases strikeout rates in either
league. But it's a little more valuable in the NL.
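To make the rules concrete, here is a rough sketch of how one might evaluate them. It assumes each rule reads as "base case, plus the FB_vel coefficient for every mph above 90.0, plus the THROWS=L bonus for lefties." That reading is my interpretation of the chart rather than something the model output states, and the name `project_k9` is made up for this example. Velocities are rounded to one decimal so the small gaps between rule ranges never come into play.

```python
INF = float("inf")

# Each rule: (low mph, high mph, K/9 per mph, lefty bonus, K/9 for a 90.0 mph righty).
# Numbers are transcribed from the chart; the AL rule for sub-88.5 mph
# fastballs is not given in the post, so it is deliberately left out.
RULES = {
    "NL": [(91.1, INF, 0.43, 1.71, 6.46),
           (-INF, 91.0, 0.21, 0.48, 6.16)],
    "AL": [(90.6, INF, 0.45, 1.32, 5.60),
           (88.5, 90.5, 0.00, 0.47, 5.64)],
}


def project_k9(league, fb_vel, is_lefty):
    """Projected SO9 from league, average fastball velocity and handedness."""
    v = round(fb_vel, 1)
    for low, high, per_mph, lefty_bonus, base in RULES[league]:
        if low <= v <= high:
            bonus = lefty_bonus if is_lefty else 0.0
            return base + per_mph * (v - 90.0) + bonus
    raise ValueError(f"no rule listed for a {v} mph fastball in the {league}")


# Example: a 92.0 mph lefty projects to 9.03 K/9 in the NL and 7.82 K/9 in the AL.
nl_proj = project_k9("NL", 92.0, True)
al_proj = project_k9("AL", 92.0, True)
```

Under this reading, the gap between the two projections for the same pitcher is the "NL - AL" baseline adjustment used in the rest of the post.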
Now that we have predictions for strikeout rates from FB speed and handedness, we can do something more interesting: we can compare strikeout rates between leagues, and guess how a pitcher might have performed in the opposite league.
Comparing strikeout rates between leagues
My
method is simple: take a starter's actual SO9, and subtract the projected SO9
for his appropriate league, using the above model. Now we have his over or
under achievement for his strikeout rate. Call that his "skill
factor," beyond just throwing hard or being a lefty. To project how he
might have fared in the other league, we add that over (or under) achievement
to his league-based projection for the other league.
For example, Randy Johnson struck out 10.62 K/9 in 2004 for the Diamondbacks. That translates to 9.43 K/9 in the AL. When he moved to the Yankees in 2005, Johnson posted 8.42 K/9. His 2005 translation for the NL: 9.62 K/9. So his K/9 dropped off year to year, but due as much to league change as to declining performance. His "skill factor" dropped from +0.85 to +0.29 from 2004 to 2005 and his fastball velocity dropped by 1.0 mph, but the league change had a major effect on his strikeout rate.
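The translation itself is just a subtraction and an addition, sketched below. The function names are made up for this example. The two league projections for Johnson's 2004 season (9.77 in the NL, 8.58 in the AL) are not quoted anywhere in the post; they are back-derived from his +0.85 skill factor and his 9.43 AL translation, so treat them as approximate.

```python
def skill_factor(actual_k9, proj_own_league):
    """Over- (or under-) achievement relative to the own-league projection."""
    return actual_k9 - proj_own_league


def translate_k9(actual_k9, proj_own_league, proj_other_league):
    """Carry the skill factor over to the other league's projection."""
    return proj_other_league + skill_factor(actual_k9, proj_own_league)


# Randy Johnson, 2004 (NL). Projections are back-derived, see the note above.
rj_2004_skill = skill_factor(10.62, 9.77)          # about +0.85
rj_2004_as_al = translate_k9(10.62, 9.77, 8.58)    # about 9.43 K/9
```

Note that the skill factor is assumed to carry over unchanged between leagues, which is the simplest possible choice and exactly what the method above does.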
I have a spreadsheet of all starters here, along with strikeout rate projections for both leagues. Yes, it's a huge spreadsheet with lots of columns. Sorry. I'm going to mention a few trends that I found interesting. (I excluded Tim Wakefield & Steve Sparks. My projections really don't make sense for knuckleballers.)
I ranked the pitcher seasons by projected (or actual) strikeout rates by AL standards. So all AL starters are ranked by actual strikeout rates, while NL starters are ranked by their "skill factor" adjusted to the AL.
Top billings go to Erik Bedard (vintage 2007) and Pedro Martinez (vintage 2002). Both struck out almost 11 per nine innings in the AL. Unfortunately, the data goes back only to 2002, so I'm missing Randy Johnson's best seasons, as well as Pedro's ridiculous totals from 1999 and 2000. If we had included Pedro's 1999 season, he would have won the AL-rules contest by almost two full strikeouts per game. He was that good.
Also you might notice that the top strikeout pitchers tend to be disproportionally left-handed. About half of the highest-strikeout guys are lefties, although only 30% of all starters are lefties. This is not just an NL phenomenon. There is no clear preference for NL pitchers among the highest strikeout rates in MLB, even if we project those NL pitchers to AL baselines.
As you might also notice, the AL-based strikeout rates are much lower for the top guys than are the NL-based strikeout rates for those same guys. As you can see from the "NL - AL proj" column, the baseline projections (based on fastball speed and handedness) range from -0.46 K/9 to +1.22 K/9. To see the trends for these values, let's run a 25-point average, ordering the pitcher seasons according to AL-based strikeout rates:
There is a lot of variance, but top-strikeout guys are projected to have a larger advantage in the NL than the average starting pitcher.
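If you want to rebuild that smoothed curve from the spreadsheet, something along these lines would do it. This is a sketch: it assumes each pitcher season has been reduced to a pair of (AL-based K/9, "NL - AL proj" value), and the function name `trailing_average` is made up here.

```python
def trailing_average(seasons, window=25):
    """Trailing mean of the NL-minus-AL projection gap.

    seasons: iterable of (al_based_k9, nl_minus_al_proj) pairs.
    Returns (al_based_k9, mean gap over the last `window` seasons) pairs,
    starting once a full window is available.
    """
    ordered = sorted(seasons, key=lambda s: s[0])   # rank by AL-based K/9
    gaps = [gap for _, gap in ordered]
    smoothed = []
    running = 0.0
    for i, gap in enumerate(gaps):
        running += gap
        if i >= window:
            running -= gaps[i - window]             # drop the oldest point
        if i >= window - 1:
            smoothed.append((ordered[i][0], running / window))
    return smoothed


# Tiny made-up input with window=2, just to show the shape of the output.
demo = trailing_average([(7.0, 0.6), (5.0, 0.2), (6.0, 0.4)], window=2)
```

A trailing window is used here to match the "trailing averages" wording later in the post; a centered window would shift the curve slightly but show the same trend.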
Even if you do not believe my models for projecting strikeout rates from FB_velocity and handedness, it's pretty clear that high-strikeout pitchers tend to be more dominant in the NL than a simple league-average-based adjustment would predict. The average starter strikes out about +0.6 K/9 in the NL, compared to the AL. However, a high-strikeout starter strikes out almost +1.0 K/9 in the NL (or certainly +0.8 K/9). Since the "average strikeout pitcher" includes high-strikeout pitchers, the difference for mid-level strikeout pitchers is even more apparent.
It's also interesting, though perhaps less significant, that the league-based strikeout differentials increase a little bit for the really low-end strikeout pitchers. As I mentioned in the previous article, some low-strikeout pitchers benefit from a move to the NL. Typically what happens is that a hard-throwing starter in the AL will under-perform his strikeout projections, and then he might do a lot better with a move to the NL. That sounds like good news for Chien-Ming Wang, and also for Carlos Silva. Although I'm not sure if Silva throws hard enough any more to be effective in any league.
Summary
If
you believe my methods and are not overly worried about training models on
somewhat small samples (more discussion of sample sizes later), then you can
use the models directly to predict the kind of adjustments that need to be made
in order to compare AL & NL strikeout rates on the same scale.
Alternatively, you can look at specific examples of pitcher seasons, and see
what kind of adjustment had to be applied in those cases. Here are a few
typical cases:
- Hard-throwing lefty (Bedard '07, Santana '04, etc): +1.2 K/9
- Hard-throwing righty (Schilling '03, Burnett '07, etc): +0.8 K/9
- League-average lefty (Cliff Lee '04, Pettitte '06, etc): +0.5 K/9
- League-average righty (Pedro '02, Mussina '03, etc): +0.4 K/9
If
you do not buy the whole "lefties have disproportionately more success in
the NL" theory, then you can simply use the RHP figures above, or refer to
the trailing averages graph above.
If anything, I think I've shown that high-strikeout pitchers do tend to do much
better in the NL than simple average-based projections might suggest.
Therefore, those pitchers will have their FIP, xFIP, and QERA (and probably
SIERA as well) suffer by moving to the American League (while league-average
strikeout pitchers will not). As a result, their VORP, WAR, or any other
measure of value, will also drop with a move to the AL.
Does
this mean that we should change FIP? I don't think so. The point of FIP is to
measure the "run-saving value" of each pitcher. Top strikeout pitchers
really do provide more value to NL clubs by striking out more hitters. We
should not penalize them because they would not have done as well in the AL.
However, projection systems and GMs need to be aware of the fact that NL-based
high-strikeout pitchers are not as valuable in the AL, and thus not overpay for
production that they will not be getting.
Sample sizes
Although the samples I
am using here are not as small as those for the original article, I'm still basing this analysis on only 1,000 pitcher seasons, of
which only 300 are for lefties. Worse yet, I am training two models (AL and
NL), for a total of 5 rules (each with 3 variables). However, this is all the
data that I've got. Even if I had reliable fastball data going back 20 years,
I'd be looking at stats from a different game.
Rather than argue whether or not all of this is statistically significant, let's perform an experiment. I've repeatedly mentioned Randy Johnson. Then I projected his stats, based on adjustments that, among other seasons, included Randy Johnson seasons. Doesn't sound kosher. Let's remove Randy Johnson from the sample, and train a new NL model:
| | FB range (mph) | FB_vel feature | THROWS=L | base case (90.0 mph righty) |
| --- | --- | --- | --- | --- |
| NL rule 1 | [91.1, +∞) | +0.43 | +1.71 | 6.46 |
| NL rule 2 | (-∞, 91.0] | +0.21 | +0.48 | 6.16 |
| NL rule 1 (no RJ) | [91.1, +∞) | +0.41 | +1.49 | 6.52 |
| NL rule 2 (no RJ) | (-∞, 91.0] | +0.21 | +0.46 | 6.16 |
That's right, I just referred to the Big Unit as RJ.
As you can see, removing Johnson's stats is not insignificant for the NL strikeout model. Not surprisingly, the new model (for hard-throwing NL starters) gives less weight to fastball speed, and also less weight to being left-handed. If we use this new model, hard-throwing lefties only get a +1.0 K/9 bonus for the NL, while hard-throwing righties stay at +0.8 K/9.
OK, so the model is sensitive. Then again, we just selectively removed the most dominant lefty NL pitcher from our data set! I'd rather have the complete data, and accept that all of my conclusions come with error bars.
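The +1.0 and +0.8 figures can be roughly reproduced from the rule 1 coefficients in the two charts. The sketch below assumes each rule reads as "base case, plus the FB_vel coefficient per mph above 90.0, plus the THROWS=L bonus," and it evaluates everything at a 93.0 mph fastball. Both the reading of the rules and the 93.0 mph velocity are assumptions picked for illustration, not numbers taken from the model output.

```python
# Rule 1 (hard throwers) for each model:
# (K/9 per mph, lefty bonus, K/9 for a 90.0 mph righty), copied from the charts.
AL_RULE_1 = (0.45, 1.32, 5.60)
NL_RULE_1 = (0.43, 1.71, 6.46)
NL_RULE_1_NO_RJ = (0.41, 1.49, 6.52)


def project(rule, fb_vel, is_lefty):
    """Projected SO9 under a single linear rule."""
    per_mph, lefty_bonus, base = rule
    return base + per_mph * (fb_vel - 90.0) + (lefty_bonus if is_lefty else 0.0)


def nl_bonus(nl_rule, fb_vel, is_lefty):
    """Projected NL SO9 minus projected AL SO9 for the same pitcher."""
    return project(nl_rule, fb_vel, is_lefty) - project(AL_RULE_1, fb_vel, is_lefty)


VEL = 93.0  # assumed "hard-throwing" velocity, chosen for illustration only
lefty_with_rj = nl_bonus(NL_RULE_1, VEL, True)           # about +1.19
lefty_without_rj = nl_bonus(NL_RULE_1_NO_RJ, VEL, True)  # about +0.97
righty_with_rj = nl_bonus(NL_RULE_1, VEL, False)         # about +0.80
righty_without_rj = nl_bonus(NL_RULE_1_NO_RJ, VEL, False)  # about +0.80
```

At this velocity the righty bonus barely moves, because the higher base case in the no-RJ model is offset by its smaller velocity coefficient, while the lefty bonus drops by roughly 0.2 K/9.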