My writings about baseball, with a strong statistical & machine learning slant.

Friday, February 19, 2010

Dear Mr. Cashman, no more NL starters please!

Before the 2005 season, the Yankees acquired Randy Johnson in a trade with the Diamondbacks. Although Johnson was already 41 at the time, he had just come off a ridiculous six-year run with the Snakes. He'd collected four Cy Young awards, and he also finished second once. A big part of his success was due to his average of nearly 12 strikeouts per 9 innings (SO9 or K/9) over those six seasons. Johnson had defied regression to the mean. He had posted at least 10.0 K/9 in each of the past 14 seasons. Knowing all this, the Yankees sent Javier Vazquez, Dioner Navarro, a left-handed pitching prospect, and stacks of cash to the Snakes for the best left-handed starting pitcher of his era. (All stats and facts can be found on FanGraphs and Baseball Reference.)

In 2005, Johnson wasn't nearly as dominant. He finished 17-8 in 225 2/3 innings with a 3.79 ERA. He was the Yankees' best starting pitcher, but neither he nor Yankees fans considered his season a success.

At the time, I thought that criticism of Johnson was a bit overblown. A commentator (I forget which one) pointed out that Johnson's numbers (17-8, 225 2/3 IP, 3.79 ERA) were not much worse than those of Bartolo Colon (21-8, 222 2/3 IP, 3.48 ERA), who won the AL Cy Young that year. I figured that if Randy Johnson had been one of the top two or three starting pitchers in the AL that year, we could hardly consider his season a failure. Was his rise in ERA bad luck, a response to tougher opposition, or did he finally decline after years of proving statisticians wrong?

Advanced pitching metrics show that Randy Johnson did indeed regress quite a bit in 2005. His FIP (a simple fielding independent ERA approximation) jumped to 3.78 (from an FIP of 2.30 in 2004), after being consistently under 3.00 with the Diamondbacks. A big part of this increase was his loss of 2.20 K/9 from his 2004 season with the Snakes. FIP is computed as follows:

(HR*13 + (BB + HBP - IBB)*3 - K*2) / IP + (annual league constant)

If we ignore the change in league constant, then it's clear that Johnson's loss of 2.20 K/9 resulted in a rise of 0.49 in FIP: the strikeout term is 2 * K / IP, and 2 * 2.20 / 9 is about 0.49. In other words, FIP predicts that Johnson's ERA would rise by 0.49 as a result of his strikeout rate falling from 10.62 in 2004 to 8.42 in 2005. However, FIP does not adjust for park factors, and it takes home run rates at face value. A more advanced version of FIP is xFIP, which takes better account of park factors and the luck involved in HR rates. From 2004 to 2005, Johnson's xFIP rose from 2.60 to 3.42.
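The 0.49 figure can be checked with a few lines of Python. This is only a sketch of the arithmetic: it holds home runs, walks, hit batters, and the league constant fixed, and the function name is my own invention for illustration.

```python
def fip_change_from_k9(delta_k9):
    """Change in FIP caused by a change in K/9, with HR, BB, HBP, IBB
    and the league constant all held fixed (an assumption)."""
    # The strikeout term of FIP is -2 * K / IP, and K / IP = (K/9) / 9.
    return -2.0 * delta_k9 / 9.0


# Johnson's K/9 fell from 10.62 (2004) to 8.42 (2005).
delta = 8.42 - 10.62
print(round(fip_change_from_k9(delta), 2))  # 0.49
```

Since the strikeout coefficient is 2 and the rate is per 9 innings, each lost strikeout per 9 innings costs about 0.22 of FIP.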

Therefore, if we make allowances for the differences in moving from Arizona in the NL to New York in the AL, and we make allowances for Johnson's good luck with home runs in 2004 (and bad luck with home runs in 2005), he still regressed quite a bit. A large part of Johnson's decline can be attributed to his sudden loss of 2.20 K/9. Should this loss of strikeouts have been expected by the Yankees front office?

In my grand opus on strikeout rates, I mentioned that lots of high-strikeout pitchers who move from the NL to the AL seem to lose about 2.0 K/9 in their first year after the move. Let's take a look at all starting pitchers who've recently switched leagues.

For this mini-study, I define starting pitchers as those pitchers who threw 100+ innings as starters in consecutive seasons. The data is from 2003-2009, so the samples are fairly small (and some samples could be significantly biased by transactional trends for certain teams, i.e. the Yankees). Therefore, I will not claim that this study proves anything. Nor do I suggest that you should take my numbers at face value.

Here are the before & after strikeout rates for starters who switched leagues, along with trend lines:



Wouldn't you know it? Starters with 10+ K/9 in the NL tend to lose a little over 2.0 K/9 in moving to the AL. Maybe the Yankees (or at least their fans) should have looked at this graph in 2006 before they gave Randy such a hard time.

Also, we've got us two parallel lines. So maybe strikeout changes in switching leagues are symmetrical? Let's look into this a little further.

First, let's note that for any "old SO9" rate, a starter moving to the NL will tend to have 1.5 more K/9 than a starter moving to the AL. This would suggest that there is some loss in moving to the AL and some gain in moving to the NL, with the total adding up to about 1.5 K/9. This is pretty consistent with my findings in regard to league adjustments for predicting strikeout rates from pitch data.

Second, we can also see that the slope of both trend lines is about 2/3, so pitchers lose 1/3 of their strikeouts over 6.0 K/9 (which happens to be roughly the average strikeout rate for starters in both leagues) when they switch leagues, in addition to league change adjustments. Does this have to do with league change also, or would the pitchers have undergone a 1/3 reduction in their marginal strikeout rates if they didn't switch leagues?
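To make those two observations concrete, here is a rough sketch of the trend lines as a formula. It assumes a slope of exactly 2/3, a pivot at 6.0 K/9, and an even split of the 1.5 K/9 gap between the lines (0.75 each way). The even split is my simplifying assumption, not something I fitted, and the function name is made up for illustration.

```python
PIVOT = 6.0        # roughly the average K/9 for starters in both leagues
SLOPE = 2.0 / 3.0  # approximate slope of both trend lines
LEAGUE_GAP = 1.5   # approximate vertical distance between the two lines


def projected_so9(old_so9, new_league):
    """Rough 'new SO9' for a starter who switches into new_league.

    Assumes the 1.5 K/9 gap splits evenly: +0.75 moving to the NL,
    -0.75 moving to the AL. That split is an assumption.
    """
    if new_league == "NL":
        shift = LEAGUE_GAP / 2
    elif new_league == "AL":
        shift = -LEAGUE_GAP / 2
    else:
        raise ValueError("new_league must be 'NL' or 'AL'")
    # Keep 2/3 of the strikeouts above (or below) the pivot, then shift.
    return PIVOT + SLOPE * (old_so9 - PIVOT) + shift


# A 10.0 K/9 NL starter moving to the AL:
print(round(projected_so9(10.0, "AL"), 2))  # 7.92
```

Under these assumptions the 10.0 K/9 starter loses about 2.1 K/9, which lines up with the "little over 2.0" read off the first graph.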

We cannot know what kind of season Randy Johnson (or Curt Schilling, or Kevin Brown) would have had if he had never moved from the NL West to the AL East. Instead, we'll look at two other groups of pitchers. One group switched teams, but stayed in the same league. The other group stayed with the same team. However, we only count pitchers staying with the same team if their age for the second season was 30+. Teams rarely trade (and never release) young pitchers who are full-time starters for them, and young pitchers are not eligible for free agency. Therefore we need to compare pitchers switching teams to guys who are a bit older than average. I picked 30 arbitrarily, in order to avoid multiple-endpoints issues for such small sample sizes.
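The grouping rules can be sketched as a small function. The record layout (team, league, and age fields) and the label strings are hypothetical and only meant to show the logic. The sketch assumes both seasons already meet the 100+ innings cutoff, and it ignores mid-season trades.

```python
from typing import NamedTuple, Optional


class Season(NamedTuple):
    """Hypothetical one-season record for a starter."""
    team: str
    league: str  # "NL" or "AL"
    age: int


def classify(prev: Season, curr: Season) -> Optional[str]:
    """Assign a pair of consecutive seasons to one of the six groups."""
    if prev.league != curr.league:
        return f"switch to {curr.league}"
    if prev.team != curr.team:
        return f"new team, {curr.league}"
    if curr.age >= 30:
        return f"same team, {curr.league}"
    # Under-30 starters who stayed put are left out of the comparison.
    return None


# Randy Johnson, 2004 to 2005:
print(classify(Season("ARI", "NL", 40), Season("NYY", "AL", 41)))  # switch to AL
```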

Here are all six classifications of starters below, with trend lines. Again the samples are small (some categories contain as few as 50 pitchers), so we should not place great emphasis on small changes in the slope of the trend lines.



Again, the slopes do differ a little, but we have a fairly clean stacking of the six categories of starters inside the 4.0 K/9 to 10.0 K/9 range where the vast majority of starters make their living.

Consider the several hypothetical fates of an NL starter:
  • If he stays with the same team (blue trend line), then he will experience a strikeout rate decline of 0.0 to just over 0.5 K/9, depending on how high his previous K/9 rate was.
  • If he moves teams within the NL (thin teal trend line), then he will have the exact same pattern of decline.
  • If he moves to the AL (purple trend line), then he will lose between 0.0 K/9 and 1.5 K/9 within the range of reasonable strikeout totals. An average starter (6.0 K/9) will lose just over 0.5 K/9, but a high-strikeout starter may lose 1.5 K/9 due to the league change, in addition to the expected 0.5 K/9 natural decline.
Now consider a hypothetical AL starter:
  • If he stays with the same team (red trend line), then he will have a similar decline to the NL starter, except that his decline will be a tiny bit bigger on the high-strikeout end.
  • If he switches teams within the AL (orange trend line), then he will experience further decline of up to 0.5 K/9 on the high-strikeout end.
  • If he switches to the NL (green trend line), then he will gain between 1.0 and 1.5 K/9, relative to having switched teams within the AL.
The most notable points are that:
  • All starters lose strikeouts when they move to the AL.
  • All starters gain strikeouts when they move to the NL.
  • High-strikeout pitchers are particularly susceptible to the drops in strikeout rates when they move to the AL.
  • Low-strikeout pitchers have the most to gain by moving to the NL.
Looking at the graphs above, one might be tempted to assume that starters have much higher strikeout rates in the NL, on average, than they do in the AL. However, this is not the case. Consider average strikeout rates for the six categories above (I use the "new SO9" numbers in all cases):


                     NL (mean)   AL (mean)   NL (median)   AL (median)
same team:             6.04        5.92         5.64          5.56
new team:              5.82        5.64         5.36          5.35
switch to league:      6.61        5.79         6.29          5.66

How can this be? If average starters in the NL do not strike out more batters than average AL starters, then how come there is such a huge change in strikeout rates when starters switch leagues?

Does this imply that the AL has more talented pitchers? Maybe, but not necessarily.

Low-strikeout AL pitchers benefit significantly from a league change. Therefore they have a strong incentive to change leagues. Low-strikeout NL pitchers lose a further 0.5 K/9 with a move to the AL, so they have little incentive to change leagues. This would suggest that low-strikeout pitchers (at least by standards of AL ability) will be concentrated in the NL.

Now consider high-strikeout pitchers. High-strikeout pitchers who move to the AL have large drops in strikeout rates (1.5 K/9 on the high end). This is a significant disincentive for them to make the move. However, high-strikeout AL pitchers gain only 0.5 K/9 or so when moving to the NL. So high-strikeout AL pitchers have a small incentive to move to the NL, while high-strikeout NL pitchers have a large incentive to stay there. This would suggest that high-strikeout pitchers will also be concentrated in the NL!

The math for average and slightly above-average strikeout pitchers is a bit more symmetrical. If a pitcher has 7.0 K/9, he will, on average, gain or lose 0.75 K/9 by switching leagues. His numbers will look better in the NL, but a well-tuned stat that adjusts for average league differences will suggest that his performance has the same value in either league.

Therefore, we should think that the AL will have more slightly above average starters, while the NL will have more high-strikeout starters, but it will also have more low-strikeout starters. That would be the most logical equilibrium.

So far, we have not mentioned why pitchers should have different strikeout rates in the two leagues. Let's assume that the reason is a combination of rule differences and the lineup differences that result from adjusting to those rules. When the teams enter inter-league play, they will have to play some games by the other league's rules, and against lineups built for those same rules.

If the strikeout rate differences are due to different rules (and lineups designed to adjust for those rules), then NL starters should have a comparative disadvantage relative to their AL brethren. The high-strikeout NL starters will suddenly become much lower-strikeout pitchers. The low-strikeout NL starters will suffer a smaller loss in strikeout rate. However, even if the back of the rotation performs relatively better, that cannot compensate for the top NL starters losing 1.0-2.0 K/9 overnight. On the other hand, AL teams, which should have relatively more slightly above-average strikeout starters (and fewer high-strikeout starters), will see a more balanced effect on their strikeout rates when facing NL competition. Their starters will all strike out 0.5-1.0 more batters per 9 innings. The run environment will change proportionately, so the AL guys will not become more valuable. However, AL teams will not be subject to the kind of dramatic loss in high-end value that NL teams' high-strikeout starters experience in AL ballparks.

I might be going too far with an argument that is hinged on a fairly small amount of recent data. However, I think it's an argument worth considering. The following facts are hard to dispute:
  • High-strikeout NL starters experience large drop offs in strikeout rate upon moving to the AL.
  • In recent years, the NL has had many more high-strikeout starting pitchers than the AL, despite the fact that there is little difference in average strikeout rate between the leagues.
  • The AL has whipped the NL in inter-league play (in some years by huge margins) ever since this experiment began.
In his book, Whitey Herzog suggested that NL teams had an unfair advantage against AL teams in inter-league play, because their pitchers should be much better hitters. In today's run environment, that may not matter much any more. However, in today's high-strikeout environment, I suggest that high-strikeout starting pitchers from the NL are disproportionately hurt by inter-league play, thus giving the AL a significant advantage.

So if you are reading this, Mr. Cashman, please stop signing high-strikeout NL starting pitchers. Instead, keep concentrating on offense, defense, and the bullpen. Keep trying to acquire or develop starters like Andy Pettitte and David Wells: guys with above-average strikeout rates who keep their walk rates down. I hope that Javier Vazquez bucks the trend and holds on to most of his 9.77 K/9 from last season. However, I would not hold my breath.

I should also reconsider some of my thoughts on this off-season's big transactions. A few months ago, I wrote about the Edwin Jackson trade. I said that Jackson had consistently under-achieved his potential SO9 rate (based on fastball speed and other pitch factors), and that pitchers like him tend to continue under-achieving. I said that despite one good season and a world of talent, Jackson was unlikely to ever achieve high strikeout rates with Arizona. I might have to change my mind about this now. Jackson struck out 6.77 per nine innings last season. His totals should improve in Arizona to about 7.8 K/9. If that happens, Jackson will be a valuable starter, whatever his other shortcomings may be.

In order to get Jackson, the Snakes sent Max Scherzer to the Tigers. He recorded 9.2 K/9 in his first full season as a starter. According to my graph, he should drop down to about 7.5 K/9 next year with the Tigers. This suggests that last season's strikeout rates for the two pitchers are not as different as they first seemed to me. Even so, Scherzer was clearly the better pitcher last year. Both Jackson and Scherzer are young, and if the Diamondbacks think that Jackson has a higher upside, then they are fully justified in making the trade. In any case, Scherzer is likely to regress (at least in nominal stats) with the Tigers, so the Snakes will look like they sold high on him.
