My writings about baseball, with a strong statistical & machine learning slant.

Tuesday, March 16, 2010

Excessive exuberance for young pitchers?

I don't like to criticize other people's work, but it seems to me that most projection systems for pitchers have a major flaw: they way over-estimate the IP (innings pitched) for young pitchers.

Consider a young pitcher that most of you have heard of. According to FanGraphs, here are the 2010 IP projections for Phil Hughes:

Bill James

These numbers look reasonable, right? Hughes is a strong candidate for the Yankees’ #5 starter role after a solid minor league career and a good season in relief for the Bombers last year. And yet, what is a reasonable high-end estimate for Hughes’s IP this season? Fifth starters never throw 200 IP, and in any case, I doubt that the Yankees would work Hughes that hard even if they need to move him up in the rotation. Also note that he has never thrown more than 160 innings in a season as a professional. A reasonable high-end projection for him would probably be 140-160 IP, if everything goes well. And what if it does not?

It’s hard to estimate a low-end case for Phil Hughes, but the lower bound for any pitcher is 0 IP. Hughes does not have a bad injury history, but all pitchers get hurt sometimes. Also, he might struggle and lose his rotation spot (assuming he is a starting pitcher this year).

I’m not saying that 120 IP is a bad estimate for Phil Hughes, but it’s a more optimistic projection than it might seem at first.

Some of you may disagree, and frankly I’m not sure that this particular projection is a bad one. What I am saying is that, collectively, IP projections for young pitchers are higher than is consistent with what really happens in the majors.

Consider CHONE IP predictions for all Yankees pitchers. Here are the starters, not including the top four (Sabathia, Vazquez, Pettitte & Burnett):

Projected IP:
Phil Hughes
Chad Gaudin
Dustin Moseley
Sergio Mitre

Does anyone actually think that these four pitchers will collectively pitch 480 innings? There is no chance!

This problem is not an isolated case for the Yankees. Pretty much every team’s pitchers’ IP are over-estimated by CHONE, even if you count only pitchers who are sure to be on the Major League roster. The Orioles are projected to have seven pitchers with at least 18 starts and 100 IP each. Even if the season had enough games for 200 starts, there is not much chance that these exact seven pitchers will get all of those starts.

I think that this is the key to understanding why projections for young pitchers are historically too high (I will show just how high they are later). If you look at individual projections for young pitchers, they may seem reasonable, but collectively, they just don’t add up. This is because we really don’t know which young pitcher is going to get playing time, so we should severely discount the projections for all pitchers who are not established major leaguers. The IP projections for all pitchers on a team’s Major League roster should add up to significantly less than the total innings for that season. Instead, all the projection systems that I have looked at tend to have those stats add up to more than the total innings for that season.
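To make the "they just don't add up" point concrete, here is a minimal sketch of a team-level sanity check. It is not how CHONE, PECOTA or any other system works. The innings budget (162 games at roughly 9 innings each), the function name `discount_unestablished`, the rule of scaling only the unestablished pitchers, and every name and number in `example_staff` are assumptions made up for illustration.

```python
# Rough innings budget for one team-season: 162 games at about 9 innings each.
# This is an approximation (assumption); extra innings and shortened games
# move the real figure a little.
TEAM_INNINGS_BUDGET = 162 * 9


def discount_unestablished(projections, established, budget=TEAM_INNINGS_BUDGET):
    """Scale down IP for unestablished pitchers so the staff fits the budget.

    projections: dict mapping pitcher name -> projected IP
    established: set of names whose projections are left untouched
    The proportional scaling rule is an illustrative choice, not a claim
    about any real projection system.
    """
    fixed = sum(ip for name, ip in projections.items() if name in established)
    flexible = sum(ip for name, ip in projections.items() if name not in established)
    room = max(budget - fixed, 0.0)  # innings left after the established arms
    if flexible <= room:
        return dict(projections)  # already fits, nothing to discount
    factor = room / flexible
    return {
        name: ip if name in established else ip * factor
        for name, ip in projections.items()
    }


# Hypothetical staff: four established starters, four young starters
# projected at 120 IP each, and seven relievers at 70 IP each.
example_staff = {"Vet 1": 200.0, "Vet 2": 190.0, "Vet 3": 180.0, "Vet 4": 170.0}
example_staff.update({f"Young SP {i}": 120.0 for i in range(1, 5)})
example_staff.update({f"Reliever {i}": 70.0 for i in range(1, 8)})
example_established = {"Vet 1", "Vet 2", "Vet 3", "Vet 4"}

adjusted = discount_unestablished(example_staff, example_established)
print("raw total IP:     ", round(sum(example_staff.values()), 1))
print("adjusted total IP:", round(sum(adjusted.values()), 1))
print("Young SP 1 IP:    ", round(adjusted["Young SP 1"], 1))
```

Note that this only caps the total at the full budget. The argument above is that the roster's projections should land significantly below it, because some innings will go to pitchers nobody projected, so a smaller `budget` would be closer to the spirit of the post.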

Phil Hughes is a good pitcher, and will not get demoted quickly. I think projecting 120 IP for him this year is a bit high, but it’s not an unreasonable 50th percentile projection. However, CHONE also projects 56 IP for Zack Segovia. Zack who? Exactly. Might he pitch 56 innings for the Nationals this year? Maybe, but that is not a reasonable 50th percentile projection, at least not for an impartial computer system.

I don’t mean to pick on CHONE. PECOTA has the same flaw. However, PECOTA data is not public. I’ve heard that OLIVER has good projections, but I don’t think that those are available publicly either.

I don’t know how CHONE or PECOTA are trained, but I don’t think they take proper account of how innings are distributed between veterans and rookies. Over the past five years (2005-2009), here is how innings have been divvied up:
  • 640 pitchers per season
  • 226 pitchers qualify as rookies (fewer than 50 IP previously)
  • 67.7 innings per pitcher
  • 37.6 innings per rookie pitcher
  • 66% of innings are thrown by starters
  • 20% of innings are thrown by rookies
The short version is that, in today’s game, most innings go to experienced pitchers, especially in the starting rotation. Occasionally, there are pitchers like Tim Lincecum, Rick Porcello and Dontrelle Willis who rack up large IP tallies as rookies. It’s nice if your projection system finds those guys. But in the meantime, that system will probably over-estimate playing time for many other good pitchers, who will struggle in the majors, or who will see their rotation spot or roster spot go to a veteran.
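As a quick arithmetic check on the 2005-2009 figures listed above, the snippet below recomputes the rookie share of innings from the per-pitcher averages. The inputs are the numbers from the list; the variable names and the derived veteran average are my own additions, not something taken from a projection system.

```python
# Per-season averages for 2005-2009, copied from the list above.
pitchers_per_season = 640
rookies_per_season = 226
ip_per_pitcher = 67.7
ip_per_rookie = 37.6

# Total innings thrown by everyone, and by rookies only.
total_ip = pitchers_per_season * ip_per_pitcher
rookie_ip = rookies_per_season * ip_per_rookie

# Share of the pitcher pool vs. share of the innings.
rookie_share_of_pitchers = rookies_per_season / pitchers_per_season
rookie_share_of_ip = rookie_ip / total_ip

# Derived figure (not in the original list): average IP for non-rookies.
veterans_per_season = pitchers_per_season - rookies_per_season
ip_per_veteran = (total_ip - rookie_ip) / veterans_per_season

print(f"rookies as share of pitchers: {rookie_share_of_pitchers:.1%}")
print(f"rookies as share of innings:  {rookie_share_of_ip:.1%}")
print(f"innings per veteran pitcher:  {ip_per_veteran:.1f}")
```

The numbers hang together: rookies are roughly 35% of the pitchers who appear in a season but throw only about 20% of the innings, which implies about 84 innings per non-rookie against 37.6 per rookie.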

Both PECOTA and CHONE are very good projection systems, and they both take minor league stats into account. But I would argue that they would be better off with unsophisticated, lowball IP estimates for all young pitchers, until those pitchers establish major league records worth projecting off of.

In my next post, I will demonstrate such a system.
