I don't like to criticize other people's work, but it seems to me
that most projection systems for pitchers have a major flaw: they way
over-estimate the IP (innings pitched) for young pitchers.
Consider a young pitcher that most of you have heard of. According
to FanGraphs, here are the 2010 IP projections for Phil Hughes:
| System | Projected IP |
|---|---|
| Bill James | 121.0 |
| CHONE | 116.0 |
| Marcel | 81.0 |
| Fans | 138.0 |
These numbers look reasonable, right? Hughes is a strong candidate
for the Yankees' #5 starter role after a solid minor league career and a
good season in relief for the Bombers last year. And yet, what is a reasonable high-end
estimate for Hughes’s IP this season? Fifth starters never throw 200 IP, and in
any case, I doubt that the Yankees would work Hughes that hard even if they
need to move him up in the rotation. Also note that he has never thrown more
than 160 innings as a professional. A reasonable high-end projection for him would probably be
140-160 IP, if everything goes well. And what if it does not?
It’s hard to estimate a low-end case for Phil Hughes, but the
lower bound for any pitcher is 0 IP. Hughes does not have a bad injury
history, but all pitchers get hurt sometimes. Also, he might struggle and lose
his rotation spot (assuming he is a starting pitcher this year).
I’m not saying that 120 IP is a bad estimate for Phil Hughes, but
it’s a more optimistic projection than it might seem at first.
Some of you may disagree, and frankly I’m not sure that this
particular projection is a bad one. What I am saying is that, collectively, IP
projections for young pitchers are higher than is consistent with what really
happens in the majors.
Consider CHONE IP predictions for all Yankees pitchers.
Here are the starters, not including the top four (Sabathia, Vazquez, Pettitte
& Burnett):
| Pitcher | Projected IP |
|---|---|
| Phil Hughes | 116.0 |
| Chad Gaudin | 162.0 |
| Dustin Moseley | 108.0 |
| Sergio Mitre | 101.0 |
Does anyone actually think that these four pitchers will
collectively pitch 487 innings? There is no chance!
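As a quick check on that arithmetic, the four CHONE projections listed above can simply be added up. This is only a sketch: the dictionary and variable names are mine, and the numbers are copied straight from the list of Yankees starters.

```python
# CHONE 2010 IP projections for the Yankees starters outside the top four,
# copied from the list above.
chone_ip = {
    "Phil Hughes": 116.0,
    "Chad Gaudin": 162.0,
    "Dustin Moseley": 108.0,
    "Sergio Mitre": 101.0,
}

# Combined innings these four pitchers are projected to throw.
combined_ip = sum(chone_ip.values())
print(f"Combined projected IP: {combined_ip:.1f}")
```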
This problem is not an isolated case for the Yankees. Pretty
much every team’s pitchers’ IP are over-estimated by CHONE, even if you count
only pitchers who are sure to be on the Major League roster. The Orioles are
projected to have seven pitchers with at least 18 starts and 100 IP each. Even if
the season had enough games for 200 starts, there is not much chance that these
exact seven pitchers will get all of those starts.
I think that this is the key to understanding why
projections for young pitchers are historically too high (I will show just how
high they are later). If you look at individual projections for young
pitchers, they may seem reasonable, but collectively, they just don’t add up. This
is because we really don’t know which young pitcher is going to get playing
time, so we should severely discount the projections for all pitchers who are not
established major leaguers. The IP projections for all pitchers on a team’s
Major League roster should add up to significantly less than the total innings
for that season. Instead, all the projection systems that I have looked at tend
to have those stats add up to more than the total innings for that season.
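One way to make this "add up" test concrete is to compare a team's summed IP projections with the innings a team can actually throw. The sketch below is mine, not how CHONE or any other system works. It assumes a nominal budget of 162 games times 9 innings (ignoring extra innings and shortened games), and the function name, the 0.9 discount, and the roster numbers are all made up for illustration.

```python
# Nominal innings a team pitches in a season: 162 games x 9 innings.
# Assumption: extra innings and shortened games are ignored.
TEAM_INNINGS = 162 * 9

def rescale_projections(projected_ip, budget_share=0.9):
    """Shrink projections proportionally so they sum to at most
    budget_share * TEAM_INNINGS. budget_share is a hypothetical knob that
    leaves room for pitchers who are not on the roster yet."""
    budget = TEAM_INNINGS * budget_share
    total = sum(projected_ip.values())
    if total <= budget:
        return dict(projected_ip)  # already plausible, leave untouched
    factor = budget / total
    return {name: ip * factor for name, ip in projected_ip.items()}

# Made-up roster (not real projections) that sums to more than a season holds.
roster = {
    "Starter A": 210.0, "Starter B": 200.0, "Starter C": 190.0,
    "Starter D": 180.0, "Starter E": 160.0,
    "Swingman F": 120.0, "Swingman G": 110.0,
}
roster.update({f"Reliever {c}": 70.0 for c in "HIJKLM"})

adjusted = rescale_projections(roster)
print(f"Raw total:      {sum(roster.values()):.1f} IP")
print(f"Adjusted total: {sum(adjusted.values()):.1f} IP")
```

Proportional scaling is just the simplest possible correction. My argument is that the discount should fall mostly on the pitchers who are not established major leaguers, not evenly on everyone.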
Phil Hughes is a good pitcher, and will not get demoted
quickly. I think projecting 120 IP for him this year is a bit high, but it’s
not an unreasonable 50th percentile projection. However, CHONE also
projects 56 IP for Zack Segovia. Zack who? Exactly. Might he pitch 56 innings for the Nationals this year?
Maybe, but that is not a reasonable 50th percentile projection, at
least not for an impartial computer system.
I don’t mean to pick on CHONE. PECOTA has the same flaw.
However, PECOTA data is not public. I’ve heard that OLIVER has good projections,
but I don’t think that those are available publicly either.
I don’t know how CHONE or PECOTA are trained, but I don’t
think they take proper account of how innings are distributed between veterans
and rookies. Over the past five years (2005-2009), here is how innings
have been divvied up:
- 640 pitchers per season
- 226 pitchers qualify as rookies (fewer than 50 IP previously)
- 67.7 innings per pitcher
- 37.6 innings per rookie pitcher
- 66% of innings are thrown by starters
- 20% of innings are thrown by rookies
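The figures in this list can be cross-checked against each other with a little arithmetic. The sketch below uses only the numbers above; the variable names are mine, and the innings-per-non-rookie figure is derived, not something I measured separately.

```python
# Per-season averages for 2005-2009, taken from the list above.
pitchers_per_season = 640
rookies_per_season = 226
ip_per_pitcher = 67.7
ip_per_rookie = 37.6

# Total innings thrown in a season, and the part thrown by rookies.
season_ip = pitchers_per_season * ip_per_pitcher
rookie_ip = rookies_per_season * ip_per_rookie
rookie_share = rookie_ip / season_ip

# Derived: average innings for everyone who is not a rookie.
non_rookies = pitchers_per_season - rookies_per_season
ip_per_non_rookie = (season_ip - rookie_ip) / non_rookies

print(f"Total innings per season: {season_ip:,.0f}")
print(f"Rookie share of innings:  {rookie_share:.1%}")
print(f"Innings per non-rookie:   {ip_per_non_rookie:.1f}")
```

The rookie share comes out just under 20%, which matches the last bullet, and non-rookies average roughly 84 innings each, more than twice the rookie figure.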
The short version is that, in today’s game, most innings
go to experienced pitchers, especially in the starting rotation. Occasionally, there are pitchers like Tim Lincecum, Rick Porcello and Dontrelle Willis who rack up
large IP tallies as rookies. It’s nice if your projection system
finds those guys. But in the meantime, that system will probably over-estimate playing time for many other good pitchers, who will struggle in
the majors, or who will see their rotation spot or roster spot go to a veteran.
Both PECOTA and CHONE are very good projection systems, and
they both take minor league stats into account. But I would argue that they
would be better off with unsophisticated, lowball IP estimates for all young pitchers, until those pitchers establish major league records worth projecting off of.
In my next post, I will demonstrate such a system.