Based on the simple ML system that I wrote about in my
previous post, here are my 2010 innings pitched projections. I offer
predictions for IP (broken down by starter innings and reliever innings) for
all pitchers active in 2009, as well as for those who missed 2009 but had over
120 IP in 2008 (e.g., Ben Sheets, Mike Mussina). The pitchers are ordered by
2009 IP.
I also offer my predictions from an earlier system that does
not account for pitchers’ value stats (wins, VORP, saves, etc.). I include the
difference between the value-driven and non-value-driven projections in the last
column. These differences range from -30 IP to +30 IP. The IP_Start model rewards both 2009
value and previous value, while the IP_Relief model is more heavily weighted
toward 2009 value. Roy Halladay and Johan Santana gain the most among
high-impact starters, while Mariano Rivera and Joe Nathan gain the most among
high-impact relievers. More on that later.
It would be in poor form for me to include PECOTA or CHONE
predictions next to mine, so let me share a few averages with you instead. In
this context, rookies are pitchers who had at least 1/3 IP in 2009, but less
than 50 IP in their careers. Projections for rookies with no MLB experience are
not included. My system does not use minor league stats or scouting reports, so
all such rookies will project to about 20-30 IP, depending on age.
| Mean IP | Actual 2009 | My system (2010) | PECOTA (2010) | CHONE (2010) |
|---|---|---|---|---|
| All pitchers | 65.1 | 58.2 | 99.9 | 91.2 |
| Non-rookies | 78.2 | 66.4 | 103.6 | 95.1 |
| Rookies | 14.8 | 26.4 | 86.0 | 76.4 |
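The rookie definition above, and the way group means like the ones in the table are computed, can be sketched as follows. This is a minimal illustration, not the code behind my system: the `Pitcher` record, the `is_rookie` and `group_means` helpers, and the sample pitchers are all hypothetical. Only the thresholds (at least 1/3 IP in 2009, under 50 career IP) come from the text, and I'm assuming career IP is counted through 2009.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Pitcher:
    name: str
    ip_2009: float    # actual 2009 IP as a true decimal (one out = 1/3), not box-score .1/.2 notation
    career_ip: float  # career MLB IP through 2009 (assumed to include 2009)
    proj_2010: float  # projected 2010 IP from whichever system is being summarized

def is_rookie(p: Pitcher) -> bool:
    """Rookie as defined in the post: at least 1/3 IP in 2009 and under 50 career IP."""
    return p.ip_2009 >= 1 / 3 and p.career_ip < 50

def group_means(pitchers):
    """Return {group: (mean 2009 IP, mean projected 2010 IP)} for all/non-rookies/rookies."""
    groups = {
        "all": pitchers,
        "non-rookies": [p for p in pitchers if not is_rookie(p)],
        "rookies": [p for p in pitchers if is_rookie(p)],
    }
    return {
        label: (mean(p.ip_2009 for p in members), mean(p.proj_2010 for p in members))
        for label, members in groups.items()
        if members  # statistics.mean raises StatisticsError on empty input
    }

# Made-up pitchers, for illustration only
pitchers = [
    Pitcher("Veteran A", 200.0, 1500.0, 180.0),
    Pitcher("Veteran B", 60.0, 400.0, 55.0),
    Pitcher("Rookie C", 10.0, 10.0, 25.0),
]
for label, (actual, projected) in group_means(pitchers).items():
    print(f"{label}: 2009 {actual:.1f} IP, 2010 projection {projected:.1f} IP")
```

Note that a veteran who missed 2009 entirely (0 IP, like Ben Sheets) is never classified as a rookie here, because he fails the 1/3 IP test.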
My system expects the average pitcher to regress by about 7 IP from
2009. Since total innings must add up, the roughly 500 * 7 = 3,500 missing IP
will be made up by (as yet unknown) pitchers without any major league
experience. That’s over 100 IP per team.
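Here is that innings arithmetic redone with the unrounded means from the table. The pitcher count of about 500 is the round figure used above and 30 is the number of MLB teams; nothing else is assumed.

```python
# Mean IP from the table: actual 2009 vs. my system's 2010 projection
MEAN_IP_2009 = 65.1
MEAN_IP_2010_PROJ = 58.2
N_PITCHERS = 500  # rough count of pitchers in the projection set
N_TEAMS = 30

drop_per_pitcher = MEAN_IP_2009 - MEAN_IP_2010_PROJ  # about 6.9 IP, i.e. "about 7"
missing_ip = N_PITCHERS * drop_per_pitcher           # innings left for debuting pitchers
per_team = missing_ip / N_TEAMS

print(f"drop per pitcher: {drop_per_pitcher:.1f} IP")
print(f"missing IP: {missing_ip:.0f}")
print(f"per team: {per_team:.0f} IP")
```

With the unrounded 6.9 IP drop this comes to roughly 3,450 IP in total, or about 115 IP per team.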
In my system, veteran pitchers will regress by about 12 IP
from 2009 on average (including pitchers who retire; more on that later),
while rookies who’ve had a cup of coffee in the majors are expected to pitch
almost 12 IP more in 2010 than in 2009. All of this is pretty consistent with
the averages that I outlined in the previous article.
However, CHONE and PECOTA predictions are not consistent with
these averages. In both systems, the average pitcher is projected to throw
26 IP (CHONE) to 35 IP (PECOTA) more than he did in 2009, and the average
rookie is projected to add over 60 IP! The average major league rookie is
projected to throw more innings than Mariano Rivera.
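The comparisons in the last two paragraphs are just differences between columns of the table. A quick check, using only the table’s values (the `changes_from_2009` helper is a hypothetical name):

```python
# Mean IP from the table: (actual 2009, my system 2010, PECOTA 2010, CHONE 2010)
TABLE = {
    "all": (65.1, 58.2, 99.9, 91.2),
    "non-rookies": (78.2, 66.4, 103.6, 95.1),
    "rookies": (14.8, 26.4, 86.0, 76.4),
}
SYSTEMS = ("my system", "PECOTA", "CHONE")

def changes_from_2009(table):
    """Projected 2010 mean IP minus actual 2009 mean IP, per group and per system."""
    out = {}
    for group, (actual, *projections) in table.items():
        out[group] = {
            system: round(proj - actual, 1)  # round away float noise like 34.80000000000001
            for system, proj in zip(SYSTEMS, projections)
        }
    return out

for group, deltas in changes_from_2009(TABLE).items():
    print(group, deltas)
```

For all pitchers this gives -6.9 IP for my system, +34.8 IP for PECOTA, and +26.1 IP for CHONE; for rookies, +11.6, +71.2, and +61.6 IP.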
I don’t think that any of this is realistic. CHONE and
PECOTA allocate over 100% of 2010 innings to veterans, then allocate a further
20% of 2010 innings to rookies, and none of this accounts for pitchers making
their major league debuts this year (while my system leaves 100 IP per team for
those pitchers).
Some may argue that PECOTA and CHONE make “if he makes the
majors” projections: a lot of these innings will end up getting pitched in the
minors, and major league innings can be adjusted on a team-by-team basis. I
think Baseball Prospectus (BP) has a manual process where pitchers are selected
based on likely playing time, and PECOTA IP projections are adjusted accordingly.
However, I’m not sure that such a process is necessary, nor
do I think it’s optimal for projecting team pitching totals. As I showed in my
last posts, every year some pitchers come out of nowhere to pitch
significant innings. Also, top prospects, as a group, pitch fewer innings than
CHONE or PECOTA would lead you to believe. Lastly, there was a nice article on
BP recently (by Tommy Bennett, behind the paywall) showing that on most teams,
the 5th and 6th starters pitch comparable numbers of innings. Therefore,
picking which players will be in the rotation, or who will make the 25-man
roster, is both futile and counterproductive for projecting individual IP.
It would be better to compute an independent set of IP
projections that respect the recent averages, and then (possibly) make some
small team-based adjustments, by hand or automatically. It’s more realistic to
say that the Yankees will use 20 pitchers in 2010 whose collective IP is
slightly less than the Yankees’ overall 2010 IP than to list their top
10 or 13 pitchers, whose collective IP adds up to or exceeds the Yankees’
overall IP. (FWIW, the Yankees used 24 pitchers in 2009.) Then, using
projected ERA (or another value rate stat) together with realistic IP
projections, it should be possible to approximate the Yankees’ overall
projected pitching value for 2010 in a robust manner.
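A minimal sketch of that last step, assuming each named pitcher has an independent IP projection and a projected ERA, and that the innings the named pitchers don’t cover go to unnamed pitchers at an assumed replacement-level ERA. The helper name, the 5.00 replacement ERA, the 1,450 team-innings figure, and the staff are all hypothetical; the only ideas taken from the text are that named pitchers shouldn’t exceed the team total and that IP times a rate stat gives staff value.

```python
def team_earned_runs(staff, team_ip, replacement_era):
    """
    Approximate a team's earned runs allowed from individual projections.

    staff: list of (name, projected_ip, projected_era) for the pitchers we can name.
    team_ip: expected team innings for the season (about 1,450 over 162 games).
    replacement_era: assumed ERA for innings left to pitchers nobody can name yet.
    Returns (earned_runs, leftover_ip).
    """
    named_ip = sum(ip for _, ip, _ in staff)
    if named_ip > team_ip:
        # The situation objected to above: named pitchers already exceed the team total
        raise ValueError(f"named pitchers project {named_ip:.0f} IP > team total {team_ip:.0f}")
    named_runs = sum(era * ip / 9 for _, ip, era in staff)  # ER = ERA * IP / 9
    leftover_ip = team_ip - named_ip
    return named_runs + replacement_era * leftover_ip / 9, leftover_ip

# Made-up staff; a real list would include every pitcher with a non-trivial projection
staff = [
    ("Starter A", 200.0, 3.60),
    ("Starter B", 180.0, 4.50),
    ("Reliever C", 65.0, 3.00),
]
runs, leftover = team_earned_runs(staff, team_ip=1450.0, replacement_era=5.00)
print(f"{leftover:.0f} IP left to unnamed pitchers; about {runs:.0f} earned runs allowed")
```

Leaving the gap explicit, rather than forcing the named pitchers to cover every inning, is the point: the leftover innings are exactly the ones that pitchers coming out of nowhere will end up throwing.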
Hopefully I’ll have time to complete all (most?) of those
steps before the season gets under way.
Although my system is much better than CHONE or PECOTA at
generating IP estimates for rookies, and somewhat better at projecting
veterans, it is still far from ideal. Most notably, I think my (non)handling of
retired pitchers hurts my system’s ability to project veteran starters’ IP
rationally.
I think it’s important to train a system with examples where
veterans don’t come back strong after
a bad or injury-plagued year, and also to include examples of sudden falls to 0
IP after decent seasons. Both of these things do happen to pitchers, and the
model needs to take them into account. However, I think my system is overdoing
the downward regression bit for starting pitchers. The only pitcher who threw
130+ IP in 2009 and is expected to increase his total in 2010 is Johan Santana
(projected to go from 166.7 IP to 174.0 IP). A typical starter who threw 180 IP
in 2009 with solid results (e.g., Roy Oswalt or Joe Saunders) is expected to
throw 30 fewer IP in 2010. This sounds a bit harsh to me.
Cases of voluntary retirement for guys who threw over 100
IP in the previous season are few, but they might be skewing the results for
all starting pitchers in my model. My system projected Greg Maddux and Mike
Mussina for 140 IP and 180 IP, respectively, in 2009. Both retired and threw
0 IP. In a model designed to minimize root mean square error, a few such
examples can have a significant impact, because each error is squared. Both
Mussina and Maddux retired well before the 2009 regular season. My model should
have been privy to that information and attributed their severe drop in IP to
retirement, not so much to other factors.
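To see why two retirements can matter in a squared-error fit, compare root mean square error with and without them. The Maddux and Mussina pairs use the 2009 projections quoted above (140 IP and 180 IP projected, 0 IP actual); the three other starters and their misses are made up for illustration.

```python
from math import sqrt

def rmse(pairs):
    """Root mean square error over (projected_ip, actual_ip) pairs."""
    return sqrt(sum((proj - actual) ** 2 for proj, actual in pairs) / len(pairs))

# Hypothetical starters whose projections missed by a typical 20-25 IP
typical = [(180.0, 200.0), (170.0, 145.0), (190.0, 165.0)]
# Maddux and Mussina: projected 140 and 180 IP for 2009, retired, threw 0
retirees = [(140.0, 0.0), (180.0, 0.0)]

both = typical + retirees
retiree_share = sum((p - a) ** 2 for p, a in retirees) / sum((p - a) ** 2 for p, a in both)

print(f"RMSE, typical starters only: {rmse(typical):.1f} IP")   # 23.5
print(f"RMSE, with the two retirees: {rmse(both):.1f} IP")       # 103.6
print(f"retirees' share of squared error: {retiree_share:.0%}")  # 97%
```

A 180 IP miss squares to 32,400, about 50 times a 25 IP miss (625). The real training set has hundreds of pitchers, which dilutes this toy effect, but a least-squares fit can still shade every starter’s projection down a little to shrink a few enormous errors, unless it has some input (such as a retirement flag) that explains them.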
Ben Sheets and Jeff Francis were also projected to throw a
bunch of innings in 2009, but missed all of last year due to injury. Their
cases should absolutely be included in the model. However, before I can build
a decent injury-based model for IP, I need to properly account for known
preseason retirements. Now where can I get that damn retirement data…
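Once retirement data is available, one simple way to use it is to set known preseason retirements aside before fitting, while keeping injury-driven 0 IP seasons like Sheets’ and Francis’ in the training set. This is a sketch under assumptions: the `Example` record, its `retired_preseason` flag (precisely the data that’s missing), and the "Hypothetical Starter" row are illustrative, and adding the flag as a model feature instead of dropping rows would be an equally reasonable choice.

```python
from dataclasses import dataclass

@dataclass
class Example:
    name: str
    season: int
    actual_ip: float
    retired_preseason: bool  # hypothetical field: retirement known before Opening Day

def split_known_retirements(examples):
    """Separate known preseason retirements from the examples used to fit the IP model.

    Zero-IP seasons caused by injury stay in the training set; only retirements
    that were public before the season are set aside.
    """
    train = [e for e in examples if not e.retired_preseason]
    retired = [e for e in examples if e.retired_preseason]
    return train, retired

examples = [
    Example("Greg Maddux", 2009, 0.0, True),
    Example("Mike Mussina", 2009, 0.0, True),
    Example("Ben Sheets", 2009, 0.0, False),    # missed 2009 injured: keep
    Example("Jeff Francis", 2009, 0.0, False),  # missed 2009 injured: keep
    Example("Hypothetical Starter", 2009, 185.0, False),
]
train, retired = split_known_retirements(examples)
print("train on:", [e.name for e in train])
print("set aside:", [e.name for e in retired])
```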