Wednesday, March 24, 2010

Yankees' 25-man pitchers (everyone loves projections)

When I worked at Big Software Company X, I always told new engineers to include specific analysis with any data that they sent for others to look at. So here, I am following my own advice. I can't publish PECOTA and CHONE projections with mine side-by-side for all pitchers, since that gives away those guys' hard work for free, but I think it's OK to list a few projections, for the sake of comparison. Better yet, let's look at a complete team.

Here are the IP projections from my system, along with PECOTA and CHONE, for the New York Yankees' likely Opening Day pitching staff. For those who don't like spreadsheets, here are the 12 pitchers I think will be on that 25-man roster:

  • Starters: CC Sabathia, AJ Burnett, Javier Vazquez, and Andy Pettitte
  • Starters/Relievers: Joba Chamberlain, Chad Gaudin, Phil Hughes and Alfredo Aceves
  • Relievers: Mariano Rivera, Chan Ho Park, Damaso Marte and David Robertson
You could argue for Sergio Mitre, Mark Melancon and others, but this is probably what the Yankees will start with. FWIW, here is their official depth chart.

None of the projected top 12 pitchers are rookies, so in this case PECOTA and CHONE are not subject to the wildly optimistic rookie projection issue. You can read more about rookie projections in my previous post. Even so, PECOTA's projections are much too high on average (CHONE's less so).

In 2009, the 12 pitchers listed threw a total of 1532.2 innings. The Yankees as a team pitch roughly 1450 innings in a season. Since at least some of those innings will be thrown by rookies and by major league veterans currently in the minors, it is unlikely that the 12 pitchers above will throw more than 1300 or so innings in 2010. Indeed, my projection system gives the 12 of them 1236.8 innings this year. That may be low or it may be high, but it's a reasonable guess. PECOTA, however, projects these same pitchers at 1563.6 total innings, which is more than 100 IP above what the entire Yankees staff will throw in 2010. CHONE projects them at 1388 IP, which is still high but much less so than PECOTA, and the gap is somewhat mitigated by the fact that Javier Vazquez and Chan Ho Park threw their 2009 innings for other teams. None of the projection systems accounts for team innings balance.
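The team-level check above is simple arithmetic. Here is a minimal Python sketch of it, using only the totals quoted in this post; the constant names and the `innings_check` helper are illustrative choices, and the 1300 IP ceiling is the rough estimate from the text, not a hard rule.

```python
# Team-level sanity check on the 2010 IP totals quoted above.
TEAM_SEASON_IP = 1450.0    # roughly what the Yankees pitch in a season
TOP12_CEILING_IP = 1300.0  # rough ceiling for the 12 listed pitchers (estimate from the text)

projected_totals = {
    "My system": 1236.8,
    "PECOTA": 1563.6,
    "CHONE": 1388.0,
}

def innings_check(total_ip, ceiling=TOP12_CEILING_IP, season=TEAM_SEASON_IP):
    """Return (IP above the 12-man ceiling, IP above the whole team's season total)."""
    return total_ip - ceiling, total_ip - season

for system, total in projected_totals.items():
    over_ceiling, over_season = innings_check(total)
    print(f"{system:<10} {total:7.1f} IP | vs. ceiling {over_ceiling:+7.1f} | vs. team total {over_season:+7.1f}")
```

Only PECOTA's total exceeds the whole team's season (by about 114 IP); CHONE comes in about 88 IP over the 12-man ceiling but still under the team total.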

Then again, as I wrote previously, my system does not require much of this team-based balancing, since it gives much more realistic projections for rookie and low-end pitchers (at least on average). With reasonable roster construction, total team projections should come out about right without adjustment. My system will over-estimate a team's pitchers' innings if that team signs seven top-line starting pitchers, but teams never do that. It could, however, massively under-estimate innings pitched for teams that have very few pitchers with major league success and experience. In that case, one could simply increase that team's rookies' innings proportionately for a more reasonable total. In other words, we would estimate rookies' IP not from their own minor league stats, but from the team's need to fill innings and roster spots.
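A minimal sketch of the proportional rookie adjustment described above, assuming established pitchers' projections stay untouched and only the rookies' projections are scaled up until the staff fills a 1450-inning season. The function name, pitcher labels, and every IP figure in the example are hypothetical, chosen only to make the arithmetic easy to follow; it illustrates the idea rather than reproducing any projection system's code.

```python
def fill_rookie_innings(established_ip, rookie_ip, team_ip=1450.0):
    """Scale rookies' projected IP up so the staff total reaches team_ip.

    established_ip: name -> projected IP for proven major leaguers (left unchanged)
    rookie_ip: name -> projected IP for rookies, e.g. from minor league stats
    Returns a new dict with the rookies' adjusted IP.
    """
    established_total = sum(established_ip.values())
    rookie_total = sum(rookie_ip.values())
    # Only adjust when the staff falls short; otherwise keep the rookies' projections as-is.
    if established_total + rookie_total >= team_ip or rookie_total == 0:
        return dict(rookie_ip)
    scale = (team_ip - established_total) / rookie_total
    return {name: ip * scale for name, ip in rookie_ip.items()}

# Hypothetical staff that falls 100 IP short: 1150 IP established + 200 IP rookies
established = {"SP1": 210.0, "SP2": 200.0, "SP3": 190.0, "SP4": 180.0,
               "RP1": 75.0, "RP2": 70.0, "RP3": 65.0, "RP4": 60.0,
               "RP5": 55.0, "RP6": 45.0}
rookies = {"Rookie A": 120.0, "Rookie B": 80.0}
print(fill_rookie_innings(established, rookies))  # {'Rookie A': 180.0, 'Rookie B': 120.0}
```

The rookies' relative shares stay the same (60/40 here); only their total grows to cover the innings the established pitchers can't.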

PECOTA (and, to a lesser degree, CHONE), on the other hand, massively over-estimates a team's established pitchers' IP. That calls for additional (and possibly skewed) adjustments to get numbers that add up: even though the individual projections look reasonable, the team totals do not. To get totals that sum to 1300 IP, we'd need to reduce PECOTA's projections by roughly 17%. It's not clear whether all pitchers' projections, or just the low-end pitchers', need to be lowered.
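Here are two ways that downward adjustment could be done, sketched in Python: shrink everyone by the same factor, or protect the high-IP pitchers and shrink only the low-end ones. Both functions and the four-pitcher example are hypothetical illustrations (the names and IP values are made up, not PECOTA's numbers); only the 1563.6 and 1300 totals come from this post.

```python
def scale_uniform(projections, target_total):
    """Multiply every pitcher's IP by the same factor so the total equals target_total."""
    factor = target_total / sum(projections.values())
    return {name: ip * factor for name, ip in projections.items()}

def scale_low_end(projections, target_total, threshold):
    """Leave pitchers projected at or above `threshold` IP alone; shrink only the rest."""
    high = {n: ip for n, ip in projections.items() if ip >= threshold}
    low = {n: ip for n, ip in projections.items() if ip < threshold}
    room = target_total - sum(high.values())
    low_total = sum(low.values())
    if room < 0 or low_total == 0:
        raise ValueError("cannot reach the target by shrinking only the low-end pitchers")
    factor = room / low_total
    adjusted = dict(high)
    adjusted.update({n: ip * factor for n, ip in low.items()})
    return adjusted

# Reduction needed to bring the 1563.6 IP total down to 1300
print(f"{1 - 1300 / 1563.6:.1%}")  # 16.9%

# Hypothetical four-pitcher group: 600 IP projected, 500 IP "available"
example = {"Starter A": 200.0, "Starter B": 180.0, "Swingman C": 120.0, "Reliever D": 100.0}
print(scale_uniform(example, 500.0))
print(scale_low_end(example, 500.0, threshold=150.0))
```

With the uniform cut, every pitcher loses about 17%; with the low-end-only cut, the two pitchers under 150 IP absorb the entire reduction and lose almost half their innings. Which of these is closer to reality is exactly the open question.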

Projecting IP matters for estimating just about any other pitching stat (for fantasy or otherwise). If you want to project strikeouts, walks, wins, saves, or IP-weighted ERA (or WHIP), you need to know how much playing time a pitcher is likely to handle. My projection system does not give high- and low-end projections (i.e., 75th and 25th percentiles), but it does give single projections that add up on a league-average basis.
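To see why IP drives the other numbers, here is a small sketch (hypothetical helper names and made-up pitcher lines) that converts a per-9-inning rate into a counting stat and computes an IP-weighted staff ERA. Earned runs are recovered as ERA * IP / 9, so changing one pitcher's IP changes the staff ERA even when nobody's rate changes.

```python
def counting_stat(rate_per_9, ip):
    """Turn a per-9-innings rate (e.g. K/9 or BB/9) into a season total for `ip` innings."""
    return rate_per_9 * ip / 9.0

def ip_weighted_era(pitchers):
    """Staff ERA from a list of (ERA, IP) pairs.

    Each pitcher's earned runs are ERA * IP / 9; summing those and dividing by
    total IP (times 9) weights every pitcher's ERA by his innings.
    """
    total_ip = sum(ip for _, ip in pitchers)
    total_er = sum(era * ip / 9.0 for era, ip in pitchers)
    return 9.0 * total_er / total_ip

# Hypothetical two-man staff: same rates, different playing time
staff_a = [(3.00, 200.0), (4.50, 100.0)]
staff_b = [(3.00, 200.0), (4.50, 200.0)]
print(f"{ip_weighted_era(staff_a):.2f}")   # 3.50
print(f"{ip_weighted_era(staff_b):.2f}")   # 3.75

# A 9.0 K/9 pitcher at 200 IP vs. 160 IP
print(counting_stat(9.0, 200.0), counting_stat(9.0, 160.0))  # 200.0 160.0
```

Giving the 4.50-ERA pitcher 200 IP instead of 100 raises the staff ERA from 3.50 to 3.75, even though neither pitcher's ERA changed; the same IP error carries straight through to projected strikeouts.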

