My writings about baseball, with a strong statistical & machine learning slant.

Saturday, April 10, 2010

FIP & ERA baselines from projected IP (an alternative take on replacement level)

In my last article, I complained that my FIP/ERA projection system tends to regress all pitchers to the same baseline (around 4.6 FIP/ERA). This is an appropriate MLB average, but fringe pitchers (especially starters) should probably be regressed to a much worse baseline (that is, a higher FIP/ERA). So here, I show what such a baseline might look like. Incidentally, this can also be used to compute the "replacement level" FIP/ERA for starters and relievers.

The question I set out to answer was: given an IP projection (split by starter IP and reliever IP), what FIP/ERA should I expect from a pitcher? If I map actual IP to ERA, then I get a very nice graph with the properties that one would expect. But this graph is biased by the survivor effect. Better pitchers throw more innings, even if they start out with lower expectations.

Instead, what if we graph expected IP against actual FIP/ERA? Now we can answer questions like: "what FIP/ERA should a team expect from a fringe starter (projected for 30.0 IP as a starter, or about five starts)?" My IP projection system is trained on all pitcher seasons from 2005-2009, including low-IP, high-IP and 0 IP seasons, so it projects realistic IP for all pitchers, not just the good ones. It also gives separate estimates for starter IP and reliever IP.

Using actual performance for all pitcher seasons, I separated the pitchers into two groups:
  1. IP >= 1 and (starter IP) >= 40% * (total IP)
  2. IP >= 1 and (starter IP) < 40% * (total IP)
This is my categorization into "mostly starters" and "mostly relievers." The cutoff might seem arbitrary, but it separates the starters and relievers quite well. I could have left out a batch of pitchers around 50%, but I don't like excluding examples from my training sets, and there are not many such pitchers in any case.
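As a rough illustration, the split above could be expressed in Python as follows. The function name `classify_role` and the label strings are placeholders for this sketch, and it assumes innings are stored as true decimals (166 2/3 IP as 166.67, not the box-score notation 166.2).

```python
from typing import Optional


def classify_role(total_ip: float, starter_ip: float,
                  cutoff: float = 0.40) -> Optional[str]:
    """Classify one pitcher season using the 40% starter-IP cutoff.

    Returns None for seasons with fewer than 1 IP, which are excluded
    from both groups.
    """
    if total_ip < 1.0:
        return None
    if starter_ip >= cutoff * total_ip:
        return "mostly_starter"
    return "mostly_reliever"
```

A season with 0 IP (Ben Sheets in 2009, for example) would return `None` here and drop out of both groups.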

Now I rank "mostly starters" by projected starter IP, and I rank "mostly relievers" by projected reliever IP. Within each group of 320+ pitcher seasons, I find the median FIP and raw ERA. Thus I create the series that are mapped below:

If some of that is confusing, let me explain it again with an example. Take the "mostly starter" IP series. The highest IP datapoint occurs at IP = 165.4. That is the median projected IP_Start for the top 320 pitcher seasons, ranked by projected IP_Start, provided that those pitchers threw at least 1 IP, and that at least 40% of their IP came as starters. These pitcher seasons include:
  • Johan Santana (2009), projected to throw 214 IP (threw 166.2 IP)
  • Brandon Webb (2009), projected to throw 206 IP (although he only threw 4.0 IP)
  • Does not include Ben Sheets (2009), since he threw 0 IP.
I hope that makes things more clear.

Within a group of 320+ pitcher seasons (I use larger samples at the lower IP data points), I computed the median FIP and ERA, regardless of the IP for each instance. So in the example above, Johan Santana's ERA based on 166.2 IP in 2009 would be used on the same scale as Brandon Webb's ERA based on 4.0 IP in 2009. I purposely don't weight the instances by IP, since that would introduce survivor bias. Without biasing myself toward how many innings the pitchers actually ended up throwing, I want to know: given a projection of "X IP_Start" and "Y IP_Relief", what is a baseline for that pitcher's ERA and FIP?
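The grouping described above might be sketched like this. It is only an approximation of the procedure: it uses fixed-size, non-overlapping groups (the text says larger groups are used at the low-IP end), and the dictionary keys `fip` and `era` and the function name `baseline_series` are assumptions made for the example.

```python
from statistics import median


def baseline_series(seasons, ip_key, group_size=320):
    """Rank pitcher seasons by projected IP and take unweighted medians.

    seasons: list of dicts holding a projected-IP field (named by ip_key)
             plus the actual 'fip' and 'era' for that season.
    Each season counts once, no matter how many innings were actually
    thrown, so the medians are not weighted by actual IP.
    """
    ranked = sorted(seasons, key=lambda s: s[ip_key], reverse=True)
    series = []
    for start in range(0, len(ranked), group_size):
        group = ranked[start:start + group_size]
        series.append({
            "median_proj_ip": median(s[ip_key] for s in group),
            "median_fip": median(s["fip"] for s in group),
            "median_era": median(s["era"] for s in group),
            "n": len(group),
        })
    return series
```

Because every season gets equal weight, a 4.0 IP season and a 166.2 IP season influence the median equally, which is the point of not weighting by actual innings.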

Replacement Level

Incidentally, my graph also suggests possible replacement levels for starters and relievers. If we view replacement level as the level of performance that can be easily acquired from the waiver wire or from the minor leagues, then the low-end FIP/ERA projections from the graph should offer some guidance.

For relief pitching, the median FIP for low-end projections is around 4.5 (ERA 4.6-4.7). For starting pitching, the median FIP on the low end is around 4.8, but the median ERA is around 5.3.

The low-end starter group might look like an outlier, but its median FIP/ERA are based on the 400 pitcher seasons with the lowest IP_Start projections among those who actually pitched mostly as starters. This group had an average actual IP of 58.9 (52.0 IP as starters). The median actual IP was 42.7 (34.5 as starters). Therefore the group is a good representation of pitchers who one would not have expected to start many innings, but who were pressed into starter roles and typically started multiple games. I believe they represent a good estimate of the kind of production a team might get from a spot starter pulled from the bullpen, or from a starter pulled up from AAA.

Going forward, I will assume the following FIP and ERA (league-neutral and park-neutral) replacement levels to fill a team's "missing innings" in projecting overall team ERA and overall pitcher VORP:

[Table: replacement-level FIP and raw ERA for starters and relievers]

This is not the only way to estimate replacement level for pitchers, but these are the values most consistent with my individual projections. If one were to use a different system to project IP, then one would get different results. However I don't know of another system that accurately projects IP_Start and IP_Relief for low-end pitchers. Compared to my system, PECOTA and CHONE massively over-estimate the IP for low-end pitchers, especially rookies.

FIP vs ERA disparity

Since FIP is meant to predict ERA (after removing the differences due to defense and BABIP luck), it may seem strange that replacement starter ERA is 0.4 runs higher than replacement starter FIP. However students of DIPS will know that FIP tends to under-estimate ERA for bad pitchers, and over-estimate ERA for good pitchers.

My graph seems to suggest that FIP trails ERA nicely in the range (4.1, 4.7), but the relationship starts to break down beyond that range. This is (in part) because FIP assumes that:
  • pitcher skills are limited to strikeout rate, walk rate and home run rate
  • these skills are linearly related to ERA
Both of these assumptions break down on the high end and the low end of pitcher performance. Elite pitchers tend to have lower BABIPs than do average pitchers (although luck and defense constitute most of the BABIP difference for individual cases). Also, elite pitchers tend to be better than average at secondary skills like holding runners, situational pitching, and fielding their position. Conversely, low-end pitchers are worse than average at all of these skills. Also, since outs have a non-linear relationship with runs (the more outs a pitcher produces, the less valuable each extra out is), pitchers who get very few easy outs (strikeouts, popups or soft ground balls) tend to have even higher ERAs than can be linearly approximated from the factors of FIP. Think of Adam Eaton of 2007-2009. His FIP and xFIP were bad, but his ERA was consistently even worse.
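For reference, the linearity mentioned above is visible in the commonly published form of FIP, sketched here in Python. The constant is league- and season-dependent (usually somewhere around 3.1 to 3.2); the default of 3.10 is an assumed placeholder, not a value taken from this article. The standard formula also counts hit batsmen alongside walks.

```python
def fip(hr: int, bb: int, hbp: int, k: int, ip: float,
        constant: float = 3.10) -> float:
    """Standard FIP: a linear function of HR, BB+HBP and K per inning.

    constant is an assumed league constant that puts FIP on the ERA scale.
    """
    if ip <= 0:
        raise ValueError("ip must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant
```

In this formula, one extra strikeout over the same innings lowers FIP by exactly 2/IP whether the pitcher is elite or awful, which is why it cannot capture the non-linear effects described above.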

Effects on Team Pitching Projections

Armed with new replacement levels for starters and relievers, I should have better team pitching projections soon. Since there is a large separation between replacement level for starter ERA and reliever ERA, teams will suffer disproportionately depending on whether their "missing innings" (i.e., those innings not filled by IP projections for pitchers on their opening day roster) will need to be starter or reliever innings. The Nationals, with holes in their rotation, will have to fill those missing innings at a higher ERA than the Royals, who have a set rotation, but will need to fill some of their bullpen at replacement level.

Teams will get no credit for relievers projected to post an ERA above 4.9, but will get credit for any starters with projected ERA below 5.3 (before league and park adjustments). This will make my projections much more accurate, even if they are now being made a little too late to count as pre-season predictions.
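A minimal sketch of the "missing innings" idea, under some stated assumptions: the team's target starter and reliever innings are supplied by the caller (hypothetical inputs), the default replacement ERAs are simply the 5.3 and 4.9 thresholds quoted above, and no league or park adjustment is applied. The function name `team_era` is a placeholder.

```python
def team_era(pitchers, target_start_ip, target_relief_ip,
             repl_start_era=5.3, repl_relief_era=4.9):
    """Project team ERA, filling unprojected innings at replacement level.

    pitchers: list of (proj_ip_start, proj_ip_relief, proj_era) tuples
              for the opening day roster.
    target_start_ip / target_relief_ip: innings the team must cover
              in each role over the season (assumed inputs).
    """
    proj_start = sum(p[0] for p in pitchers)
    proj_relief = sum(p[1] for p in pitchers)
    # Innings not covered by roster projections, by role.
    missing_start = max(0.0, target_start_ip - proj_start)
    missing_relief = max(0.0, target_relief_ip - proj_relief)

    earned_runs = sum((p[0] + p[1]) * p[2] / 9.0 for p in pitchers)
    earned_runs += missing_start * repl_start_era / 9.0
    earned_runs += missing_relief * repl_relief_era / 9.0

    total_ip = proj_start + proj_relief + missing_start + missing_relief
    if total_ip == 0:
        raise ValueError("no innings to project")
    return 9.0 * earned_runs / total_ip
```

With these defaults, a team missing 100 starter innings is charged more runs than a team missing 100 reliever innings, which is the Nationals versus Royals contrast described above.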

Davis, Buehrle, Feliz and Mariano Rivera

The baselines also help me resolve a couple of specific problems I noticed for individual pitchers. I projected Wade Davis at a lower FIP and ERA than Mark Buehrle. Davis pitched well in 36 IP as a rookie in 2009, and his ERA, FIP and xFIP were all better than Buehrle's. However, there is no way that one should project him to be better than Mark Buehrle in 2010. The baseline FIP/ERA for starters by projected IP allows me to fix this problem. In the new FIP and ERA projections, I am regressing pitchers to their individual baselines, rather than to the MLB baseline of 4.6. This will help Mark Buehrle.

Pitcher          IP Start (projected)   IP Relief (projected)   FIP baseline   ERA baseline
Wade Davis
Mark Buehrle

Similarly, I projected Neftali Feliz to post a lower FIP & ERA than Mariano Rivera in 2010. This is even more unreasonable, and new baselines should fix this:

Pitcher          IP Start (projected)   IP Relief (projected)   FIP baseline   ERA baseline
Neftali Feliz
Mariano Rivera
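To show why an individual baseline matters for cases like these, here is one common way to regress an observed rate toward a baseline: a weighted average in which the baseline counts as a fixed number of "phantom" innings. This is not necessarily the method used in my projection system; the function name and the weight `k` are hypothetical and chosen only for illustration.

```python
def regress_to_baseline(observed: float, ip: float,
                        baseline: float, k: float = 100.0) -> float:
    """Shrink an observed ERA or FIP toward a baseline.

    k is an assumed number of innings of baseline performance to blend
    in; the fewer actual innings, the closer the result sits to baseline.
    """
    if ip < 0 or k <= 0:
        raise ValueError("ip must be >= 0 and k must be > 0")
    return (ip * observed + k * baseline) / (ip + k)
```

With a small sample, the result is dominated by the baseline, so a pitcher with a low projected IP (and therefore a worse baseline) ends up with a worse projection than an established workhorse with similar recent numbers.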

Once I iron out a few more kinks, I should have new FIP, ERA and VORP projections for both individuals and teams. I have not yet done much with park adjustments, other than to adjust the individual and "missing innings" ERA projections to the team's park factor from 2009. It would be nice to consider a pitcher's park factor in terms of specific effects on HR rate, but everything that I've read on this issue seems to suggest that park HR factors vary too much year to year to be of much use. With so many teams having changed stadiums in the past few years (or having changed major characteristics of the field, wind patterns or the ball itself), long-term park factors do not seem very useful for predicting future park factors. I'd rather use a cruder park factor that is more current.
