My writings about baseball, with a strong statistical & machine learning slant.

Monday, April 5, 2010

Team Pitching Projections (II: Totals & Averages)

I was going to break down my projections for several interesting teams, but first I wanted to look at some aggregate stats for the various projections (mine, CHONE & PECOTA). Next thing you know, I've gotten a dozen team projections, and more averages than I can fit in a spreadsheet. I thought the results were interesting, so I'll present them here.

Here are the VORP projections for the dozen teams, from my system, PECOTA and CHONE (along with last year's values and MLB rankings):

Team        2009 VORP   2009 MLB rank   My System   PECOTA   CHONE
Dodgers     265.3       1st             181.2       224.5    168.9
Giants      264.4       3rd             189.7       241.2    134.0
White Sox   242.6       4th             195.0       202.7    216.1
Red Sox     237.7       5th             189.3       259.8    264.2
Yankees     209.4       12th            217.6       193.7    258.6
Rays        194.3       14th            189.3       241.6    204.1
Rockies     177.0       17th            172.6       217.0    133.4
Phillies    172.1       18th            185.3       253.1    147.9
Royals      120.2       20th            167.9       191.7    208.5
Indians     83.6        26th            121.4       163.9    159.6
Brewers     34.6        29th            155.1       162.8    72.3
Nats        -10.0       30th            112.9       212.0    81.3
---------   ---------   -------------   ---------   ------   -----
Average     165.9       15th            173.1       213.7    170.7
STDEV       90.6                        30.3        32.2     61.9

I chose a rather extreme set of 12 teams (most of the best & worst pitching teams of 2009), so it's not surprising that every system projects the spread in these teams' pitching VORP to fall precipitously.

What is surprising, however, are the PECOTA projections. Not only is the average way too high, but PECOTA projects the Nationals' pitchers to be 10 runs better than the White Sox, and 20 runs better than the Yankees. I don't think that anyone at BP believes this. They have issued several updates to PECOTA recently. Although the data I'm using here is very recent, maybe it still has some flaws. Whatever it is, summing up the PECOTA VORP projections for individual pitchers doesn't work. I'm going to ignore the PECOTA forecasts.

My system and CHONE seem to give more reasonable results. The averages look fine. While the two systems disagree strongly on some teams, they generally agree that the Yankees, Red Sox and White Sox should have the best staffs, while the Brewers and Nationals should have the worst staffs out of these dozen teams.

However, the standard deviation of CHONE's projections is twice that of mine (61.9 vs 30.3), which works out to about four times the variance. A few disagreements aside, my projections resemble a regressed version of CHONE's. Is that really what's happening? Am I regressing too far toward the mean?
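
The spread of the two sets of projections can be checked straight from the team totals in the first table. Here is a minimal sketch using only Python's standard library. The variable names are mine, and it assumes the STDEV row is a sample standard deviation (n - 1 in the denominator), which is the convention that reproduces the numbers shown:

```python
from statistics import mean, stdev, variance

# Team totals copied from the first table, in the order the teams are listed.
my_system = [181.2, 189.7, 195.0, 189.3, 217.6, 189.3,
             172.6, 185.3, 167.9, 121.4, 155.1, 112.9]
chone = [168.9, 134.0, 216.1, 264.2, 258.6, 204.1,
         133.4, 147.9, 208.5, 159.6, 72.3, 81.3]

for name, totals in (("My System", my_system), ("CHONE", chone)):
    print(f"{name}: mean={mean(totals):.1f}  stdev={stdev(totals):.1f}")

# A ratio of r in standard deviation is a ratio of r squared in variance.
sd_ratio = stdev(chone) / stdev(my_system)
var_ratio = variance(chone) / variance(my_system)
print(f"stdev ratio = {sd_ratio:.2f}, variance ratio = {var_ratio:.2f}")
```

The point of keeping the two ratios separate is that "twice as spread out" in VORP terms is a statement about standard deviation, not variance.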

As I wrote in the intro piece, I assign "missing innings" for each team to an unknown pitcher with a 5.0 league-neutral and park-neutral ERA. This is the ultimate reversion to the mean (claiming each team's available minor league talent is the same). To break this down even further, I split each team's VORP into three components:
  1. VORP for the top 5 pitchers
  2. VORP for all other projected pitchers (i.e. "secondary pitchers")
  3. VORP for "missing innings"
To get the top five pitchers, I rank all of a team's pitchers by expected VORP, based on the average of the three systems. So CHONE's top 5 and my top 5 are the same pitchers in all cases. Usually, picking the top 5 pitchers is easy: it's going to be a team's best 3-4 starters and its best 1-2 relievers.
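
The three-way split can be sketched in a few lines of Python. This is an illustration, not the code behind my projections: the roster, the function name `split_team_vorp`, the 1450 IP team total, the 6.0 replacement-level ERA and the simple runs-above-replacement formula are all assumptions made up for the example, and each pitcher gets a single IP projection even though the real systems disagree on innings.

```python
from statistics import mean

def split_team_vorp(pitchers, system, team_ip=1450.0,
                    replacement_era=6.0, filler_era=5.0):
    """Split one team's projected VORP into top-5, secondary and missing innings.

    pitchers: list of dicts with 'name', 'ip' and 'vorp' (a dict keyed by system).
    """
    # Rank by the average VORP across all systems, so that every system
    # is judged on the same five "top" pitchers.
    ranked = sorted(pitchers, key=lambda p: mean(p["vorp"].values()), reverse=True)
    top5, secondary = ranked[:5], ranked[5:]

    # Innings nobody is projected for go to a generic 5.0 ERA filler pitcher.
    missing_ip = max(0.0, team_ip - sum(p["ip"] for p in pitchers))
    # ASSUMPTION: a simplified runs-above-replacement formula, for illustration only.
    missing_vorp = (replacement_era - filler_era) / 9.0 * missing_ip

    return {
        "top5": sum(p["vorp"][system] for p in top5),
        "secondary": sum(p["vorp"][system] for p in secondary),
        "missing": missing_vorp,
    }

# Hypothetical roster: names and numbers are invented for the example.
roster = [
    {"name": "A", "ip": 190, "vorp": {"mine": 40, "pecota": 46, "chone": 43}},
    {"name": "B", "ip": 180, "vorp": {"mine": 30, "pecota": 34, "chone": 32}},
    {"name": "C", "ip": 170, "vorp": {"mine": 22, "pecota": 28, "chone": 19}},
    {"name": "D", "ip": 150, "vorp": {"mine": 12, "pecota": 20, "chone": 10}},
    {"name": "E", "ip": 65,  "vorp": {"mine": 15, "pecota": 17, "chone": 16}},
    {"name": "F", "ip": 120, "vorp": {"mine": 6,  "pecota": 10, "chone": 2}},
    {"name": "G", "ip": 60,  "vorp": {"mine": 4,  "pecota": 7,  "chone": 4}},
]

mine = split_team_vorp(roster, "mine")
print(mine)
```

Ranking on the cross-system average is the design choice that matters here: it keeps the "top 5" bucket identical across systems, so any difference between the columns comes from the projections rather than from who was counted where.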

Here are the breakdowns for team VORP (from my system) by the three categories:

Team        Top 5 (my VORP)   Secondary (my VORP)   Missing Innings
Dodgers     106.4             30.5                  44.3
Giants      131.0             28.1                  30.6
White Sox   115.9             49.8                  29.3
Red Sox     143.9             30.5                  14.9
Yankees     145.8             48.5                  23.4
Rays        102.6             66.6                  20.1
Rockies     99.7              37.8                  35.1
Phillies    130.1             34.4                  20.8
Royals      95.4              31.5                  40.9
Indians     44.7              32.9                  43.8
Brewers     81.6              50.1                  23.4
Nats        44.9              36.4                  31.6
---------   ---------------   -------------------   ---------------
Average     103.5             39.8                  29.9
STDEV       33.7              11.6                  9.7

There is high variance among the teams' top pitchers, but teams' secondary and fringe pitchers are projected to provide about the same total value for all teams. The Rays have a very deep pitching staff, with six legitimate starters and a bullpen with good pitchers for every role. According to my projections, their "secondary pitchers" (i.e. Rafael Soriano, Andy Sonnanstine, JP Howell, Grant Balfour and Dan Wheeler) are worth about 30-40 runs (3-4 wins) more than the weaker "secondary pitchers" on the Giants, Indians, Dodgers and Royals. That seems plausible.

Let's compare this to what CHONE projects:

Team        Top 5 (CHONE)   Secondary (CHONE)   Missing Innings
Dodgers     101.2           43.0                24.7
Giants      98.4            24.5                11.1
White Sox   136.0           47.6                32.5
Red Sox     172.0           91.1                1.1
Yankees     163.0           80.2                15.5
Rays        111.0           83.4                9.7
Rockies     74.7            51.3                7.4
Phillies    123.1           24.8                0.0
Royals      128.5           54.8                25.2
Indians     68.8            90.8                9.0
Brewers     58.3            13.9                0.1
Nats        51.4            29.9                0.0
---------   -------------   -----------------   ---------------
Average     107.2           52.9                11.4
STDEV       39.3            27.6                11.1

The "top 5" projections are on the same scale as with my system, and with about the same variance. The "missing innings" projections are lower, but also with the same variance. However, the "secondary pitchers" projections are very different.

The variance in CHONE's "secondary pitcher" projections is huge. The chart is telling me that:

  1. The Indians' "secondary pitchers" are 8 runs better than the Rays'
  2. The Phillies' "secondary pitchers" are almost 70 runs worse than the Red Sox'
I don't believe either of those projections, nor much else in what I see here. The Indians' staff is very young. I think they'd gladly sign a binding contract for 60% of the Rays' bullpen's value right now, never mind for 110%.

However, the issue here is not that the rankings for secondary pitcher values are off, but rather that these values are what's driving much of the disagreement between my system and CHONE. CHONE's team-wide projections have a standard deviation of about 60 VORP (61.9). Almost half of that is due to variation in the secondary pitchers, whose standard deviation is 27.6. I don't think that makes sense.
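
Standard deviations of components do not simply add up to the standard deviation of the total, so one way to make "how much of the spread comes from the secondary pitchers" precise is to look at each component's covariance with the team total. The sketch below is my own rough check, not part of either projection system. The helper `sample_cov` and the variable names are mine, and the team totals are rebuilt as row sums of the CHONE breakdown table, which match the first table to within rounding for every team except the Indians (168.6 here versus 159.6 there).

```python
from statistics import mean, stdev

# CHONE components copied from the breakdown table, teams in listed order.
top5 = [101.2, 98.4, 136.0, 172.0, 163.0, 111.0,
        74.7, 123.1, 128.5, 68.8, 58.3, 51.4]
secondary = [43.0, 24.5, 47.6, 91.1, 80.2, 83.4,
             51.3, 24.8, 54.8, 90.8, 13.9, 29.9]
missing = [24.7, 11.1, 32.5, 1.1, 15.5, 9.7,
           7.4, 0.0, 25.2, 9.0, 0.1, 0.0]
total = [a + b + c for a, b, c in zip(top5, secondary, missing)]

def sample_cov(xs, ys):
    """Sample covariance (n - 1 denominator) of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

var_total = sample_cov(total, total)

# Because total = top5 + secondary + missing, these shares sum to 1.
shares = {
    name: sample_cov(part, total) / var_total
    for name, part in (("top5", top5), ("secondary", secondary), ("missing", missing))
}

for name, share in shares.items():
    print(f"{name}: {share:.0%} of the variance in team totals")
print(f"secondary stdev = {stdev(secondary):.1f}")
```

A share defined this way can be compared across systems: running the same code on my breakdown table would show whether my secondary pitchers carry a similar fraction of the team-level variance.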

Secondary pitchers do often end up providing a lot of value to a team, but we can almost never expect that before the fact. By definition, the secondary pitchers can only be the 4th-7th best starters or the 2nd-7th best relievers on a team going into the season. Unless the player is under team control (with minor league option years left), a team cannot stick really good pitchers into such roles, or hide them in the minor leagues. A top starting pitcher can project for 25-50 VORP, and a top reliever can project for 15-25 VORP. However, it's almost impossible for a team to get this kind of projected value from their 6th-15th best pitchers. I might be projecting the Nationals and Indians generously by giving them credit for filling their missing innings with 5.0 ERA pitchers. Perhaps they won't be able to do that. However, I can't give the Rays more credit than I am already giving them for having a deep staff.

Therefore, although I probably am regressing my projections too closely to the mean, I do not think that CHONE's approach to projecting non-top players is better.

Has MLB become the NFL?

If you look back at my team projections, you will notice that the best team (the Yankees at 217.6 VORP) is projected only about 105 VORP higher than the worst team (the Nationals at 112.9 VORP). That almost seems to imply that, if the stars align, the Nationals might have a better staff in 2010 than the Yankees. That will almost certainly not happen. If the Yankees' staff suffers a major injury, they will likely trade for a replacement, or find another solution. They will probably add a pitcher at the trade deadline. It's hard to imagine the Nats doing so, and easier to imagine them shedding talent at the deadline, possibly to those same Yankees.

Still, the separation between the best teams (Yankees, White Sox, Red Sox and Rays) and the worst teams (Nats, Indians and Brewers) seems small. On average, these groups of teams are less than 70 projected pitching VORP apart. Are the best pitching teams really only 7-8 wins better than the worst? Has MLB become the NFL, where parity ensures that advanced preseason projections are worthless?
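
For anyone who wants to check the arithmetic, here is a short sketch using my team totals from the first table. The 10 runs per win conversion is an assumption (the usual rule of thumb, consistent with the "30-40 runs (3-4 wins)" figure earlier in this post), and the variable names are mine:

```python
from statistics import mean

RUNS_PER_WIN = 10.0  # ASSUMPTION: common rule of thumb, not a fitted value

# My projected team pitching VORP, copied from the first table.
best = {"Yankees": 217.6, "White Sox": 195.0, "Red Sox": 189.3, "Rays": 189.3}
worst = {"Nats": 112.9, "Indians": 121.4, "Brewers": 155.1}

# Gap between the average of the best group and the average of the worst group.
gap = mean(list(best.values())) - mean(list(worst.values()))
# Gap between the single best and single worst team.
extreme_gap = max(best.values()) - min(worst.values())

print(f"group gap:   {gap:.1f} VORP, about {gap / RUNS_PER_WIN:.1f} wins")
print(f"extreme gap: {extreme_gap:.1f} VORP, about {extreme_gap / RUNS_PER_WIN:.1f} wins")
```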

Not necessarily. Seven wins is still significant, and my system does not simply regress last year's results. Everyone expects the Nats and Brewers to improve, but my system also projects the Yankees to improve significantly. The Giants and Royals are intriguing teams, mainly because both are so top-heavy in terms of pitching value. Both CHONE and I project the Royals to improve while the Giants regress. However, CHONE's projections are much more aggressive, projecting the Royals to have one of the best staffs in baseball, while the Giants become a below-average staff. Despite the high uncertainty in these projections, the differences are not small.

Still, why aren't projected differences in pitcher value larger than 7-10 wins at the extreme?

As I argued earlier, it's very difficult for a team (even the Yankees) to get great pitchers past their top five. However, no team's top five pitchers are projected to pitch more than 800 IP in 2010. Divide 800 by 5. That's 160 IP. In today's MLB, with injuries and pitcher usage being what they are, one's best pitchers do not project for more than 160 IP on average for a given season. This means that 45%-60% of a team's innings are projected to go to the non top-five guys. Good teams do not stand to gain much marginal value on those innings, and yet they make up half the expected IP total.

In earlier times, good starting pitchers might reasonably be projected to throw 200+ IP, and top relievers threw over 100 IP every season. Careers were shorter, so the three-year projections might have been the same, but the one-year projections for pitchers surely were higher. Back then, a team's top five pitchers could be expected to throw as much as 1000 IP out of a team's expected 1450. So in a sense, MLB is becoming more like the NFL, at least with respect to pitching value. Since hitter projections are less variable than pitcher projections, and most good hitters still play every day, I doubt that this argument translates to projecting hitter value.
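
The innings arithmetic behind this argument is simple enough to write down. A minimal sketch, where the function name is mine and the 1450 IP team total is the round figure used above:

```python
def non_top5_share(top5_ip, team_ip=1450.0):
    """Fraction of a team's innings thrown by everyone outside the top five."""
    return 1.0 - top5_ip / team_ip

# 800 IP is the 2010 ceiling for a top five; 1000 IP is the earlier-era figure.
for label, top5_ip in (("2010 ceiling", 800), ("earlier era", 1000)):
    per_pitcher = top5_ip / 5
    share = non_top5_share(top5_ip)
    print(f"{label}: {per_pitcher:.0f} IP per top-five pitcher, "
          f"{share:.0%} of innings go to everyone else")
```

At 800 IP the rest of the staff covers about 45% of the innings, which is the low end of the 45%-60% range. At 1000 IP that drops to about 31%.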
