My writings about baseball, with a strong statistical & machine learning slant.

Tuesday, November 16, 2010

Why R makes more sense than R^2 (in looking at correlations in baseball)

I've been so far removed from baseball that I have not read Tom Tango's "The Book" blog in some time. Thankfully I recently ran into one of his posts anyway.

Here, Tom gives a succinct qualitative explanation of the difference between r and r^2 (r squared) in studies of statistical correlation. If half the data points are a direct match for a simple linear rule (e.g. equality) and the other half of the data points are random (within the same range), then your r is 1/2 and your r^2 is 1/4. It is more natural to say that the correlation is 1/2. That makes sense. The 1/4 deals with variance, which is not an intuitive concept. Tom's full article is here:

This makes me feel a whole lot clearer about the results of my fastball speed vs pitcher strikeout rates studies from last year. I looked at a variety of non-performance characteristics for pitchers, and how those helped to explain the pitchers' strikeout rates. Non-performance means that I looked at physical characteristics like age, height, weight, and (left) handedness. I also looked at which pitches they threw, how often, and how hard. By far the most relevant characteristics were fastball speed and handedness. I got about a 0.6 correlation (ie r) between those two factors and strikeout rates. That looked pretty important. However, an r^2 of 0.36 sounded less significant.

What that means is that fastball speed and handedness explain "only" 36% of the inter-pitcher variance in strikeout rates, but they explain about 60% of the difference in terms of standard deviation. Or at least that's how I will think of it from now on. What I found was significant, and I believe interesting. It has not gotten much run in the nerd baseball press, but perhaps this is because I did not promote it. That won't change. But if you are interested, look for some of my articles from the winter of 2009-2010. I have not seen anyone write about this before or since. Although I haven't checked recently.
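As a rough check on Tom's example, here is a small simulation. It is only a sketch: the sample size, the seed, the uniform 0-1 range and the function names are my own assumptions, not anything from Tom's post. Half of the points follow y = x exactly, and the other half get a y drawn at random from the same range.

```python
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def half_matched_sample(n, seed=0):
    """Half the points obey y = x; the rest have y drawn uniformly from the same range."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    xs = [rng.random() for _ in range(n)]
    ys = [x if i % 2 == 0 else rng.random() for i, x in enumerate(xs)]
    return xs, ys

xs, ys = half_matched_sample(20000)
r = pearson_r(xs, ys)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
```

With a large sample, r should land close to 1/2 and r^2 close to 1/4. The same squaring turns a 0.6 correlation into 0.36 of the variance.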

For what it's worth, it seems like teams are valuing (left) handedness and fastball speed more highly in prospects than ever before. This is not surprising; sports always tend to evolve that way. As money increases and the talent pool grows, rare ability and "natural" talents take precedence over common ability and "refined" talents and experience. Just consider players like Amar'e Stoudemire in the NBA. That is why I thought that Aroldis Chapman was a bargain for the Reds, purely based on his fastball speed and handedness.

At some point, I should go back and see how the rest of my predictions fared, including my 2010 IP and ERA projections for all MLB pitchers. But that would require more effort than writing a short article like this one.

Friday, September 10, 2010

Tim Lincecum: home runs, strikeouts and fastball speed

Whatever happened to Tim Lincecum? His 3.69 ERA over 185 IP is pretty good and his FIP of 3.33 is even better, but both are well below his 2008-2009 level, when he deservedly won two NL Cy Young awards. His strikeout rates are down. His walks are up. He is also allowing more home runs.

By far the biggest difference between 2008-2009 and 2010 for Lincecum has been his home runs given up. He gave up home runs on 6% of fly balls in 2008-2009, but 10% of fly balls in 2010. Tim's xFIP (FIP with a league-average rate of HR per fly ball) is only slightly up from 2008-2009.

So if someone asked you: "what's wrong with Tim Lincecum in 2010?" you would tell him that his luck with fly balls going out has turned for the worse. Then you would debate how much a pitcher controls his HR/fly ball rates. But other than home runs given up, has anything else changed about Timmy the Freak?

Eric Seidman of Baseball Prospectus recently wrote an article on Lincecum, attempting to break down the differences in his repertoire between 2010 and 2008-2009. It's unclear what changed, other than the drop in fastball speed, and also a decrease in fastball movement. Seidman goes on to suggest that, if Lincecum's fastball is slower and "has less bite," then it may make the rest of his pitches less effective, even if they are the same pitches as before.

There are a lot of moving parts here, so let's just look at his declining fastball speed compared to his declining strikeout rates. Of all the changes, these are the most easily noticeable "cause" and "effect." Numbers from Lincecum's FanGraphs profile.

Year SO/9 avg FB speed (mph) % fastballs
2007 9.2 94.2 69%
2008 10.5 94.1 66%
2009 10.4 92.4 56%
2010 9.7 91.3 55%

The trend is weak, but it looks like Tim Lincecum's strikeout rates are declining with his fastball slowing down. Lincecum improved as a strikeout pitcher from his 2007 rookie year, but is now probably maximizing his strikeout ability, relative to his physical skills. As the physical skills decline, so will his ability to strike batters out.

Why am I looking at fastball speed in predicting strikeout rates, and not various other pitcher skills or characteristics?

In a series of studies I did last winter I found that, of all the non-performance pitcher attributes, fastball speed was by far the best predictor of strikeout rates. I can predict a pitcher's strikeout rate (given a reasonable IP cutoff) with R=0.52, given just his average fastball speed. The predictive power only goes up to R=0.59 if I also look at what other pitches he throws, his league, his age, his weight, and whether or not he is left-handed. Of those factors, the league and handedness are by far the most helpful. See my old article here for more details.

The most important findings from my research were that (1) fastball speed predicts strikeout rates to a significant degree, and (2) the relationship is non-linear.

Here are fastball speeds vs strikeout rates for 2002-2009 pitchers. Strikeout rates are fit by a quadratic function (ie square of the fastball speed). Note the discrepancies by league:



A fastball velocity drop from 95 mph to 90 mph in the NL (in blue) is worth a strikeout rate drop from 9.0 SO/9 to 6.5 SO/9. That's a big difference. For a starting pitcher, that's a drop from being an elite strikeout pitcher to being league average.

Tim Lincecum has dropped from a 94.1 mph pitcher to a 91.5 mph pitcher. According to the graph above, that should be worth a strikeout drop of about 8.5 - 7.0 = 1.5 SO/9. Even for someone who strikes out ten batters per nine innings, a 1.5 SO/9 drop is enough to take him down from elite pitcher to just very good.
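For readers who want to try this kind of fit themselves, here is a minimal sketch of a least-squares quadratic fit of strikeout rate on fastball speed. It is not the code or the data from my study. The function name `fit_quadratic` and the toy curve are invented for illustration, and the toy curve was chosen only so that it passes through the two NL values quoted above (9.0 SO/9 at 95 mph, 6.5 SO/9 at 90 mph).

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*u^2 + b*u + c with u = x - mean(x).

    Returns a prediction function. Centering x keeps the normal
    equations well conditioned for speeds in the 85-100 mph range.
    """
    m = sum(xs) / len(xs)
    us = [x - m for x in xs]
    s = [sum(u ** k for u in us) for k in range(5)]               # power sums of u
    t = [sum(y * u ** k for u, y in zip(us, ys)) for k in range(3)]
    # Augmented normal equations; unknowns are ordered (c, b, a).
    A = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        for r in range(3):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [vr - f * vi for vr, vi in zip(A[r], A[i])]
    c, b, a = (A[i][3] / A[i][i] for i in range(3))
    return lambda x: a * (x - m) ** 2 + b * (x - m) + c

def toy_curve(mph):
    """Made-up quadratic (an assumption), NOT the fitted curve from the study."""
    return 0.02 * (mph - 90) ** 2 + 0.4 * (mph - 90) + 6.5

speeds = [86 + 0.5 * i for i in range(23)]      # 86.0 to 97.0 mph
rates = [toy_curve(s) for s in speeds]          # synthetic, noise-free SO/9 values
predict = fit_quadratic(speeds, rates)
drop = predict(95) - predict(90)
print(f"predicted SO/9 drop from 95 to 90 mph: {drop:.2f}")
```

With real pitcher seasons in place of the synthetic points, the same function would give one curve per league, and the projected strikeout drop for any velocity change is just the difference between two predictions.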

Tim Lincecum is not a typical pitcher. He over-achieved his fastball-based strikeout projection at 94 mph, and he's over-achieving his strikeout projection at 91.5 mph. But the strikeout drop is still there. If Lincecum's fastball declines further, he won't any longer be a ten strikeout per nine innings pitcher. To record sub-3.00 ERA's, he'll have to bring down his walk rates from around 3.0 BB/9 to more of a Cliff Lee type level, or he will have to get lucky with keeping fly balls in the park at well below the league average.

He is not so old that decreases in fastball speed are inevitable, but I'm guessing that Tim Lincecum is too old to increase his fastball speed, the way that Zack Greinke did 2005-2007. Nor does he have the strikeout rate advantage of being left-handed (worth about +1.5 SO/9 for the same fastball speed, according to my study).

There are many components that go into being a great pitcher. But strikeout rates and walk rates are the two factors that pitchers have the most control over. I'm not saying that Lincecum's 10.0 SO/9 rates from the last couple years were a fluke. But he will have trouble maintaining those into the future as his fastball speed has already declined. Look for him to decrease his walks. If he is not able to do so without further decreasing the strikeout rates, Tim Lincecum can't be considered the best pitcher in baseball going forward.

Tuesday, August 24, 2010

Incomplete thoughts on ground-ball pitchers

I've been out of the baseball loop, focusing on my basketball project. But I still slavishly read Bill James's site, and he brought up a point relevant to my research on pitcher types. In answering a question about Brandon Webb, Bill said:
I've said it a thousand times, but. . .I don't believe in ground ball pitchers. I don't trust them, I don't want them, and I don't believe one should ever invest money in them. In theory, a ground ball pitcher with a good strikeout rate is the best of both worlds. But the problem is, there just aren't any pitchers like that who are consistently good; they all either get hurt or they lose home plate. The only pitcher like that who has had a great career in the last 30 years was Kevin Brown. The overwhelming majority of the consistently good pitchers are the guys who live off of the high fastball--Clemens, Schilling, the Unit, Pedro, Santana, King Felix, Verlander, Sabathia, etc.
When I left off my baseball research, I left off with a classification of pitchers by the type of pitches that they throw. Dave Allen pointed out that I should look at pitchers who throw two-seam fastballs, as those pitchers have become the subject of much sabermetric discussion. Two-seam fastballs induce ground balls like no other pitch, and the value of ground balls for pitchers has become a hotly debated topic. (By hot, I mean that multiple analysts are competing to show how much value ground balls really have for pitchers.)

I created a new category of pitchers, centered around those that throw a high percentage of two-seam fastballs. Indeed, this category of pitchers had very high ground ball rates (something like 6% higher than average), but also lower strikeout rates (about 0.5 K/9 less than average). I was going to write an article about whether or not this "tradeoff" is worth it.

But Bill James brings up a better point. Who are the great two-seam fastball ground ball pitchers out there? Clearly Brandon Webb has to be the most famous example. But let's consider the others. I only had reliable two-seamed fastball data for 2009, so all examples have to be from last year. Here are the most name-recognizable pitchers who classified as "type 8: two-seam fastball pitcher" by my scheme. All data courtesy of PF/X posted on FanGraphs.

Pitcher 2009 FT% 2010 FT%
Joel Pineiro 28% 49%
Brian Matusz 14% 21%
Scott Kazmir 10% 0%
Rick Porcello 22% 52%
Francisco Liriano 12% 46%
Fausto Carmona 9% 33%
Chien-Ming Wang 23% NA
Carlos Silva 43% 50%

Ignoring Scott Kazmir, who no longer throws two-seam fastballs, and the hobbled CM Wang, is there anything we can generalize about the two-seam fastball pitchers?

First of all, none of them are backing off the pitch. This is selective, since I chose the most recognizable proponents of the pitch, and PF/X pitch classifications are not consistent year to year. Still, I think this suggests that two-seam fastball pitchers are on the rise. How is it affecting their stats?

All of these pitchers are recording high GB% stats on the season, except for Brian Matusz. Joel Pineiro leads with 56%, and none of these guys except Matusz are below 45% (league average is in the low 40% range). Accounting for randomness, these pitchers are all getting high ground ball rates, in part due to their use of the two-seam fastball. However none of them except for Liriano are having a top-level season. Here are the 2010 strikeout rates (K/9) for those pitchers:

Pitcher 2010 K/9
Joel Pineiro 5.7
Brian Matusz 6.9
Rick Porcello 4.5
Francisco Liriano 9.8
Fausto Carmona 4.8
Carlos Silva 6.3

Not surprisingly, Francisco Liriano has a 3.45 ERA, despite a very unlucky 0.350 BABIP. He is having a "Kevin Brown" season, as Bill James would describe it, with both high GB% and high strikeout rate. However the other pitchers have league-average strikeout rates at best. Fausto Carmona has the stuff (93 mph average fastball) to be a high-strikeout pitcher, but he has never realized that potential (even during his 19-win season in 2007, he was a low-strikeout pitcher). It is very unlikely he will become even a league-average strikeout pitcher at this point in his career. Joel Pineiro was dominant earlier this season when he was getting 70% ground balls, but his ERA and FIP have settled around 4.0 now that his ground ball rate is a more sustainable 56%. Without above-average strikeout rates, a pitcher's long term ceiling might be that 4.0 ERA. Not bad, and worth a couple of WAR, but not in the elite pitcher echelon.

Brian Matusz is an interesting case. He was a high strikeout guy in the minors, and has had a league-average strikeout rate over his first 200 major league innings. He throws a two-seam fastball according to PF/X, but he doesn't get high ground ball rates. I'm not sure what's going on there. Maybe he just doesn't belong in this list.

Overall, though, I think Bill James's point is well taken. You can't be a great pitcher on ground balls alone, at least not over a course of several years. You need to have strikeouts. Francisco Liriano might be the next pitcher to maintain high GB rates with high strikeout totals. But he'll have to prove it over more than one season.

Projecting Liriano in April

I'm happy to see Liriano having a great season. He's endured a few injury setbacks, and I'm happy to see him finally come back to form. Also, my pitcher projections were very favorable for him, and it's always pleasant to be right on something like this.

In the projection I published in April, here is what my system predicted for Francisco Liriano in 2010:

  • 159.6 IP; 20.9 VORP; 4.31 ERA; 4.24 FIP

Of course, he has been much better than that. But it was bold of my system to project him for a full season, and to be in the top 50 most valuable pitchers in MLB. In 2009, he was 5-13 with a 5.80 ERA in 136 IP. I think this is a real win for my injury-based projection adjustments.

With the season finishing up, I will go over my predictions more systematically. As you can see from the link above, I was off on quite a few of them. I was probably more pessimistic on Cliff Lee than most (in part due to his injury in camp). I got fooled on Javier Vazquez. And I thought John Lackey would be a workhorse, rather than a dud.

But this is all a topic for a future post. Til then...

Wednesday, July 28, 2010

Andre Dawson, Bert Blyleven, Johnny Damon and Miguel Cabrera

Jorge Posada is one of my favorite Yankees. I've always been a big Andy Pettitte fan, and generally considered myself one of those kind of Yankees fans. But this isn't about Andy Pettitte. Or even about Posada, for that matter.

Jorge Posada came up when he was 23, but he didn't become a regular until he was 26. Since then, he's been the top offensive catcher in the AL just about every year. And although I'm sure no one was talking about it in 2000, Jorge Posada compares favorably to the Hall of Fame's current battery of backstops. He hasn't been Johnny Bench or Mike Piazza or Gary Carter, but he was pretty damn good for the past decade and change, and he ain't done yet.

When I checked Jorge's profile on Baseball Reference, I wasn't surprised to see Carlton Fisk as his closest comp, but I was a little surprised that he only ranked 26th on the list of WAR (wins above replacement) among currently active players. I clicked ahead to see the top 50 active players by career WAR. Man it's a hell of a list. This got me thinking.

Last year Bill James wrote an article called "The Expansion Time Bomb." [Unfortunately it's behind the paywall on his site.] Bill argued that, as baseball has expanded since 1969, so too has the number of players reaching levels of "historical achievement" that typically define a Hall of Fame career. In other words, in an expanded league, there will be more players with 3,000 hits, more 500 home run hitters, more 300 game winners, and otherwise more milestones being reached. This seems intuitively true, but it is also very hard to argue, and harder to verify. Still, his main argument is an interesting one (which I paraphrase below):

Most supporters want the Hall of Fame to be an exclusive club. This inherently means restricting membership to a small number of entries per year (or decade, or other time period). As expansion has led to more players with historical levels of achievement, Hall of Fame standards will tighten to levels much more narrow than those used in the past.
We'll come back to this thought in a minute. First, let's look at the current top 50 in baseball by career WAR. How many of them are Hall of Fame players? (By the way, WAR is simply a measurement of career "wins added" above a replacement-level player. The merits of WAR are not important here. It is just a way to place all active players, regardless of position, on a rough universal career ranking.)

For each player, I rate his Hall of Fame chances as "yes," "probably," "maybe," or "no." I assume a conservative estimate for the rest of his career. In other words, will he make the Hall of Fame, based on today's standards, if he doesn't do much for the rest of his career? I'm ignoring steroids and just focusing on performance.

  1. Alex Rodriguez (34) - Yes
  2. Albert Pujols (30) - Yes
  3. Chipper Jones (38) - Yes
  4. Ken Griffey (40) - Yes
  5. Derek Jeter (36) - Yes
  6. Jim Edmonds (40) - Probably
  7. Jim Thome (39) - Probably
  8. Manny Ramirez (38) - Yes
  9. Ivan Rodriguez (38) - Yes
  10. Scott Rolen (35) - Maybe
  11. Andruw Jones (33) - No
  12. Vladimir Guerrero (35) - Yes
  13. Todd Helton (36) - Probably
  14. Bobby Abreu (36) - Maybe
  15. Carlos Beltran (33) - Probably
  16. Jason Giambi (39) - No
  17. Ichiro Suzuki (36) - Yes
  18. Mariano Rivera (40) - Yes
  19. Roy Halladay (33) - Probably
  20. Andy Pettitte (38) - Maybe
  21. Johnny Damon (36) - Maybe
  22. Mike Cameron (37) - No
  23. Jamie Moyer (47) - No
  24. J.D. Drew (34) - No
  25. Johan Santana (31) - Probably
  26. Jorge Posada (38) - Maybe
  27. Lance Berkman (34) - Maybe
  28. Tim Hudson (34) - No
  29. Omar Vizquel (43) - Probably
  30. Roy Oswalt (32) - Probably
  31. Mark Buehrle (31) - Maybe
  32. CC Sabathia (29) - Probably
  33. Adrian Beltre (31) - No
  34. Miguel Tejada (36) - No
  35. Javier Vazquez (34) - No
  36. Jason Kendall (36) - No
  37. Chase Utley (31) - Probably
  38. Magglio Ordonez (36) - No
  39. Joe Mauer (27) - Probably
  40. Eric Chavez (32) - No
  41. Mark Teixeira (30) - Maybe
  42. Troy Glaus (33) - No
  43. Barry Zito (32) - No
  44. Placido Polanco (34) - No
  45. Carlos Zambrano (29) - No
  46. Tim Wakefield (43) - No 
  47. Rafael Furcal (32) - No
  48. Edgar Renteria (33) - No
  49. David Wright (27) - Maybe
  50. Miguel Cabrera (27) - Probably
You will surely disagree with some of my assessments. But let's consider the overall picture. How many future Hall of Famers are active in 2010? I'll assign the following counts per rating:

  • Yes = 1.0
  • Probably = 0.7
  • Maybe = 0.3
  • No = 0.0
Ken Griffey counts as one Hall of Famer; Roy Oswalt counts as 0.7 Hall of Famers. My main man Jorge counts as 0.3 Hall of Famers. Between himself, Bobby Abreu, and David Wright, there will most likely be one Hall of Fame career. Remember, I'm being a little conservative here.

Out of the fifty players, we get twenty one Hall of Famers, breaking down as follows:

10 * Yes + 12 * Probably + 9 * Maybe + 19 * No = 21.1 Hall of Famers

A Hall of Fame career is typically 16 to 20 years, so in theory, this list represents 16 to 20 years' worth of Hall of Famers, assuming these are evenly distributed through time. However, the list does not include a single player under 27. Hanley Ramirez, Zack Greinke and Tim Lincecum are not accomplished enough yet to be considered possible Hall of Famers for this discussion. Therefore, let's say that the top fifty players by WAR includes all possible Hall of Famers over fifteen years (ie age 27 to age 42).

If my list is reasonably accurate, this suggests that we will induct twenty one players over every fifteen years, if the future performance is much like the recent past.
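The tally above is easy to reproduce. This is just the arithmetic from the list, written out as a sketch; the rating counts are the ones in my list, and the fifteen-year window is the age 27 to age 42 span described above.

```python
# Weight assigned to each Hall of Fame rating.
weights = {"Yes": 1.0, "Probably": 0.7, "Maybe": 0.3, "No": 0.0}

# Number of players in the top 50 with each rating.
counts = {"Yes": 10, "Probably": 12, "Maybe": 9, "No": 19}

total_players = sum(counts.values())
expected_hof = sum(weights[k] * counts[k] for k in weights)
per_year = expected_hof / 15  # fifteen-year window, age 27 to age 42

print(f"{total_players} players -> {expected_hof:.1f} expected Hall of Famers")
print(f"about {per_year:.1f} per year")
```

Changing the weights is a quick way to see how sensitive the total is to my ratings. Bumping "Maybe" from 0.3 to 0.5, for example, adds 1.8 Hall of Famers to the total.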

To me, that sounds very reasonable. With an average of 1.4 new qualified candidates per year, the Hall of Fame would be electing zero to three players every year. Yes, they will be electing more candidates per year than in the recent past, but not by much. It will be harder for borderline candidates to get in, but there would never be backlogs of qualified recent candidates running ten deep. There will be years with no obvious Hall of Famers on the ballot, and in those years, weaker candidates will still have a chance to be elected.

While I still think that Bill James's argument sounds appealing, I just don't see the glut of Hall of Fame level performers driving up future Hall of Fame standards significantly. Instead, we will see more years with one or two good new candidates, and fewer multiyear stretches where the best candidate on the ballot is Phil Niekro or Jim Rice. But those borderline cases will still get plenty of consideration. When he is up for Cooperstown, Johnny Damon will have more competition on the ballot than did Andre Dawson and Bert Blyleven, but his career will be just as thoroughly vetted as those two's were.

According to Wikipedia, there are 203 former players currently in the Hall of Fame. These represent the achievements in Major League Baseball of the last 100 years, as well as a few achievements from the 19th century. That's somewhere between 1.7 and 2.0 players per year of Major League Baseball, depending on who's counting. By my count, we will have 1.4 players per year in the future, based on conservative projections of today's stars. Even accounting for the Veterans Committee's past indiscretions inflating the 2.0 number, I don't see a tightening of standards that will exclude Johnny Damon, Bobby Abreu, Jorge Posada, Lance Berkman or Todd Helton from being considered as legitimate candidates. According to my estimates, two of those five guys will get in, and I think that's about right.

Wednesday, June 23, 2010

Status Update!

I've been off the blog for a month. I wrote a couple of followup articles for my pitcher type series, but never got around to editing and publishing them (yes, I actually edit my work).

As per Dave Allen's suggestion, I created a category for two-seam fastball (sinking fastball) throwing pitchers. Also I did some analysis trying to figure out whether throwing lots of these pitches (ie qualifying for my new category over all the others) is an effective strategy. I had some game-theory ideas about giving up strikeouts for ground balls, etc. Then I just got busy with several other things. So what was going to be a week long delay turned into a month.

I've been in Vegas, playing a few WSOP events. Also, I've been working on a software project, and doing a lot of drawing.

Moreover, I've found that I'm less nuts about baseball than I was a few months ago. Baseball games are undeniably boring to watch in their entirety, and the season is too long for anyone to truly care about the result of any particular game. I still love baseball, but:
  1. None of my good friends do. Although they are all big sports fans.
  2. I never go out of my way to watch a game on TV.
  3. I don't get excited about seeing an MLB game in a new city when I am travelling.
The third point is hard to admit. I was in Chicago a month ago, and I had a free afternoon. I had never been to Wrigley, and the Cubbies were scheduled for a day game on a Friday. I'd have thought I'd jump on the chance to go to the game. But I didn't. I went to the Art Museum instead, and I didn't remotely regret it. If I can't get excited about going to a day game in Wrigley by myself, then I guess I'm not as big a baseball fan as I had thought. Or I'm just more interested in art at the moment.

Maybe I'll get back to baseball soon, maybe I won't. But for now, I would rather spend the afternoon drawing, than spend it writing and revising a baseball article. I'll probably go back and revisit my preseason pitcher projections around the All-Star break. But that would be more for vanity than from an impartial sense of interest.

In other news, I got an invitation to interview for the Diamondbacks statistical analyst job, but I turned it down (despite the fact that I respect their organization, and love the American Southwest). However this has more to do with my software projects than it has to do with my attitude toward baseball. It's a great job, and I hope the DBacks make a great hire. I'm sure they will!

Wednesday, May 5, 2010

Does Dave Duncan hate change-ups?

In the comments for my "eight types of pitchers" article, John noted:
Type 3 pitchers seem to be Cardinals even though they're just 4% of pitcher seasons they made up 12.5% of the Cardinals' 16-man staff last season. Also, while Type 2 pitchers make up just 18% of the MLB population, they made up 31.25% of St. Louis' staff last season. They seem to be doing that by avoiding Type 0, 4 and 7 pitchers. I wonder if this could be a personal preference by pitching coach Dave Duncan. Do you have data that suggests some MLB teams look for certain types of pitchers and/or convince pitchers to use a certain percentage of their stuff?
In other words, does Cardinals pitching coach Dave Duncan encourage his pitchers to become certain types of pitchers, and not other types? Duncan has been lauded on many blogs and baseball news sites over the past couple of years due to his staffs' repeated successes. He seems to have revitalized multiple pitching careers over the past few years, including Joel Pineiro in 2009. Pitch F/X expert Dave Allen pointed out that Duncan's pitchers get more ground balls under his tutelage than they had before.

Is there a secret to Duncan's (perceived) success in reclamation pitchers? Does he turn pitchers into specific types that are more successful, on average, than other pitcher types?

John suggests above that Duncan's pitchers tend to be type 2 and type 3, but not types 0, 4, or 7, as compared to the league average last year. For those confused about the pitcher types, please read my article explaining the pitcher types. The types are derived from what I determine to be a pitcher's core and secondary pitches. All pitchers are assumed to throw the fastball as a core pitch (I do not yet distinguish between two-seam and four-seam fastballs; coming soon, Dave). As a quick reference:

  • type 0: change-up core; slider secondary
  • type 1: cutter core
  • type 2: slider core; change-up secondary
  • type 3: slider and curve core
  • type 4: curve core; change-up secondary
  • type 5: change-up core; curve secondary
  • type 6: slider core; no secondary
  • type 7: splitter core; slider secondary
It turns out that John's observation is (mostly) correct.

I looked at 2005-2009, rather than just 2009. I counted all pitchers for each team that threw at least 20IP. This is plenty of playing time to establish a repertoire. Here is a list of all teams' pitcher types, by count of 20IP+ pitchers. First are the percentages, then the raw counts. I included averages and standard deviations for reference. The data is missing all pitchers who were traded midseason. Sorry.

Indeed, the Cardinals' pitchers are more likely to be type 3 (and also type 1) than an average team. The differences lie outside of one standard deviation from the norm. Likewise, the Cardinals' pitchers are over one standard deviation below the norm for type 0, type 5, and type 6. The staff is within one standard deviation from the norm for type 2, type 4 and type 7.

Here is an excerpt of my full team type chart:

Team T_0% T_1% T_2% T_3% T_4% T_5% T_6%
Average (1 STD) 7-19 2-10 12-24 1-8 8-20 4-13 26-39
Giants 10 6 17 6 11 7 41
Yankees 8 14 13 6 15 7 35
Cardinals 6 13 22 16 12 1 24
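The one-standard-deviation comparison can be checked mechanically against the excerpt. The sketch below uses only the numbers in the table, reading the T_6 band as 26 to 39 (an assumption on my part about how that cell should be read). Type 7 is not in the excerpt, so it is left out.

```python
# (low, high) bands for average +/- 1 STD, in percent of staff.
# The T_6 band is read as 26 to 39, which is an assumption.
bands = {"T_0": (7, 19), "T_1": (2, 10), "T_2": (12, 24), "T_3": (1, 8),
         "T_4": (8, 20), "T_5": (4, 13), "T_6": (26, 39)}

cardinals = {"T_0": 6, "T_1": 13, "T_2": 22, "T_3": 16,
             "T_4": 12, "T_5": 1, "T_6": 24}

def classify(value, band):
    """Say whether a team's percentage is below, within, or above the band."""
    low, high = band
    if value < low:
        return "below"
    if value > high:
        return "above"
    return "within"

flags = {t: classify(cardinals[t], bands[t]) for t in bands}
for pitcher_type, flag in flags.items():
    print(pitcher_type, flag)
```

Under that reading, the Cardinals come out above the band for types 1 and 3, below it for types 0, 5 and 6, and within it for types 2 and 4.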

The sample size (71 pitcher seasons) is too small to conclude anything, but here are some possible explanations of what is happening:
  1. Dave Duncan hates change-ups! Type 0 and type 5 are primarily change-up pitchers. Lots of really good pitchers have been type 5 (Greg Maddux & Tom Glavine, for example). However, very few of Duncan's pitchers fit this profile.
  2. Dave Duncan doesn't care for young flame-throwers (or he reforms them quickly). Type 6 pitchers are the most common type of major league pitcher, by far. Many, if not most pitchers come up to the majors as hard-throwing type 6 guys, featuring a fastball, a slider, and not much else. There is a dearth of type 6 pitchers on Duncan's staff, although the number is not ridiculously low. They still make up 24% of his staffs (league average is 33%, and the Cubs form the high-watermark at 48%).
It actually doesn't take much analysis to see that Dave Duncan's pitchers throw fewer change-ups than any team in baseball. FanGraphs has the aggregate pitch percentages by team year. Cardinals pitchers threw fewer change-ups than any other team in 2009, although they are somewhat higher in 2010 (but still solidly near the bottom).

This may just be confirmation bias, but it is entirely plausible that John is right, and Dave Duncan teaches his pitchers to throw curveballs, and not change-ups. As I mentioned in my previous piece, so far in 2010, Cardinals closer Ryan Franklin is throwing more curveballs than ever before, and would currently be classified clearly as a type 4 pitcher. Earlier in his career, Franklin used to be a slider/change-up kind of guy.

If Duncan tells his pitchers to throw curves as their off-speed offerings instead of change-ups, then that would explain why his staff has an unusually high number of type 3 pitchers. I assume that conventional wisdom would dictate that a pitcher should throw a slider or a curve. Maybe Duncan is teaching his pitchers to throw both a slider and a curve. If so, that would explain Cardinals' pitchers' improved ground ball rates.

Change-ups, when put in play, tend to result in fly balls (sorry I don't immediately know of a study proving it, but this makes logical sense). Thus it seems plausible that throwing fewer change-ups will result in fewer fly balls. Recent advances in DIPS (defense independent pitching statistics) seem to suggest that pitchers with high ground ball rates also give up fewer home runs per fly ball (as well as 0% home runs per ground ball).

In other words, in today's game, with short outfield walls and middle infielders who can hit one out, it may not make much sense to throw change-ups for any pitcher who does not have a swing-and-miss change-up.

Then again, this is just a theory. If you are interested in more data or have other ideas, drop me a line!

Looking Ahead

I trained a basic model to adjust projected FIP using a pitcher's type. For the same 2005-2009 span, pitchers tended to underperform my projections (ie post higher FIP), if they had types 0, 2, 5 and 6. The worst-performing type was type 5. The best-performing type was type 1 (cutter-throwers). There were also trends from previous years' types, again suggesting that type 0 and type 5 pitchers under-perform their expectations. However these are just weak trends. I will write something if/when I get something more definitive.

Strange Brew

I also ran a basic function, to figure out which teams had the most typical staffs and which had the strangest staffs, by pitcher type composition. The Cardinals have had the most unusual staff in the majors during 2005-2009. Closely followed by... the Brewers. So John was right in another respect. The Cardinals have the most unusual composition of pitchers in the majors during the past five years.

The most typical staff was that of the Giants, followed by the Florida Marlins.
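One simple way to score how unusual a staff is would be the mean absolute z-score of its type percentages. This is a sketch of that idea, not necessarily the function I ran. It uses only the three teams and seven types from the excerpt, takes each band's midpoint as the league average and its half-width as one standard deviation, and reads the T_6 band as 26 to 39 (an assumption).

```python
# Average +/- 1 STD bands for types 0 through 6, in percent of staff.
# The last band (T_6) is read as 26 to 39, which is an assumption.
bands = [(7, 19), (2, 10), (12, 24), (1, 8), (8, 20), (4, 13), (26, 39)]

teams = {
    "Giants":    [10, 6, 17, 6, 11, 7, 41],
    "Yankees":   [8, 14, 13, 6, 15, 7, 35],
    "Cardinals": [6, 13, 22, 16, 12, 1, 24],
}

def strangeness(percentages):
    """Mean absolute z-score across pitcher types; higher means more unusual."""
    total = 0.0
    for value, (low, high) in zip(percentages, bands):
        mean = (low + high) / 2   # band midpoint is the league average
        std = (high - low) / 2    # band half-width is one standard deviation
        total += abs(value - mean) / std
    return total / len(bands)

scores = {team: strangeness(p) for team, p in teams.items()}
ranking = sorted(scores, key=scores.get)  # most typical first
for team in ranking:
    print(f"{team}: {scores[team]:.2f}")
```

On these three teams the Giants score as the most typical and the Cardinals as the strangest, which agrees with the ordering described above. A full version would need all thirty teams and all eight types.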

Expect an article on this topic, as well.

Monday, May 3, 2010

Starter vs Reliever

I was looking at ways to show (in a chart or map) how pitcher performance changes when pitchers change their type (ie learn or forget different pitches). I soon realized that I needed a single rate stat to measure how a group of pitchers' performance changes. Of course, this stat has to be FIP. However, there are starter/reliever issues that need to be considered first.

This problem has been considered before, and I do not aim to shed much new light on it, except to show a couple of graphs based on recent data.

To establish the difference between starter and reliever performance, there is a long, comprehensive article from Steve Treder on THT that often gets cited. It looks at starter/reliever differences throughout baseball history, and concludes that the difference between starter and reliever performance is consistently about 8% of ERA (about 0.3-0.4 on the ERA scale). However that is not the question I need to answer. Instead, I want to compare a group of pitchers' collective performances, and adjust for the groups' tendencies to throw innings as starters or as relievers. Since starters are, collectively, better pitchers than relievers, the adjustment has to be larger than 8%.

A more useful study is Tom Tango's work on his blog. He looks at the same pitchers as starters and relievers, making several important adjustments. His conclusion is that pitchers have 17% more strikeouts and 17% fewer home runs as relievers than they do as starters. They also have a 17% better BABIP. He concludes that the same pitchers are "about a 1 run per 9IP" RA better as relievers. I did something simpler (although less thorough), and came to much the same conclusions.

Consider the graph below. I mapped IP to FIP by bucketing real pitcher seasons by IP (2005-2009 data). The graph plots the median FIP for each bucket, along with the median start percentage (% of innings thrown by individual pitchers in starting roles) for each bucket.
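
The bucketing step can be sketched roughly as follows. This is a minimal illustration, not the exact code behind the graph: the 20 IP bucket width, the function name and the pitcher seasons are assumptions, and IP is treated as a plain number rather than in baseball's .1/.2 notation.

```python
from collections import defaultdict
from statistics import median


def bucket_medians(seasons, width=20):
    """Group pitcher seasons into IP buckets.

    `seasons` is a list of (ip, fip, start_pct) tuples, with start_pct on a
    0-1 scale. Returns {(low, high): (median FIP, median start %)}.
    """
    buckets = defaultdict(list)
    for ip, fip, start_pct in seasons:
        low = int(ip // width) * width  # e.g. 73 IP lands in the 60-80 bucket
        buckets[low].append((fip, start_pct))

    result = {}
    for low in sorted(buckets):
        fips = [fip for fip, _ in buckets[low]]
        starts = [start for _, start in buckets[low]]
        result[(low, low + width)] = (median(fips), median(starts))
    return result


# Made-up pitcher seasons (illustrative only): (IP, FIP, start %).
seasons = [
    (65.0, 3.80, 0.0), (72.0, 4.10, 0.0), (78.0, 3.95, 0.1),
    (85.0, 4.30, 0.4), (95.0, 4.20, 0.6), (88.0, 4.45, 0.5),
    (105.0, 4.60, 0.9), (118.0, 4.75, 1.0), (110.0, 4.50, 1.0),
]
medians = bucket_medians(seasons)
for (low, high), (fip, start) in medians.items():
    print(f"{low}-{high} IP: median FIP {fip:.2f}, median start % {start:.2f}")
```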


By looking at the median, rather than the average, it is easy to see where the transition from starters to relievers really takes place (as measured by IP). This is like chemistry class. As we move up the IP scale, the average pitcher has higher energy. However he needs to overcome a state change to move from full time reliever to full time starter. The 60-120 IP buckets find him in a state of transition. Supposing that these pitchers are all of roughly the same ability, and their FIP is different only due to role changes, I use the data to fit the following function (where start % is on a 0-1 scale):
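FIP = a + (start %) * b
Here a is the fitted FIP for a full-time reliever (start % = 0) and b is the extra FIP that comes with a full-time starting role. I then define "trFIP" as a "translated FIP," the starter-equivalent a + b, which I assume to be constant, on average, throughout the transition. The best fit is for a = 3.92 and b = 0.81. Therefore for any pitcher: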
trFIP = FIP + 0.81 (1 - start %)
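
As a rough sketch of how a fit like this can be done, here is ordinary least squares of bucket median FIP against bucket median start %. The helper name `fit_line` and the bucket medians are made up for illustration, and plain least squares is itself an assumption (any reasonable line fit would serve); the numbers were only chosen to land near the fitted values quoted above.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope: FIP cost of a full move to starting
    a = mean_y - b * mean_x  # intercept: FIP at start % = 0
    return a, b


# Hypothetical medians for the transition buckets (illustrative only).
start_pct = [0.0, 0.25, 0.5, 0.75, 1.0]
median_fip = [3.95, 4.10, 4.35, 4.50, 4.75]

a, b = fit_line(start_pct, median_fip)
print(f"a = {a:.2f}, b = {b:.2f}")
```

The slope b is the only part that carries over to the translation: a pitcher who never starts gets the full b added to his FIP, and a full-time starter gets nothing.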
Now FIP is meant to reflect the pitcher's skills at striking out hitters and at (not) giving up walks, but it also predicts ERA very well (at least for FIPs near the league average). The translated trFIP does not have this property. As Tom Tango showed, BABIP changes along with SO9 and HR9, so ERA increases more than FIP increases as relievers become starters. A similar exercise for ERA led me to this formula:
trERA = ERA + 1.19 (1 - start %)
However, trERA is not very useful in comparing talent levels between groups of pitchers. I can explain why, but let's get back to comparing groups of pitchers...
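
To make the two translations concrete, here is a small sketch that applies them to a few invented pitchers. The 0.81 and 1.19 constants are the ones from the formulas above; the function names and the sample FIP/ERA lines are hypothetical.

```python
FIP_PENALTY = 0.81  # full-time reliever to full-time starter, FIP scale
ERA_PENALTY = 1.19  # same move, ERA scale


def tr_fip(fip, start_pct):
    """Translated FIP; start_pct is the share of innings as a starter (0-1)."""
    return fip + FIP_PENALTY * (1.0 - start_pct)


def tr_era(era, start_pct):
    """Translated ERA, using the larger ERA-scale penalty."""
    return era + ERA_PENALTY * (1.0 - start_pct)


# Invented pitchers: (label, FIP, ERA, start %).
pitchers = [
    ("closer", 3.50, 3.30, 0.0),
    ("swing man", 4.20, 4.40, 0.5),
    ("starter", 4.40, 4.50, 1.0),
]
for label, fip, era, start in pitchers:
    print(f"{label}: trFIP {tr_fip(fip, start):.2f}, trERA {tr_era(era, start):.2f}")
```

A full-time starter is left unchanged, so trFIP and trERA are both expressed on the starter's scale.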

Back to trFIP, here is the same graph as before, but also with trFIP buckets by IP:


The graph between 60IP and 120IP is not quite flat. This is not surprising, as I am taking the median of individual trFIP values within the buckets. Some pitchers get the full 0.81 penalty, but not all do. I am taking the median of a combined distribution.
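
A toy example shows why the median of individually translated values need not match the translation of the bucket medians. The five pitchers below are invented, and the bucket is deliberately extreme (three pure relievers and two pure starters).

```python
from statistics import median

FIP_PENALTY = 0.81

# Invented bucket: (FIP, start %) for three relievers and two starters.
bucket = [(3.6, 0.0), (3.9, 0.0), (4.2, 0.0), (4.5, 1.0), (4.9, 1.0)]

# Translate the bucket medians, as in the fit.
median_fip = median(fip for fip, _ in bucket)
median_start = median(start for _, start in bucket)
translated_median = median_fip + FIP_PENALTY * (1.0 - median_start)

# Translate each pitcher first, then take the median, as in the graph.
median_of_translated = median(
    fip + FIP_PENALTY * (1.0 - start) for fip, start in bucket
)

print(round(translated_median, 2), round(median_of_translated, 2))
```

The relievers all move up by 0.81 while the starters stay put, so the ordering inside the bucket changes and the two medians come apart.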

My study is not as thorough as Tom Tango's, but I like my results. My median trFIP buckets imply that your typical 60-80 IP pitcher from 2005-2009 is a little bit better than a typical 100-120 IP pitcher from the same time period. I think this is actually true. A 60-80 IP pitcher is very likely to be a high-IP reliever, such as a closer or setup man. He could also be a swing man/long man, but bear with me. A 100-120 IP pitcher is likely a back-end starter or also a swing man/long man who ended up filling in for an injured regular. Although the former is likely a failed starter, he must have found success in the bullpen to get such heavy use. At the very least, it is plausible to suggest that the top-end relievers are better overall pitchers than low-end starters, if only slightly.

Then again, I might be reading too much into a small matter. Going forward, I will use trFIP as a simple measure of overall performance, which is not biased by changes in starter/reliever usage. It creates a useful benchmark, but is not meant as an absolute definition of differences between starters and relievers.

Here is the same graph of average performance by pitcher type, but using trFIP instead of FIP:


The type 1 (cutter throwers) pitchers are still collectively the best of the groups, but the difference between type 2 (slider throwers with secondary pitches) and type 6 (slider throwers without secondary pitches) is eliminated by the differences in the trFIP adjustment.

Also trFIP shows type 0 pitchers and type 3 pitchers to be largely ineffective. Type 0 pitchers are those whose core pitch is a change-up, but who also throw other pitches, often sliders (as opposed to type 5 pitchers who throw change-ups, but with curveballs as secondary offerings). Type 3 pitchers throw both a slider and a curve, or possibly a slurve. 

Type 3 pitchers are similar to both the type 2 slider-throwers and the type 4 curveball-throwers, but are more likely to become type 2 pitchers. Average trFIP seems to suggest that moving from type 3 to type 2 is not a bad move. This may just be confirmation bias, but the trFIP graph seems to suggest that throwing a slurve is, indeed, not an effective way to pitch. The fact that only 4% of pitchers feature both a slider and a curveball (but 70% featured one or the other) might also suggest the same. To be fair, successful type 3 pitchers Chris Carpenter and Adam Wainwright throw both a hard slider, and a slow curve. They are not throwing slurves. But most pitchers find more success with one pitch or the other, it seems.

Now that I have a way of comparing pitchers without concern for starter/reliever issues, I will look into what happens when pitchers switch types. Expect more charts, and maybe a confusion matrix.