My writings about baseball, with a strong statistical & machine learning slant.

Wednesday, February 10, 2010

Why are strikeout rates from pitch data interesting? (Part 2)

In Part 1, I showed that information about pitch data, bio data and league (AL vs NL) is helpful for predicting strikeout rates, but not as helpful as simply looking at the swinging strike rates. So then why should we care about predicting strikeout rates from pitch data and bio data?

I don't need to explain why predicting strikeout rates is important. But I will do so anyway. Bill James has written often (behind the paywall) that strikeout rates are a good predictor of how likely a pitcher is to be effective in the future. There has been a trend toward higher strikeout rates for many years, and effective pitchers who don't strike people out are a dying breed. Bill James suggests that this trend will only continue. Thus it is important for teams to develop high-strikeout arms for the future. This is not to say that other factors of a pitcher's performance are not important. But strikeout projections are very important for a prospect's value to a club.

This is why I am focusing on non-performance-related factors in predicting strikeout rates for major leaguers. I have never spoken to an MLB scout and I don't know much about minor league stats. But I think that the factors that I am looking at might be easy for a scout to project.

I am looking at biographical factors like:
  • handedness
  • height
  • weight
  • age
These will not change for a prospect, or at least they will change very predictably.

My pitch data includes factors like:
  • how fast is his average fastball?
  • how often does he throw it?
  • how often does he throw a breaking ball?
  • what's the speed differential between his fastball and change-up?
  • how deep is his repertoire (using a statistic that I invented)?
Also, I use:
  • IP (innings pitched), but only to allow the model to differentiate between starters and relievers.
  • league (NL vs AL), but mostly to adjust for the fact that NL starters face the opposing pitcher, and thus have slightly higher strikeout rates that have nothing to do with ability.
  • YEAR (as a number), to factor out yearly trends. However, the model almost always ignores this feature anyway.
None of these features consider the player's results-based stats from the major league level. Also, I think that a scout could predict all of these features, at least within a range.
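
One way to picture these inputs is as a single record per pitcher-season, flattened into a numeric vector that a model can consume. The sketch below is only an illustration: the class name, field names, units, and the 0/1 encoding of handedness and league are assumptions, and the repertoire-depth statistic is treated as an opaque number because its definition is not given in this post.

```python
from dataclasses import dataclass, astuple
from typing import List


@dataclass
class PitcherSeason:
    """Hypothetical container for the features listed above."""
    # Biographical inputs
    throws_left: bool
    height_in: float
    weight_lb: float
    age: float
    # Pitch-data inputs
    fastball_mph: float
    fastball_pct: float
    breaking_pct: float
    fb_minus_change_mph: float
    repertoire_depth: float  # opaque value; how it is computed is not defined here
    # Context inputs
    innings_pitched: float
    league_is_nl: bool
    year: int

    def to_vector(self) -> List[float]:
        """Flatten to floats in field order; booleans become 0.0 or 1.0."""
        return [float(value) for value in astuple(self)]
```

Note that no field holds a results-based stat such as strikeouts or ERA, which matches the point of the list: everything here is something a scout could plausibly estimate in advance.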

I would like to see my model translate a pitcher's scout projection into a projection of his future strikeout rate. And I think it can.

Say you've got Joe Dirt. His fastball sits at 92 mph, and touches 95. He's got a plus change-up, and a below average slider that can become MLB average if he works on it. He projects as a starter. Oh yeah, and he's a lefty. Sounds like one heck of a prospect. But can we project all of that information to an MLB strikeout rate? With my model, it is possible.
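
A scouting report like that comes with ranges (sits 92, touches 95; slider below average now, maybe average later), so one rough way to use it is to run every combination of the scout's candidate values through a fitted model and report the lowest and highest projection. The sketch below assumes a hypothetical `predict` callable standing in for whatever fitted model is available; the function name and the dictionary-of-features format are also invented for illustration.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Scenario = Dict[str, float]


def projection_range(
    predict: Callable[[Scenario], float],
    fixed: Scenario,
    uncertain: Dict[str, List[float]],
) -> Tuple[float, float]:
    """Return (lowest, highest) prediction over every combination of
    the candidate values given for the uncertain features."""
    names = list(uncertain)
    outputs = []
    for combo in product(*(uncertain[name] for name in names)):
        # Start from the features the scout is sure about, then
        # overlay one candidate value for each uncertain feature.
        scenario = dict(fixed)
        scenario.update(zip(names, combo))
        outputs.append(predict(scenario))
    return min(outputs), max(outputs)
```

This gives a spread across scouting scenarios, which is a different thing from statistical error bars on any one prediction.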

Furthermore, my model should even be able to place error bars on the output. In previous posts, I have discussed how variance in predicting strikeout rates is related to IP. In a more recent post, you can see how my prediction accuracy changes by fastball velocity.

With more data and more time, I should be able to not only predict a pitcher's strikeout rate reasonably accurately, but I will also be able to say how confident I am in that prediction, based on his fastball speed, his handedness, and whether or not he throws a slider.
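
One simple way to get feature-dependent confidence of this kind is to group past predictions by a feature, such as fastball velocity, and measure how far the predictions missed within each group. The sketch below does this with velocity buckets. The function name, the 2 mph bucket width, and the choice of root-mean-square error as the measure of spread are illustrative assumptions, not a description of how the model actually computes its error bars.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def rmse_by_velocity_bucket(
    rows: Iterable[Tuple[float, float, float]],
    bucket_width_mph: float = 2.0,
) -> Dict[float, float]:
    """rows: (fastball_mph, actual_k_rate, predicted_k_rate).

    Returns {bucket lower edge in mph: RMSE of the predictions
    for pitchers whose fastball falls in that bucket}.
    """
    squared_errors = defaultdict(list)
    for mph, actual, predicted in rows:
        # Snap the velocity down to the lower edge of its bucket.
        lower_edge = math.floor(mph / bucket_width_mph) * bucket_width_mph
        squared_errors[lower_edge].append((actual - predicted) ** 2)
    return {
        edge: math.sqrt(sum(errs) / len(errs))
        for edge, errs in sorted(squared_errors.items())
    }
```

A bucket with a larger RMSE would call for wider error bars on new pitchers who fall in it. The same grouping could be repeated for handedness or for whether the pitcher throws a slider.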

I would hope that such a system would be useful to scouts, and to the teams that employ them. If you are reading this and you own or run a major league team, feel free to email me. Despite the name & references to Mother Russia, I am a baseball-loving American citizen, originally from Russia.

The system does have limitations, not least of which is the fact that my entire sample consists of guys who've made the majors. I have no data on 95 mph lefties who never got to AA. I know there is much more to pitching than being a hard-throwing lefty.


The swinging strike study (showing that swinging strike rate is very highly correlated with strikeout rate) uncovers some valuable truth, and this is not a criticism of Jeff Sullivan's work. However, I'm doubtful that swinging strike rates observed at lower levels will matter much for predicting those same rates at the major league level.

My own baseball career topped out at around 12 years old. I was a good Little League pitcher. I didn't throw too hard, but I got plenty of swing-and-misses from guys chasing my slow "fastballs" outside, and my "sliders" in the dirt. There is no way that my crap would have worked in high school.
