My writings about baseball, with a strong statistical & machine learning slant.

Tuesday, November 16, 2010

Why R makes more sense than R^2 (in looking at correlations in baseball)

I've been so far removed from baseball that I have not read Tom Tango's "The Book" blog in some time. Thankfully I recently ran into one of his posts anyway.

Here, Tom gives a succinct qualitative explanation of the difference between r and r^2 (r squared) in studies of statistical correlation. If half the data points are a direct match for a simple linear rule (e.g. equality) and the other half of the data points are random (within the same range), then your r is 1/2 and your r^2 is 1/4. It is more natural to say that the correlation is 1/2. That makes sense. The 1/4 deals with variance, which is not an intuitive concept. Tom's full article is here:
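The half-matched, half-random picture is easy to check with a quick simulation. What follows is only a sketch of my reading of Tom's example, not his code: the helper names (`pearson_r`, `half_matched_sample`), the uniform 0-to-1 range, the sample size, and the seed are all assumptions made for illustration.

```python
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient, computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def half_matched_sample(n, seed=0):
    """Hypothetical data: every other point has y == x, the rest have a random y."""
    rng = random.Random(seed)
    xs, ys = [], []
    for i in range(n):
        x = rng.random()  # assumed range: uniform on [0, 1)
        y = x if i % 2 == 0 else rng.random()  # same range, unrelated to x
        xs.append(x)
        ys.append(y)
    return xs, ys


xs, ys = half_matched_sample(100_000)
r = pearson_r(xs, ys)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

With a large sample, r should land close to 1/2 and r^2 close to 1/4, which is the point: the "half the points follow the rule" intuition lines up with r, not with r^2.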

This makes me feel a whole lot clearer about the results of my fastball speed vs. pitcher strikeout rate studies from last year. I looked at a variety of non-performance characteristics for pitchers, and how those helped to explain the pitchers' strikeout rates. Non-performance means that I looked at physical characteristics like age, height, weight, and (left) handedness. I also looked at which pitches they threw, how often, and how hard. By far the most relevant characteristics were fastball speed and handedness. I got about a 0.6 correlation (i.e. r) between those two factors and strikeout rates. That looked pretty important. However, an r^2 of 0.36 sounded less significant.

What that means is that fastball speed and handedness explain "only" 36% of the inter-pitcher variance in strikeout rates, but they explain about 60% of the difference in terms of standard deviation. Or at least that's how I will think of it from now on. What I found was significant, and I believe interesting. It has not gotten much run in the nerd baseball press, but perhaps this is because I did not promote it. That won't change. But if you are interested, look for some of my articles from the winter of 2009-2010. I have not seen anyone write about this before or since, although I haven't checked recently.
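One way to make the "60% in terms of standard deviation" reading concrete: for a least-squares line, the standard deviation of the fitted values is |r| times the standard deviation of the outcome, while the variance of the fitted values is r^2 times the outcome's variance. The sketch below uses made-up data, not my pitcher data. It assumes a single standard-normal predictor and noise scaled so that the underlying correlation is 0.6, and the helper names are hypothetical. My study used two factors, where the same relationship holds for the multiple correlation, but one predictor keeps the example short.

```python
import random


def mean(values):
    return sum(values) / len(values)


def sd(values):
    """Population standard deviation."""
    m = mean(values)
    return (sum((v - m) ** 2 for v in values) / len(values)) ** 0.5


def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sd(xs) * sd(ys))


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y on x."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


# Synthetic data (an assumption for illustration): y = 0.6*x + 0.8*noise,
# so the underlying correlation between x and y is 0.6.
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(5000)]
ys = [0.6 * x + 0.8 * rng.gauss(0, 1) for x in xs]

slope, intercept = fit_line(xs, ys)
fitted = [slope * x + intercept for x in xs]

r_sample = pearson_r(xs, ys)
sd_ratio = sd(fitted) / sd(ys)  # share of the spread, in SD terms
var_ratio = sd_ratio ** 2       # share of the variance

print(f"r = {r_sample:.3f}")
print(f"SD(fitted) / SD(y) = {sd_ratio:.3f}")
print(f"Var(fitted) / Var(y) = {var_ratio:.3f}")
```

The SD ratio matches the sample r and the variance ratio matches r^2, so "36% of the variance" and "about 60% of the spread in standard deviation units" are two descriptions of the same fit.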

For what it's worth, it seems like teams are valuing (left) handedness and fastball speed more highly in prospects than ever before. This is not surprising; sports always tend to evolve that way. As money increases and the talent pool grows, rare ability and "natural" talents take precedence over common ability and "refined" talents and experience. Just consider players like Amar'e Stoudemire in the NBA. That is why I thought that Aroldis Chapman was a bargain for the Reds, purely based on his fastball speed and handedness.

At some point, I should go back and see how the rest of my predictions fared, including my 2010 IP and ERA projections for all MLB pitchers. But that would require more effort than writing a short article like this one.