I removed the known retired pitchers from training (all 15 of them), and so I have projections from my new model for 2010 IP. Pitchers are sorted by last year's IP.
I am using the same features and same process in the training of the models. Actually, I also looked at "games" and "games started" this time around, but that was a waste of time. No new information there. The correlations with actual IP stay the same, and even average IP rises by only a tiny amount.
What does happen, though, is that veteran starting pitchers are not regressed as heavily due to age. Also, long-term value (i.e. value produced in the past four years) gets more weight in this model than in the previous one.
In a separate list, I have this model's projections, listed next to those of the last model. High-end older starting pitchers seem to gain IP, which I think makes sense.
Roy Halladay, Tim Lincecum, Bronson Arroyo, Javier Vazquez and Zach Duke are all projected for 10-12 more IP than in the previous model, probably because the new model gives them more credit for VORP and IP before 2009. Of those guys with high IP totals in 2009, only Josh Johnson, Jon Lester and Scott Baker lost more than 7 IP from the previous projection. Again, this is probably because those pitchers did not have high VORP or IP before 2008-2009.
Overall, the projected IP for the high-IP guys from 2009 is up (+2.5 IP for pitchers with 200+ IP in 2009). I think this is an improvement, even if my system is now lowballing some very good young pitchers.
Future Ideas
This should probably be it for my IP projections from season stats. However, I am skipping two important aspects of a player's playing time projection: consistency and momentum. Since regression is more likely than improvement for most pitchers, my model (like most other predictive models) tends to project a pitcher at his recent level of performance, regressed downward toward replacement level. However, if the player has shown great consistency in the past few years, then perhaps we should regress him by a smaller factor. Alternatively, if the player has made a large improvement over his past performance, perhaps he is on an upward trajectory that is not yet complete. His additional upside should perhaps earn him some credit, too.
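To make the consistency idea concrete, here is a rough sketch of what "regress him by a smaller factor" could look like. This is not my actual model. The function name, the replacement level of 50 IP, the base regression of 0.3 and the consistency weighting are all made-up assumptions, just to show the shape of the idea.

```python
from statistics import mean, pstdev


def project_ip(recent_ip, replacement_ip=50.0, base_regression=0.3,
               consistency_weight=0.5):
    """Regress a pitcher's recent average IP toward replacement level.

    All parameters are hypothetical placeholders, not fitted values.
    A pitcher whose recent seasons vary little gets a smaller
    regression factor than one whose seasons swing wildly.
    """
    avg = mean(recent_ip)
    if avg <= 0:
        return replacement_ip

    # Coefficient of variation: 0 for identical seasons, larger when volatile.
    cv = pstdev(recent_ip) / avg
    # Map to (0, 1]: 1.0 means perfectly consistent.
    consistency = 1.0 / (1.0 + cv)

    # Shrink the regression factor for consistent pitchers (assumed form).
    regression = base_regression * (1.0 - consistency_weight * consistency)
    return avg + regression * (replacement_ip - avg)


steady = project_ip([200, 200, 200])    # same average, no variation
volatile = project_ip([100, 200, 300])  # same average, large swings
```

With these placeholder settings, the steady pitcher keeps more of his 200 IP average than the volatile one. Momentum would need a separate term (for example, one based on the year-over-year change), which I have left out here.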
As is, I am treating each player's future IP (and also his value, in other models) as a state that can be predicted from different sets of averages (seasonal or multi-season). I am not looking at his performance as a time series, but perhaps I should do that as well. This sounds a bit like stock projection (or rather, it can use the same methods). I've been told that stock analysts have methods to project a stock's established level, and also to estimate the probability that the stock is establishing a new level (improvement or decline), rather than fluctuating around the current level.
However, I'm not sure whether this approach can work for projecting pitchers, so I will just file it away for now. Besides, the yearly stats time series is way too coarse a scale for this kind of analysis. I should at least split the pitcher's performance into quarters before I start mapping his performance time series. Definitely a project for after the season starts, so don't expect to see anything about that here in the near future.
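For the record, one generic way to ask "is this a new level, or just noise around the old one?" is a CUSUM-style change detector. I don't know that this is what the stock people actually use, so treat the following as an illustration only. The function name, the four-quarter baseline, the slack and the threshold are all assumptions, and the numbers in the example are invented, not real pitcher data.

```python
from statistics import mean, pstdev


def detect_level_shift(series, baseline_n=4, slack=0.5, threshold=4.0):
    """Two-sided CUSUM on a series of (hypothetical) quarterly IP totals.

    The first `baseline_n` points define the established level.
    Returns (index, "up" or "down") for the first point at which the
    accumulated deviation exceeds `threshold`, or None if none does.
    """
    base = series[:baseline_n]
    mu = mean(base)
    sigma = pstdev(base) or 1.0  # avoid dividing by zero on a flat baseline

    up = down = 0.0
    for i, x in enumerate(series[baseline_n:], start=baseline_n):
        z = (x - mu) / sigma
        # Accumulate only deviations larger than the slack allowance.
        up = max(0.0, up + z - slack)
        down = max(0.0, down - z - slack)
        if up > threshold:
            return i, "up"
        if down > threshold:
            return i, "down"
    return None


noise = detect_level_shift([50, 52, 48, 50, 51, 49, 50, 52])
shift = detect_level_shift([50, 52, 48, 50, 53, 53, 53, 53])
```

The first series only wobbles around its baseline, so nothing is flagged. In the second, several modestly higher quarters in a row accumulate until the detector signals an upward shift. That accumulation is the whole point: no single quarter has to be extreme.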
What you should expect, however, is to see something on tweaking the IP projections using injury data. My previous attempts at using rich injury information did not work. However, I now have a better set of injury data, including better pre-season and post-season injury coverage. At the very least, I hope to be able to distinguish between players who miss a season due to injury and those who miss the season because they couldn't crack a major league roster. It's hard to imagine that information not impacting future projections.
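As a sketch of how that distinction might enter the model as a feature: the labels, the field names and the 120-day cutoff below are hypothetical, and my actual injury data is not structured exactly like this.

```python
from enum import Enum


class SeasonStatus(Enum):
    ACTIVE = "active"
    MISSED_INJURY = "missed_injury"
    MISSED_NO_ROSTER = "missed_no_roster"


def season_status(ip, days_injured, min_ip=1.0, injury_days_cutoff=120):
    """Label a pitcher-season for use as a categorical feature.

    `ip` is major league innings pitched that season; `days_injured`
    is the number of days lost to injury (an assumed field). Both
    thresholds are placeholders, not tuned values.
    """
    if ip >= min_ip:
        return SeasonStatus.ACTIVE
    if days_injured >= injury_days_cutoff:
        return SeasonStatus.MISSED_INJURY
    return SeasonStatus.MISSED_NO_ROSTER


# Two pitchers with 0 IP look identical in season stats, but not here.
hurt = season_status(ip=0.0, days_injured=180)
cut = season_status(ip=0.0, days_injured=0)
```

The two zero-IP seasons get different labels, which is exactly the information the season-stats-only model cannot see.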
Also, my basic value projections (VORP, ERA, etc) are overdue...