I recently bought the book Analyzing Baseball Data with R for fun and came across the interesting topic of career trajectories. The basic idea is that hitters generally improve from the start of their careers, peak somewhere around age 30, and then decline in skill over time. You could almost describe a hitter's career trajectory as parabolic. That's where the model comes in.
Sorry, I'm going to go all Algebra I on you all now. A parabola comes from a quadratic equation that usually looks like...
$$y = ax^2 + bx + c$$
The models in the book use something a little more meaningful in baseball than x and y: OPS and the hitter's age. That produces a formula that looks like this instead.
$$\text{OPS} = a(30 - \text{age})^2 + b(30 - \text{age}) + c$$
By relating OPS to the hitter's age relative to 30, the theoretical peak, you can use R, a statistical programming language and analysis environment, to model a player's past and, hopefully, predict their future with some sort of statistical significance.
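For anyone who wants to try this at home, here is a minimal sketch of what that kind of fit might look like in R. The data frame, its column names, and every number in it are invented for illustration (it is a hypothetical hitter, not any real player), and this is not necessarily how the book sets up its own code.

```r
# Made-up seasons for a hypothetical hitter (NOT real player data).
hitter <- data.frame(
  age = 24:36,
  ops = c(0.745, 0.790, 0.812, 0.845, 0.851, 0.880, 0.872,
          0.861, 0.840, 0.835, 0.790, 0.772, 0.730)
)

# Quadratic trajectory centered on age 30:
#   OPS = a(30 - age)^2 + b(30 - age) + c
# I() tells lm() to treat the arithmetic literally instead of as formula syntax.
fit <- lm(ops ~ I((30 - age)^2) + I(30 - age), data = hitter)

# Coefficients come back in formula order: intercept first, then each term.
co <- unname(coef(fit))
c0 <- co[1]  # c: the fitted OPS at age 30
a  <- co[2]  # a: curvature (negative means the curve arches over a peak)
b  <- co[3]  # b: tilt of the curve around age 30

print(round(c(a = a, b = b, c = c0), 4))
```

A negative a is what you would hope to see for a typical hitter, since that is what makes the curve rise, peak, and fall.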
The book does an excellent job of going over this, so I figured I'd take a look at a few of the White Sox hitters who are over 30 to see where they've been and where they are going. Fortunately for you all, the math is pretty much done, so enjoy the nice graphs, which give a fairly clear picture of what to expect.
As one would expect, Paul Konerko's career trajectory follows the traditional curve nearly perfectly.
Now, isn't that nice? Konerko reaches his peak between ages 29 and 30, or basically the 2006 season. Unfortunately, it's after his peak that the Sox have seen more variability in his performance. 2007 through 2009 and 2013 were all well below the predicted line, while 2010 through 2012 were above. So, for the statisticians out there, the $R^2$ of the model is 0.5944, so just over 59% of the year-to-year variance in his OPS can be explained by the model. That's not great, but it shows there is a relationship between Konerko's OPS and his age. The p-value for this model is 0.001151, so we can reject the null hypothesis that there is no relationship between age and OPS for Konerko at the standard 95% confidence level (or even 99% in this case).
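If you're curious where numbers like the peak age, the $R^2$, and the p-value come from, here is a rough sketch using an invented hitter (again, not Konerko's real data). I'm assuming the p-value in question is the overall F-test p-value that summary() prints on the last line of its output; the peak comes from the vertex of the parabola, which sits at (30 - age) = -b / (2a).

```r
# Made-up seasons for a hypothetical hitter (NOT real player data).
hitter <- data.frame(
  age = 24:36,
  ops = c(0.745, 0.790, 0.812, 0.845, 0.851, 0.880, 0.872,
          0.861, 0.840, 0.835, 0.790, 0.772, 0.730)
)
fit <- lm(ops ~ I((30 - age)^2) + I(30 - age), data = hitter)

co <- unname(coef(fit))  # intercept (c), then a, then b
a <- co[2]
b <- co[3]

# Vertex of a(30 - age)^2 + b(30 - age) + c is at (30 - age) = -b / (2a),
# which rearranges to age = 30 + b / (2a).
peak_age <- 30 + b / (2 * a)

s <- summary(fit)
r_squared <- s$r.squared

# Overall F-test: is the model better than just using the mean OPS?
f <- s$fstatistic  # named vector: value, numdf, dendf
p_value <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

print(c(peak_age = peak_age, r_squared = r_squared, p_value = p_value))
```

A p-value below 0.05 here means you would reject "age tells us nothing about OPS" at the 95% level.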
The nice part about these kinds of models is that we get coefficients, so we can plug in different ages and see what the model predicts. For Konerko, it predicts an OPS of .686 for next season. That's not so great. It's at least better than his .669 in 2013, but it's far below the .796 projected in the 2014 Bill James Handbook.
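Plugging in an age can be done by hand with the coefficients or by letting R do it with predict(). Here is a small sketch of both, once more on a made-up hitter rather than real data. The age of 37 is just an assumed "next season" for this invented player.

```r
# Made-up seasons for a hypothetical hitter (NOT real player data).
hitter <- data.frame(
  age = 24:36,
  ops = c(0.745, 0.790, 0.812, 0.845, 0.851, 0.880, 0.872,
          0.861, 0.840, 0.835, 0.790, 0.772, 0.730)
)
fit <- lm(ops ~ I((30 - age)^2) + I(30 - age), data = hitter)

# Let R plug the age into the fitted formula.
next_age <- 37
predicted_ops <- unname(predict(fit, newdata = data.frame(age = next_age)))

# The same thing by hand: a(30 - age)^2 + b(30 - age) + c
co <- unname(coef(fit))  # intercept (c), then a, then b
by_hand <- co[2] * (30 - next_age)^2 + co[3] * (30 - next_age) + co[1]

print(round(c(predict = predicted_ops, by_hand = by_hand), 3))
```

Keep in mind that age 37 is outside the ages the model was fit on, so this is an extrapolation and deserves some skepticism.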
Not a perfect fit, but following the trend
Adam Dunn's career trajectory isn't exactly a perfect fit, but the trends we've seen the past few seasons started before he came to the White Sox.
According to the modeling, Dunn peaked around 2004 and has been declining since. Unfortunately, Dunn's 2011 was so historically bad that it seems to be messing with the results of the model. The $R^2$ is only 0.3404 while the p-value is 0.1014, which misses the usual 0.05 cutoff. While this might not be the best model for Dunn, graphing Dunn's OPS does show a pretty clear linear trend of a declining OPS since 2009. The model has Dunn pegged for a .626 OPS, but the linear trend in red has been slightly above that line with a roughly 30 to 40 point OPS loss per year. There's certainly some uncertainty here, but expecting a big return for Dunn at the trade deadline might not be a good plan. The Bill James Handbook prediction of .763 would be well above the linear trend seen over the past five years. The Steamer and Oliver projections at Fangraphs also have him in the .750 to .760 range. Let's hope they're right.
The new normal
Alexei Ramirez turned 27 during his first season with the White Sox. That also was his best season based on OPS. Looking at the curve itself, it's cupping upward instead of arching over a peak, so it isn't giving the kind of model we'd expect to see. It does fit nicely, as can be seen by its $R^2$ of 0.7166, but with a p-value of 0.1509, it doesn't quite cut it. So, I re-ran the model using a far more typical linear regression of...
$$\text{OPS} = a(30 - \text{age}) + b$$
There we go. The $R^2$ is 0.6933, and with the p-value down to 0.03966, there certainly seems to be a relationship between Alexei's age and his OPS. Alexei's career sure is agreeing with the Fangraphs article. Unfortunately, this model predicts an OPS of .586 for Alexei next year. James has a .698 for Alexei, while the Steamer and Oliver projections have an OPS of .698 and .660, respectively.
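Dropping the squared term is a one-line change in R. Here is a quick sketch of the straight-line version on an invented hitter who starts at his best and slides from there. The numbers are made up for illustration and are not Alexei's real stats.

```r
# Made-up seasons for a hypothetical late-arriving hitter (NOT real player data).
late_starter <- data.frame(
  age = 27:32,
  ops = c(0.790, 0.725, 0.745, 0.710, 0.690, 0.655)
)

# Straight-line model: OPS = a(30 - age) + b
linear_fit <- lm(ops ~ I(30 - age), data = late_starter)
s <- summary(linear_fit)

# Because x is (30 - age), a positive slope means OPS was higher when the
# hitter was younger, i.e. he loses about that much OPS per year of age.
slope <- unname(coef(linear_fit))[2]
r_squared <- s$r.squared
p_value <- s$coefficients[2, "Pr(>|t|)"]  # p-value for the slope term

# Projection for the following season (age 33 for this invented hitter).
next_ops <- unname(predict(linear_fit, newdata = data.frame(age = 33)))

print(round(c(slope = slope, r_squared = r_squared,
              p_value = p_value, next_ops = next_ops), 4))
```

With only one predictor, the p-value on the slope is the same as the overall model p-value, so either one answers the "is there a relationship?" question.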
A random walk
After plotting Jeff Keppinger's OPS, I decided I didn't need to go any further and try to model it. His OPS is probably as close to random as I can imagine.
I would love to try to predict anything off of this, but there's no way. If I had to draw a line by eyeballing it, I'd probably draw a straight line at about a .650 to .675 OPS, and that's probably as good as any model I could try. Bill James's projection of .692 seems a bit optimistic, but, along with Steamer's .689 and Oliver's .674, it all seems to be in the realm of possibility.
Using the idea of career trajectories, it looks like, oddly enough, Paul Konerko could have the best year at the plate out of Konerko, Dunn, Ramirez, and Keppinger. The predictive power of Dunn's model is suspect, but looking at the trends since 2009, another 30 to 40 point drop in OPS off of last year seems pretty likely. Also, if Alexei's predicted drop to .586 happens, the need for a new shortstop could be immediate. As for Keppinger, who knows what he could do.
As far as Analyzing Baseball Data with R goes, I still need to dig into the book much more. If you are at all technically inclined, it would probably be worth a look. It does hit a sweet spot for me as a computer programmer, a statistically inclined person, and a baseball junkie, so your mileage may vary. On the R side, I might need a little more remedial work from something like Learning R, but that might be me getting ahead of myself. In the meantime, expect some more fun graphs from me in the future.