Do granular age adjustments add value?

One of the reasons I created the potato model was to establish a baseline so that we could assess the value added by more advanced models.

When I first started working on the project, I only wanted to add "adjustments" when it became obvious that I needed to do so. For example, the first version of this project may have only been a ranking based on raw points per game. However, it became obvious that you cannot treat stats from US High School the same as stats from the Swedish Elite League. A league adjustment needed to be applied.

It also became obvious that an adjustment needed to be applied to players for whom it wasn't their first NHL draft, and, in my final adjustment, I eventually realized I had to do something about player height as well. I originally did not want to make any sort of size-adjustment but the initial results were such that far too many short defensemen were getting picked way too high, so a height adjustment was necessary.

That ended what I felt were the "obvious" adjustments that needed to be made, and I felt they were sufficient to establish a baseline. The potato will thus forever contain only these adjustments.

I have seen many other models that apply adjustments based on a host of other factors. The question I have never seen answered, however, is whether those adjustments have actually improved the model.

Now that we have our potato set up as a baseline, we can experiment with other adjustments and evaluate whether they actually result in the model making better picks. In this post, I am going to focus on birthday.

As mentioned, the model does contain a factor that accounts for whether or not it is the player's first draft, but it does not look at a player's birthday beyond that. There is no accounting for whether the player has an "early birthday" or a "late birthday" or anything in between. To help us determine whether accounting for these things really matters, we can experiment with a series of adjustments and see if any of them lead to better results.

First, the baseline. I have evaluated the potato using picks from each team for the 2012-2017 drafts and come up with an average of 9,993 TOI. In other words, the potato has received an average of around 10,000 more minutes from the players it picked than each respective team got using the same picks. But that isn't so important. The important part is that we have a baseline.

Plain Old Potato: 9,993
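As a rough illustration of what that baseline number represents, here is a minimal sketch of the averaging step. The exact aggregation is not spelled out in this post, so the tuple layout and the function name `average_toi_margin` are assumptions, and the sample numbers are invented purely to show the arithmetic.

```python
from statistics import mean


def average_toi_margin(replays):
    """Average TOI margin of the model over the teams it replaced.

    `replays` is an iterable of (draft_year, team, model_toi, team_toi)
    tuples: one entry per draft replayed with one team's picks.
    This layout is an assumption, not the actual data format.
    """
    margins = [model_toi - team_toi for _, _, model_toi, team_toi in replays]
    return mean(margins)


# Invented numbers, only to show the calculation.
toy_replays = [
    (2012, "Team A", 1000, 400),  # model ahead by 600 minutes
    (2012, "Team B", 300, 500),   # model behind by 200 minutes
    (2013, "Team A", 900, 700),   # model ahead by 200 minutes
]
print(average_toi_margin(toy_replays))
```

A positive result means the model's picks logged more minutes than the teams' actual picks on average. Every experiment that follows is compared against this one number.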

Now we can try a few different birthday-related adjustments and see if they result in a better or worse score.

For the first attempt, I will simply apply a factor for 18-year-old players, based on the month in which they were born.

If they were born between September 16 and November 30, they are considered "earlies" and are thus old for their draft class. What follows is pretty dry, but it is a table of some of the coefficients that were applied and the resulting performance of the model.

This only applies to players in their first NHL Draft.

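| September | October | November | TOI | Compared to Potato |
|---|---|---|---|---|
| 0.75 | 1 | 1 | 9,056 | -937 |
| 0.9 | 1 | 1 | 9,993 | 0 |
| 1 | 1 | 1 | 9,993 | 0 |
| 1.25 | 1 | 1 | 7,589 | -2,404 |
| 0.9 | 0.75 | 1 | 9,878 | -115 |
| 0.9 | 0.9 | 1 | 9,874 | -119 |
| 0.9 | 0.95 | 1 | 9,852 | -141 |
| 0.9 | 0.9 | 0.9 | 9,674 | -319 |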
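To make the mechanics concrete, here is a minimal sketch of how a month-based factor like the ones in the table could be applied. It assumes the factor simply multiplies a player's pre-adjustment score, which this post does not confirm. The `Prospect` class, its fields, and the function names are all hypothetical. The coefficients mirror the last row of the table.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict

# Coefficients by birth month for "early" birthdays (last table row).
# A value of 1.0 would mean no adjustment.
EARLY_FACTORS: Dict[int, float] = {9: 0.9, 10: 0.9, 11: 0.9}


@dataclass
class Prospect:  # hypothetical container, not the model's real schema
    name: str
    birthdate: date
    first_draft: bool
    score: float  # model score before any birthday adjustment


def early_birthday_factor(p: Prospect, factors: Dict[int, float]) -> float:
    """Return the multiplier for a prospect, or 1.0 if none applies."""
    if not p.first_draft:
        return 1.0  # only first-time-eligible players are adjusted
    # September counts as "early" only from the 16th onward.
    if p.birthdate.month == 9 and p.birthdate.day < 16:
        return 1.0
    return factors.get(p.birthdate.month, 1.0)


def adjusted_score(p: Prospect, factors: Dict[int, float] = EARLY_FACTORS) -> float:
    # Assumption: the adjustment is multiplicative on the raw score.
    return p.score * early_birthday_factor(p, factors)
```

With these coefficients, a first-time-eligible player born on September 16 would have his score scaled by 0.9. A player born on September 15, or one in his second draft, would be left untouched.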

Remember, this is crunching through a fair amount of data. It is replaying each of the last six drafts as each of the 30 teams and averaging the results. Overall, we were not able to improve upon our initial results by applying any sort of "early birthday" adjustment. Does this mean that you should ignore a player's birthdate and not take into account, for example, that Brady Tkachuk is as old as possible for a first-time-eligible player? Not necessarily, but it means that we were not able to find sufficient evidence that we can do better than the potato by accounting for this.

What about the other side? Is there value in "boosting" a player if he has a very late birthday? That is, should a player born between July 1 and September 15 be ranked higher because he is very young for his draft class?

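| September | August | July | TOI | Compared to Potato |
|---|---|---|---|---|
| 1 | 1 | 1 | 9,993 | 0 |
| 1.1 | 1 | 1 | 9,712 | -281 |
| 1.2 | 1 | 1 | 9,710 | -283 |
| 1.05 | 1.05 | 1 | 10,223 | 230 |
| 1.05 | 1.05 | 1.05 | 10,199 | 206 |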

In this case, we actually were able to get a very small increase in performance when making a very small adjustment to those born between August 1 and September 15. When applying the same adjustment to those born in July as well, we got a slightly smaller improvement over the potato.

What does it mean? In all honesty, probably not much. The improvements only held when the adjustments were very small, and the improvement itself was very marginal. When doing this kind of analysis you have to be concerned about overfitting. At best we can say that there might be something interesting there that may be worth further study.
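To put "very marginal" in perspective, the two improved rows can be expressed as a percentage of the baseline. This is just arithmetic on the numbers reported above. The helper name `relative_change` is my own.

```python
BASELINE_TOI = 9993  # Plain Old Potato


def relative_change(toi, baseline=BASELINE_TOI):
    """Change versus the baseline, as a fraction of the baseline."""
    return (toi - baseline) / baseline


rows = [
    ("Sep and Aug x1.05", 10223),
    ("Sep, Aug and Jul x1.05", 10199),
]
for label, toi in rows:
    print(f"{label}: {relative_change(toi):+.1%}")
# Sep and Aug x1.05: +2.3%
# Sep, Aug and Jul x1.05: +2.1%
```

A gain of roughly two percent that only appears at one small coefficient is the kind of result that could easily disappear on a different set of drafts.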

Overall, I find very weak evidence in support of factoring granular age-based adjustments into your model. We were not able to improve the potato by discounting players who are very old for the draft, and while we did see a small improvement from adjusting up players who are very young for the draft, it was a very small effect that could have just been the result of idiosyncrasies in our data set.

The potato's method of simply accounting for whether it is the player's first year of draft-eligibility appears to be sufficient, such that throwing in more complicated logic based on birthday may not add any additional value.
