Can FancyStats predict the rest of the season?

The NHL season is approaching the 41-game mark, which is always a prompt to evaluate the first half of the year and to try, foolishly, to predict what is to come. How much of what has happened so far will continue to happen? Who are buyers and who are sellers at the deadline? What is the deal with the Oilers? The Canucks? Analysts: What do they know? Do they know things? Let's find out!

One thing you will see every year is hockey analysts making predictions about teams based on their so-called “underlying numbers.” I have always been skeptical of these claims, since anecdotally I haven't seen them bear significant fruit on anything close to a consistent basis, and I would like to examine whether such predictions of the future, based on this data, are actually valid.

Specifically, there is a strongly-held notion that teams whose winning percentage far outpaced their “underlying numbers” are bound for a regression in the second half. Similarly, there is a notion that teams who “underperformed” in the first half by these same metrics are due for a much stronger second half. This seems to be an uncontroversial, almost universally-accepted notion among NHL analysts.

But does it actually happen?

Before doing a deep dive, we can do a very preliminary look by just examining the last season. Well, the last full season, that is. Which means going all the way back to 2018-19, a season that was only 3 seasons in the past but feels like it may as well have been 30. Does anyone remember that the Blues won the Cup?! Yes, those Blues. It feels like a half-forgotten dream.

Using NaturalStatTrick's teams page, we can take a virtual time machine to a specific point in the 2018-19 season and make the kind of prognostication a modern analyst would make about the remainder of that season, one we can verify instantly with the power of hindsight.

Let us travel to January 5, 2019, by which point teams had played between 39 and 44 games, and treat this day as the magical mid-point of that season. From here we can attempt to evaluate the results in front of us the same way a modern hockey analyst will soon be evaluating the 2021-22 season.
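The "time machine" here is nothing more than a date cutoff. As a minimal sketch (with a made-up game log; in practice this data would come from NaturalStatTrick's date-range filters), splitting a season at January 5, 2019 looks like:

```python
from datetime import date

# Made-up game log for one team: (game date, 1 if the team won, else 0).
GAME_LOG = [
    (date(2018, 10, 4), 1),
    (date(2018, 12, 31), 0),
    (date(2019, 1, 6), 1),
    (date(2019, 4, 6), 1),
]

CUTOFF = date(2019, 1, 5)

def split_at_cutoff(games, cutoff):
    """Split a game log into games on/before the cutoff and games after."""
    first = [g for g in games if g[0] <= cutoff]
    second = [g for g in games if g[0] > cutoff]
    return first, second

def win_pct(games):
    """Wins divided by games played."""
    return sum(won for _, won in games) / len(games)

first_half, second_half = split_at_cutoff(GAME_LOG, CUTOFF)
print(win_pct(first_half), win_pct(second_half))  # 0.5 1.0
```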

The first thing I notice is these three teams:

  • Dallas Stars (.571 Pts%)
  • New York Islanders (.625)
  • Washington Capitals (.650)

These teams all have bottom-10 5v5 CF% at the midway point of this season, but are doing very well in the standings. Now, I can hear you screaming at the screen, "CF%! What is it, 2011? Nobody uses CF% anymore!" Well, we will get to that, but I can assure you that there are still many analysts making these predictions based on CF%, and if I asked whether CF% does a decent job of predicting the second half of the standings, I suspect many people in the analytics community would say yes. To keep things simple for this preliminary review, it will do.
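For anyone who needs a refresher, CF% is simply a team's share of all 5v5 shot attempts (goals, shots on goal, misses, and blocks). A quick sketch with illustrative counts:

```python
def cf_pct(corsi_for, corsi_against):
    """Team's share of 5v5 shot attempts (Corsi events), as a percentage."""
    return 100 * corsi_for / (corsi_for + corsi_against)

# Illustrative numbers only: 1900 attempts for, 2100 against.
print(round(cf_pct(1900, 2100), 1))  # 47.5
```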

As of the midway point of the 2018-19 season, the Stars are 5th-worst in the NHL in CF% but middle of the standings, the Islanders are 7th-worst with the 7th-best record in the NHL, and the Capitals are 9th-worst Corsi-wise with the 4th-best Win% in the league. All three of these teams seem primed for a second-half regression based on their “underlying numbers,” and now we can do a simple check to see if that’s what happened:

As we can see, the teams did not really regress by a significant amount after the halfway pole. The Capitals fell off a little bit, but the Stars and Islanders basically put up the same winning percentages in the second half that they did in the first. More interestingly, what we saw instead was that all three teams improved their CF% in the new year. The Capitals and the Islanders improved only marginally, but the Stars went from the 3rd-worst CF% in the NHL in the first half to a perfectly respectable 50% afterwards.

This preliminary analysis suggests that perhaps it is actually the CF% that reverts to more closely match the Win%, rather than the other way around. But let's take a look at the other side of it. What about the "underrated" teams that are performing well by CF% but not as well in the standings, and are thus primed for a breakout?

At first glance, this is definitely more promising. The Hurricanes, who under-achieved in their first 41 games despite the #1 CF% in the NHL, did indeed break out with the 3rd-best record in the NHL afterward. The Flyers also performed much better in the second half, but the Wild performed even worse.

What's also interesting is that in all three cases, the CF% and the Pts% moved in opposite directions from one another. The Hurricanes performed better in the standings, but their CF% actually fell off a fair bit. The Flyers also improved in the standings with a .560 record, but their second-half CF% was the second-lowest in the NHL after a solid first half. The Wild fell even further in the standings, but their CF% got even better! So is there actually something useful to be gleaned from this, or is it just noise?

We could repeat this exercise for more seasons, and also for other metrics like FF%, xG%, and so on, but I think now that we've set a baseline for how we are going to be tackling this problem, it is time to kick it up a notch.

I pulled down data from every full 82-game season starting with 2007-08, giving me 11 seasons in total. I calculated 5v5 CF%, FF%, xG%, and Win% for the first 41 games of each team in each season, and then calculated the correlations between those metrics and each team's second-half Win%.
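As a sketch of the calculation (with invented numbers standing in for the real per-team data), the per-season step is just a column-wise Pearson correlation of each first-half metric against second-half Win%:

```python
import pandas as pd

# Invented per-team rows standing in for one season of real data:
# first-half metrics plus second-half Win%.
season = pd.DataFrame({
    "cf_pct_h1":  [52.1, 48.3, 50.7, 46.9, 53.5],
    "ff_pct_h1":  [51.8, 48.9, 50.2, 47.4, 52.9],
    "xg_pct_h1":  [53.0, 47.5, 49.8, 46.1, 54.2],
    "win_pct_h1": [0.610, 0.450, 0.520, 0.400, 0.660],
    "win_pct_h2": [0.580, 0.470, 0.510, 0.430, 0.630],
})

# Pearson correlation of each first-half metric with second-half Win%.
predictors = ["cf_pct_h1", "ff_pct_h1", "xg_pct_h1", "win_pct_h1"]
correlations = season[predictors].corrwith(season["win_pct_h2"])
print(correlations)
```

Repeating this for each of the 11 seasons produces one row of the table below per season.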

Based on the way I pulled down the data, I was unfortunately not able to separate out OT/SO games, so for the purposes of this, the Win% in both halves is simply the number of wins divided by the number of games played, where OT wins and SO wins count as wins. I would rather not do it this way, but my data collection method didn't give me another option, and I don't think it should bias the overall results much. Still, it is something I will address in Part 2 of this study.

What follows is a data table of the 11 seasons I looked at, showing the correlation of each team's numbers in their first 41 games to their Win% for the rest of the season.

The first thing I notice is that it is all over the place. In the 2007-08 season, first-half Win% was completely uncorrelated with second-half Win%, as the teams that had stellar first halves actually had weak second halves (this includes the Stanley Cup champion Red Wings, about whom much has been written before). In other seasons, like 2013-14 and 2016-17, there was a much stronger correlation.

Likewise, CF% bore basically no correlation to back-half Win% in 2017-18, but was strongly correlated in 2018-19, despite my preliminary analysis above showing only one team drastically changing direction to match its CF%. In some years, xG% performed much better than raw Corsi, and in some years it performed much worse.

If you average it all out, you see that it makes basically no difference which numbers you use. There is some correlation between how a team performs in its first half and its second half, of course, but it is pretty weak no matter how you define "performance." Using a metric like CF% or FF% or xG% doesn't give you a much better chance of predicting the second half with any more accuracy, and which one is best varies wildly from season to season.

Upon reflection, this shouldn't be very surprising. If a team's Win% and its CF% can be well out of whack in the first half of the season, then why wouldn't they also be out of whack in the second-half? Moreover, we are only looking at 5v5 numbers which means we are intentionally excluding special teams and goaltending.

Analysts like to exclude special teams and goaltending because they are extremely volatile, but they are still important to winning hockey games. Prognosticating about a team's fortunes while concerning oneself only with 5v5 play will always be fraught with error whenever a team receives good enough (or bad enough) special teams play and/or goaltending to alter its place in the standings irrespective of its 5v5 play.

Another interesting outcome of this study is that 5v5 xG% seemingly performed the worst of all. Common wisdom is that xG is the superior metric, and essentially allows simple Corsi and Fenwick to be deprecated. If that's so, why does it do a worse job here? This analysis would seem to show that you are, on average, better off using basic Win% to predict a team's second-half results than xG%, which correlates even more poorly than CF% and FF%.

Another thing we can do is correlate first half to second half internally for each individual metric. That is, compare CF% in the first 41 games to CF% in the last 41 games, and so forth. This should help us understand which numbers are the most/least volatile:

Here we can see that there is indeed much stronger internal consistency in these metrics: a team's first-half CF% is strongly correlated with its second-half CF%, much more strongly than Win% is with itself. Interestingly, we see once again that simple CF% seems to be less volatile than FF% or the presently-preferred xG%. I do not know why this is, but it is worth a deeper dive, which I will save for a future article. Regardless, it is fair to assume that a team that performs well in these stats should continue to perform well, as they are not as volatile as Win% is. But it still doesn't follow that this means they will necessarily do any better in the standings in the final 41 games of the season.
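The internal-consistency check is the same correlation, just computed within a single metric. A minimal sketch, again with invented CF% values for five teams:

```python
import pandas as pd

# Invented first-half and second-half CF% for five teams; the real
# study repeats this for CF%, FF%, xG%, and Win%.
cf_h1 = pd.Series([52.1, 48.3, 50.7, 46.9, 53.5])
cf_h2 = pd.Series([51.4, 49.0, 50.2, 47.8, 52.6])

# How well does a team's own first-half number predict its second-half number?
r = cf_h1.corr(cf_h2)
print(round(r, 3))
```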

Regression towards the mean is obviously a thing, but "regression towards the underlying numbers" doesn't appear to be, at least not in this first examination. This is not the final word on the discussion; more needs to be examined, including metrics that contain various adjustments such as score adjustment. But at the very least, one should be wary of analysts claiming that a team is bound for an improvement or a regression in the second half of the season based on their first-half CF% or xG%.

I would love to hear what people think, and also please let me know if you encounter an analyst making claims like the ones I've discussed in this article. If they are making their claims based on a different metric than the ones I have presented, please let me know and I will add their metric to the study for part 2.

Header Photo by Damir Spanic on Unsplash
