I often find myself trying to do some analysis: pulling tables off of War-On-Ice.com, manipulating the metrics, seeing what I can do. More often than not, I get stuck. I have to ask myself, what am I really looking at here? What is the significance? I'm going to try to dive into that a bit here, if for nothing else than my own education, to see what we are all really looking at when we look at 'advanced' stats.
Ask any NHL pundit what they think of time of possession, and they'll tell you it's a fantastic metric that has a ton of value in evaluation. Ask that same NHL pundit what they think of Corsi, and you won't get the same reaction from many of them. In fact, Corsi wasn't originally a possession metric at all. It was developed by Jim Corsi, goalie coach of the Buffalo Sabres, to measure goalie workload. It didn't take analysts long to realize that it could be used as a proxy for possession, and the revolution was born.
How important is Corsi? Well, it can be used as a bit of a predictor in a small sample size. Since 2005-2006, you could have picked the correct winner of a playoff series 60 percent of the time just by selecting the team with the higher regular season 5v5 CF%. Ask any gambler; they'd take those favorable odds any time.
At its basic foundation, you record a Corsi For event because you have the puck. You need to shoot to score. Goals lead to wins.
Corsi -> Fenwick -> Shot on Goal -> Scoring Chance -> Goal. That is the building block foundation. Corsi becomes so important because it provides the most data over a small sample size, recording every possible shot attempt, not just those that are unblocked.
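To make the first links in that chain concrete, here is a minimal sketch of how the counts relate, using the standard definitions: Corsi counts every shot attempt (on goal, missed, and blocked), while Fenwick counts only the unblocked ones. The function names and the sample numbers are my own placeholders, not real team data.

```python
def corsi(shots_on_goal, missed_shots, blocked_shots):
    """Every shot attempt: on goal, missed the net, or blocked."""
    return shots_on_goal + missed_shots + blocked_shots


def fenwick(shots_on_goal, missed_shots):
    """Unblocked shot attempts only: on goal or missed the net."""
    return shots_on_goal + missed_shots


def for_percentage(events_for, events_against):
    """Share of events belonging to one team, e.g. CF% or FF%."""
    total = events_for + events_against
    if total == 0:
        return None  # no events recorded, so the percentage is undefined
    return 100.0 * events_for / total


# Hypothetical single-game counts (made up for illustration).
cf = corsi(shots_on_goal=30, missed_shots=12, blocked_shots=14)
ca = corsi(shots_on_goal=25, missed_shots=10, blocked_shots=9)
ff = fenwick(shots_on_goal=30, missed_shots=12)
fa = fenwick(shots_on_goal=25, missed_shots=10)

cf_pct = for_percentage(cf, ca)
ff_pct = for_percentage(ff, fa)
```

Because Corsi includes the blocked attempts, it always has at least as many events as Fenwick over the same stretch of play, which is where the extra sample size comes from.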
To get a basic understanding of just how important these metrics are, I have calculated the R^2 of each one against Wins, Points, Win Percentage, and Points Percentage, using 5v5 data for every team since the 05-06 season.
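For anyone who wants to follow along, this is roughly what that kind of calculation looks like. It is a sketch under my own assumptions: a simple one-variable linear fit, where R^2 is just the squared Pearson correlation between the metric and the outcome. The function name and the numbers in the example are placeholders, not the actual team data.

```python
def r_squared(xs, ys):
    """R^2 of a one-variable linear fit (squared Pearson correlation)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no variation to explain")
    return (sxy * sxy) / (sxx * syy)


# Placeholder team-seasons: 5v5 CF% and season point totals (made up).
cf_pct = [47.5, 49.0, 50.2, 51.8, 53.1, 54.0]
points = [78, 91, 85, 96, 101, 99]

fit = r_squared(cf_pct, points)
```

A value near 1 means the metric explains almost all of the variation in the outcome, and a value near 0 means it explains almost none.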
The first thing that stands out to me in this is how disappointing goals are as an explainer for wins and winning percentage. However, goals are a far superior explainer of wins than runs are in baseball. (In baseball, Runs to Team Wins has an R^2 of 0.3759 since the 1969 season. The foundation of Sabermetrics is built on scoring runs. Hockey has a better foundation than baseball.)
Seeing that goals is the most important metric for team wins and team points (duh), we need to go down the line and see what is important for goals.
This isn't really the most encouraging analysis. You can infer that hockey analytics (as the public knows it) is being built off of a metric that explains only 52.67% of the variation in the most important metric to teams: goals. Sabermetrics ran into the same issue. Home runs and batting average were sub-par variables in determining team runs. So they did something about it. Sabermetricians manipulated the data until they made it great, coming up with metrics like On Base Percentage, Slugging Percentage, On Base + Slugging, and Runs Created to get more in-depth and accurate analysis of team and player performance.
Some of these R^2 results are basically unheard of in the hockey analytics community. And to be clear, a lot of them might be impossible to achieve with today's capabilities. Hockey, for as many comparisons as I have drawn (and will continue to draw) to baseball, is just infinitely more fluid of a game.
While Goals For Percentage remains an important variable for team winning percentage and team points percentage, our best R^2 variable comes in the form of Scoring Chance Percentage, at .343. Certainly, we can do better than this.
If we apply the basic fundamentals of slugging percentage to hockey, we can achieve this goal. Slugging Percentage is a metric that shows, on average, how many bases a player achieves per at bat. In other words, it measures the quality of an at bat.
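As a quick refresher on the baseball side, slugging percentage is total bases divided by at bats, with each hit weighted by how many bases it is worth. A minimal sketch, with a function name and sample stat line of my own choosing:

```python
def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """Total bases per at bat: each hit is weighted by the bases it earns."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


# Made-up season line, purely for illustration.
slg = slugging_percentage(singles=100, doubles=30, triples=5,
                          home_runs=20, at_bats=500)
```

The idea worth borrowing is the weighting: not every event counts the same, so better outcomes get a bigger multiplier.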
Here is the formula I've been toying with:
Hockey Slugging (hSLG) = ((Shots For +/-) + (Scoring Chances For +/- * 2) + (Goals For +/- * 3)) / (Corsi For + Corsi Against)
Not nearly good enough, and completely arbitrary, but still better than any other metric we use today.
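If you want to toy with the hSLG formula yourself, here is one way it could be written out. This is a sketch, and it rests on an assumption: I am reading each "+/-" as a differential (for minus against). The function and parameter names are placeholders, and the weights of 1, 2, and 3 are the same arbitrary ones from the formula.

```python
def hockey_slugging(shots_for, shots_against,
                    chances_for, chances_against,
                    goals_for, goals_against,
                    corsi_for, corsi_against):
    """hSLG: weighted event differentials per total shot attempt.

    Assumption: each "+/-" in the formula means (for - against).
    """
    total_attempts = corsi_for + corsi_against
    if total_attempts == 0:
        raise ValueError("no shot attempts recorded")
    weighted = ((shots_for - shots_against)
                + 2 * (chances_for - chances_against)
                + 3 * (goals_for - goals_against))
    return weighted / total_attempts


# Made-up numbers, not a real team.
hslg = hockey_slugging(shots_for=10, shots_against=8,
                       chances_for=5, chances_against=4,
                       goals_for=3, goals_against=2,
                       corsi_for=20, corsi_against=20)
```

Under that reading, a team that is outshot, out-chanced, and outscored ends up with a negative hSLG, and a perfectly even team sits at zero.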