If you've been following along here, the previous two blogs on player usage have been building toward this part. We understand that a player's metrics can be haunted by his usage, so is there a way to get a relative look at all players by adjusting for that usage? There are a few styles of adjustment out there, and this blog is going to look at two prominent ones.
First, we are going to look at stats.hockeyanalysis.com. To adjust for zone starts, the site eliminates the first ten seconds of data after every faceoff, in the hope of giving players who start in the defensive zone ample time to recover the puck.
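To make the ten-second rule concrete, here is a minimal sketch of how CF% could be computed after throwing away shot attempts that occur within ten seconds of a faceoff. The `Event` record, its field names, and the strict "less than 10 seconds" cutoff are assumptions for illustration; this is not HockeyAnalysis' actual code.

```python
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical event record: game-clock seconds and event type.
    time: float
    kind: str  # "faceoff", "corsi_for", or "corsi_against"


def cf_pct_excluding_post_faceoff(events, window=10.0):
    """CF% after dropping shot attempts within `window` seconds of the last faceoff.

    Assumes `events` are sorted by time and cover 5v5 play for one player.
    Passing window=0 gives the unadjusted CF%.
    """
    last_faceoff = None
    cf = ca = 0
    for ev in events:
        if ev.kind == "faceoff":
            last_faceoff = ev.time
            continue
        if last_faceoff is not None and ev.time - last_faceoff < window:
            continue  # inside the excluded post-faceoff window
        if ev.kind == "corsi_for":
            cf += 1
        elif ev.kind == "corsi_against":
            ca += 1
    return 100.0 * cf / (cf + ca) if cf + ca else float("nan")


# Made-up shift: a d-zone faceoff, two quick attempts against, then play settles.
sample = [
    Event(0, "faceoff"),
    Event(4, "corsi_against"),
    Event(8, "corsi_against"),
    Event(25, "corsi_for"),
    Event(40, "corsi_against"),
    Event(60, "faceoff"),
    Event(75, "corsi_for"),
]
print(cf_pct_excluding_post_faceoff(sample, window=0))  # 40.0
print(round(cf_pct_excluding_post_faceoff(sample), 2))  # 66.67
```

The filter only ever removes data; it has no way of knowing whether the attempts it dropped were good or bad for the player, which is why it can push some players the "wrong" way.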
Let's look at a few players who are on opposite ends of the spectrum in terms of zone starts first.
It looks like it's working okay. Remember, these guys are the EXTREME outliers of negative and positive zone starts, respectively. By eliminating just ten seconds of play, HockeyAnalysis' zone start adjustment doesn't move them very far in either direction, considering their usage.
It is with the defensemen that we run into trouble when eliminating data as a means of adjusting for zone starts. Martin Marincin, despite horrid zone start usage, sees his adjusted CF% decrease, while David Rundblad, who received cushy zone starts, sees his adjusted CF% increase.
Now, yes, these are small changes (0.19% and 0.03%, respectively), but keep in mind that these players are the outliers in relative zone starts. If the ten-second rule doesn't dutifully adjust for the outliers, can we trust it to adjust for players who sit in the noise between -5 and +5 relative zone starts?
Eliminating ten seconds of play just seems too cookie-cutter an approach to adjusting CF% for zone starts. It helps some players while hurting others.
Rather than treating this stat as an adjustment factor, we might be better served using it as a player evaluation tool, since it shows player performance relative to zone starts. Erik Karlsson gets positive relative zone starts, so you would expect his adjusted CF% to go down. However, Karlsson is the elite offensive defenseman in the game today, and he is too good for that: based on Corsi possession metrics, when Karlsson starts in the offensive zone, the puck stays there.
Using a fixed-variable regression to adjust for zone starts (and other factors) may be the better route to go. Then we could see what a player's expected CF% is in certain situations, and how his actual CF% matches up.
This is what War-On-Ice's dCorsi and dFenwick metrics set out to accomplish.
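As a rough illustration of the regression idea, the sketch below fits a straight line of CF% on relative zone starts across a set of players, then reports each player's expected CF% and how far his actual CF% sits above or below it. It uses a single predictor and ordinary least squares, while dCorsi uses more factors than this, so treat it as a toy version of the concept; the input tuple format is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x. Returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def expected_vs_actual(players):
    """players: list of (name, rel_zs, cf_pct) tuples (hypothetical format).

    Returns {name: (expected_cf_pct, actual_minus_expected)}.
    """
    a, b = fit_line([p[1] for p in players], [p[2] for p in players])
    out = {}
    for name, rel_zs, cf_pct in players:
        expected = a + b * rel_zs
        out[name] = (expected, cf_pct - expected)
    return out
```

A positive second value means the player out-possessed what his zone starts alone would predict; a negative one means he fell short.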
At this link, you will find a full breakdown of dCorsi, created by Steven Burtch, including the factors he uses and how his methodology led him to this adjustment. In the end, Burtch developed an 'Impact' factor based on how each individual player performed relative to his expected corsi for and expected corsi against events. Did he generate more corsi for events than expected? Did he limit opposition shot attempts more than expected? That's what Burtch sets out to discover.
Currently, I believe this is the best adjustment factor out there, outside of 'eye-balling' player usage charts.
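The 'Impact' idea boils down to comparing actual rates with expected ones, separately for and against. A minimal sketch, assuming we already have a player's actual and expected CF60 and CA60; the function name and sign conventions here are my own, not Burtch's.

```python
def impact(cf60, ca60, xcf60, xca60):
    """Compare actual shot-attempt rates to expected ones (per 60 minutes).

    Returns (offensive_impact, defensive_impact):
      offensive_impact > 0 -> generated more corsi for events than expected
      defensive_impact > 0 -> allowed fewer corsi against events than expected
    """
    offensive_impact = cf60 - xcf60
    defensive_impact = xca60 - ca60
    return offensive_impact, defensive_impact
```

Keeping the two sides separate matters: a player can beat his expected corsi for while giving up more than his expected corsi against, and a single blended number would hide that.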
However, there are a few issues with this analysis...
The impact of teammates is weighted far too heavily. Of course, a regression model carries no personal bias, and teammate impact is obviously a legitimate factor in generating an expected corsi metric. But if we look at the dCorsi expected corsi for percentage (ECF%) of players, we see that teammates are essentially the main factor.
If we calculate the dCorsi ECF% from the War-On-Ice table and eliminate all players who played in 24 or fewer games this season, we get some interesting data returned...
The 23 players with the worst dCorsi ECF% for the 2014-2015 season are all Buffalo Sabres. There were 25 players who suited up at least 25 times for the Sabres this season; the two outside the worst 23 are Tyler Ennis (26th) and Matt Moulson (31st). You have to go to player 30, Manny Malhotra, before you see a player who isn't a Sabre, Flame, or Avalanche, and all three of those teams were notorious for their poor possession metrics this season.
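For anyone who wants to reproduce this kind of list, here is a sketch of the calculation: ECF% is taken as expected corsi for divided by expected corsi for plus expected corsi against, players under 25 games are dropped, and the rest are sorted from worst to best. The dictionary keys (`gp`, `xcf60`, `xca60`) are hypothetical column names; the actual War-On-Ice table may label things differently.

```python
def ecf_pct(xcf60, xca60):
    """Expected corsi for percentage from expected for/against rates."""
    return 100.0 * xcf60 / (xcf60 + xca60)


def worst_by_ecf_pct(rows, min_games=25, n=30):
    """Return the n lowest-ECF% players among those with at least min_games GP.

    rows: iterable of dicts with hypothetical keys
          'name', 'team', 'gp', 'xcf60', 'xca60'.
    """
    eligible = [r for r in rows if r["gp"] >= min_games]
    ranked = sorted(eligible, key=lambda r: ecf_pct(r["xcf60"], r["xca60"]))
    return [
        (r["name"], r["team"], round(ecf_pct(r["xcf60"], r["xca60"]), 2))
        for r in ranked[:n]
    ]
```

Counting the teams in the returned list (for example with `collections.Counter`) is a quick way to spot the kind of clustering described above.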
What kind of a correlation actually exists between teammate quality and corsi for percentage?
It doesn't seem to be very strong, judging by our chart. Running an actual correlation in R between TOIT% and CF% for the 2014-2015 season, using players who played more than 300 minutes, we get a correlation of .3147897.
And if we pick a random subset of 200 players from the data, our correlation plummets to 0.198999.
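The correlations in this post were run in R. For readers who prefer Python, a rough equivalent is sketched below: a Pearson correlation (the default method of R's `cor()`) over players with more than 300 minutes, plus the same figure on a random subset of 200. The dictionary keys are hypothetical, and because the subset is random, the second number will change from run to run unless a seed is fixed.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_check(players, x_key, y_key, min_toi=300, subset=200, seed=None):
    """Correlation on all players over min_toi minutes, and on a random subset.

    players: list of dicts with a 'toi' key plus x_key and y_key
             (e.g. hypothetical 'toit_pct' and 'cf_pct' columns).
    """
    pool = [p for p in players if p["toi"] > min_toi]
    full = pearson([p[x_key] for p in pool], [p[y_key] for p in pool])
    rng = random.Random(seed)
    sample = rng.sample(pool, min(subset, len(pool)))
    part = pearson([p[x_key] for p in sample], [p[y_key] for p in sample])
    return full, part
```

The same function works for the tCF60 versus CF60 comparison by swapping the keys.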
Yet our dCorsi expected corsi for percentage seems heavily biased toward quality of teammates.
We are going to need a new metric.
If we change our variable from TOIT% to tCF60 (the corsi for events per 60 of a player's teammates when the player is not on the ice), we see a different story.
The correlation between tCF60 and the individual player's CF60 is very strong, at .7871679.
And if we select a random subset of 200 players, the correlation remains very strong at .7942923.
This makes sense. How much can a single player who has played a significant amount of time (over 300 minutes) at 5v5 actually affect his teammates' CF60?
Zach Redmond performed at a rate of 49.51 CF60 this season. When he was off the ice, his teammates performed at a rate of 38.07 CF60. Redmond led the league in a stat I am now dubbing tCF Contribution (a player's CF60 minus his teammates' CF60 when he is off the ice), at +11.44. Not surprisingly, Redmond also had positive relative zone starts, to the tune of +7.31.
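tCF Contribution is simple arithmetic: the player's on-ice CF60 minus his teammates' CF60 when he is off the ice. Using Redmond's numbers from above (the function name is just a label for this post's stat):

```python
def tcf_contribution(cf60, tcf60):
    """Player's on-ice CF60 minus his teammates' CF60 without him, to 2 decimals."""
    return round(cf60 - tcf60, 2)


print(tcf_contribution(49.51, 38.07))  # 11.44 (Zach Redmond)
```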
In fact, if you look at the top 20 players in tCF Contribution, only 2 of them have negative relative zone starts: Jordan Staal (-3.83 relZS, +8.08 tCF Contribution) and Justin Faulk (-2.77 relZS, +7.63 tCF Contribution).
Manny Malhotra performed at a rate of 35.15 CF60 this season. When he was off the ice, his teammates performed at a rate of 51.91 CF60. Malhotra was the worst player in the league in tCF Contribution (tCFC), at -16.76. As with Redmond, perhaps Malhotra's metric was swayed by the true signal, zone starts. Malhotra was a -39.51 relZS player this year; it was in Montreal's playbook to send him on the ice for defensive zone faceoffs to win the draw and get off the ice as quickly as he could.
If you look at the bottom 20 players in tCFC, only 3 of them have positive relative zone starts: Jared Boll (+0.83 relZS, -13.58 tCFC), Derek Dorsett (+1.14 relZS, -8.51 tCFC), and Chris Phillips (+1.17 relZS, -8.26 tCFC).
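The top-20 and bottom-20 checks can be sketched as follows: rank players by tCF Contribution, then count how many in the top group have negative relative zone starts and how many in the bottom group have positive ones. The dictionary keys are hypothetical; `n=20` matches the cut used here.

```python
def zone_start_split(players, n=20):
    """Rank by tCF Contribution (cf60 - tcf60) and check zone starts at each end.

    players: list of dicts with hypothetical keys 'cf60', 'tcf60', 'rel_zs'.
    Returns (negative relZS count in top n, positive relZS count in bottom n).
    """
    ranked = sorted(players, key=lambda p: p["cf60"] - p["tcf60"], reverse=True)
    top, bottom = ranked[:n], ranked[-n:]
    neg_in_top = sum(1 for p in top if p["rel_zs"] < 0)
    pos_in_bottom = sum(1 for p in bottom if p["rel_zs"] > 0)
    return neg_in_top, pos_in_bottom
```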
Like relative CF%, it appears there is also a correlation between relative zone starts and tCFC.
Running the full data set from the 2014-2015 season, the correlation between relative zone starts and tCFC is .5148566.
Running a random subset of 200 players, our correlation stays consistent at .5017462.
The question then becomes: in dCorsi, are we looking too much at the noise (teammates/competition) and not enough at the signal (relative zone starts)?
One way to focus more on the signal could be to run a non-linear regression of relative corsi for percentage on zone starts. As we saw in part II, the noise in the middle of the spectrum can definitely affect the swings we see in the outliers beyond that -5 to +5 range, and a non-linear regression may help us account for that noise more effectively in our adjustments.
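As a sketch of what a non-linear fit might look like, the code below fits a cubic polynomial of relative CF% on relative zone starts by least squares, using only the standard library. The cubic form is an assumption on my part, chosen because it can stay fairly flat through the noisy middle of the zone-start range and bend more at the extremes; it is only one possible choice, not a method taken from dCorsi or any other published model, and splines or LOESS would be reasonable alternatives.

```python
def polyfit(xs, ys, degree=3):
    """Least-squares polynomial fit. Returns coefficients [c0, c1, ..., c_degree]."""
    m = degree + 1
    # Normal equations (X^T X) c = X^T y, where X[i][j] = xs[i] ** j.
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Solve by Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for k in range(col, m):
                a[r][k] -= factor * a[col][k]
            b[r] -= factor * b[col]
    # Back substitution.
    coef = [0.0] * m
    for i in range(m - 1, -1, -1):
        coef[i] = (b[i] - sum(a[i][k] * coef[k] for k in range(i + 1, m))) / a[i][i]
    return coef


def predict(coef, x):
    """Evaluate the fitted polynomial at x (e.g. a player's relative zone starts)."""
    return sum(c * x ** i for i, c in enumerate(coef))
```

A player's adjusted number would then be his actual relative CF% minus `predict(coef, rel_zs)`. With relative zone starts running out to roughly ±40, the higher powers get large, so centering or scaling the inputs first is a sensible precaution.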
Teammates and competition are certainly necessary variables in adjustment factors, but we have to accept one of two things about them:
- Our analytics just aren't there yet to effectively measure a player's teammates and competition, and neither TOIT% nor tCF60 seems to be a perfect measure.
- Hockey is a fluid game, so big gaps between the 'easiest' and 'hardest' competition and teammates will not exist, making it difficult to analyze their actual effect on possession metrics.
Dan Girardi skated most often this season against Giroux, Tavares, Bailey, V. Rask, and Voracek; his TOIC% according to War-On-Ice's metric was 17.78%. Dan Boyle skated most often against Glencross, Riley Nash, Strome, Brouwer, and Nielsen; his TOIC% was 17.26%. Girardi's cCF60 (competition corsi for per 60) was 54.72, while Boyle's was 53.74.
Something just doesn't seem right about that to me.
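One way to see why competition numbers bunch together: suppose a metric like cCF60 is computed as a TOI-weighted average of the CF60 of every opponent a player faced. That weighting is an assumption for illustration, and War-On-Ice's exact formula may differ. Under it, every regular skater spends most of his minutes against a broad mix of opponents, so even heavy exposure to top lines gets diluted.

```python
def weighted_competition_cf60(opponents):
    """TOI-weighted mean CF60 of the opponents a player faced.

    opponents: list of (toi_minutes_against, opponent_cf60) pairs.
    This weighting scheme is an assumption for illustration.
    """
    total_toi = sum(toi for toi, _ in opponents)
    return sum(toi * cf60 for toi, cf60 in opponents) / total_toi


# Made-up example: 100 minutes against a 60.0 CF60 top line,
# 900 minutes against opponents averaging 52.0 CF60.
print(round(weighted_competition_cf60([(100, 60.0), (900, 52.0)]), 2))  # 52.8
```

Even with a full 100 minutes against a line eight shot attempts per 60 better than everyone else, the average moves by only 0.8, which is the kind of compression that lets two very different lists of most-frequent opponents land within a point of each other.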