Replacing Derek Stepan: re-create him in the aggregate

All stats for this blog provided by unless stated otherwise.

Most of the talk surrounding the Rangers this off-season is, rightfully, about how the Rangers are going to replace Derek Stepan in their lineup. Say what you want about Stepan, but this narrative that he wasn't a "1C", or that his contract was going to be a hindrance to the Rangers in the future, is just wrong. We have no evidence that shows that Stepan's career trajectory was going to start taking a hit. What we did have from Stepan last year was a career-low shooting percentage, and the confidence of the organization was shaken. Shaken enough to move Stepan for futures. There ain't no such thing as half-way GMs.

But, we move on. Stepan has been dealt, and now the Rangers are tasked with 'replacing' him.

I find myself always thinking back to this scene from MONEYBALL where the scouts are hung up on replacing Giambi, Damon, and Saenz for the 2002 Oakland A's. Billy Beane and Paul DePodesta flip the script. We get hung up on names, we need to look past the names and to the numbers. You don't replace Stepan, you re-create him in the aggregate.

Thus, the Rangers aren't necessarily tasked this off-season with replacing Stepan. What they are tasked with, is figuring out if Mika Zibanejad is the guy to re-create Stepan on the top-line. If Kevin Hayes is the guy to re-create Zibanejad on the second-line. If Lias Andersson is ready to take on the third-line. If David Desharnais can re-create Oscar Lindberg on the fourth-line.

Our main focus, of course, flips to Mika Zibanejad. But first, we need to know what we're trying to re-create, and what the Rangers lost, with Derek Stepan moving to Arizona.

Combining the last two seasons of 5v5 play among NYR forwards, Derek Stepan was third on the team in raw points, third on the team in points per 60, third on the team in individual shot attempts, second on the team in relCF%, first on the team in relGF%, fourth on the team in relative shot suppression, second on the team in relative shot generation, and third on the team in time on ice per game played.

On the PK, Stepan provided stability. On the ice for the team's fourth best GA60 metric.

On the PP, Stepan provided offense. On the ice for the team's second best GF60 metric. 

What the Rangers need to re-create, is an all-situations, top-line, effective player. And with the questions above, forcing everyone to step up a level on the depth-chart, the most important item the Rangers need to figure out is whether or not Mika Zibanejad is the guy to replace Derek Stepan.

For the comparables below, I'm only using 5v5 data from last year so it's Rangers to Rangers data and not Rangers to Rangers/Sens data.

Obvious heavy-edge to Stepan here in most categories sans production per 60. Which, at least, is slightly encouraging. Had Zibanejad played 81 games like Stepan did, it's likely that he'd have caught up in goals (obviously), and perhaps even points. Stepan is what he is. He's a 55-60 point NHL player. Considering Zibanejad's production history, it's not totally out of the question that Zibanejad can also become a 55-60 point player, if not more. Offense is not where the Rangers are going to hurt by using Zibanejad in the Stepan role.

Where they're going to hurt, if anywhere, is in the on-ice impacts that Stepan had. One interesting way to isolate this may be to look at Chris Kreider. Both Stepan and Zibanejad spent quite a bit of time with Kreider as their winger last year.

Stepan with Kreider: 614:29 | 53.3 CF% | 59.5 GF%
Zibanejad with Kreider: 295:21 | 53.8 CF% | 53.3 GF%

Where Derek Stepan helps you is in his subtlety. His on-ice impacts in terms of both shot attempts and goals speak for themselves. He's going to log major minutes for you in every situation against the other team's best players. He's going to limit goals against, and he's going to score goals for the Rangers, or contribute to goals. I have full confidence in Mika Zibanejad being able to replace Stepan's 55 points (like clockwork). Where my hesitations lie is in his on-ice impacts. Will he turn it around there? Playing a full season with Kreider will certainly help.

The Rangers will need Zibanejad to step up in a major way in terms of shot and goal suppression, as well as on the PK, if they envision him playing there. No denying that Arizona got one hell of a player in Derek Stepan, but the Rangers also got one in that trade with Ottawa last year.

Furthermore, the best thing you can do is continue comparing these guys. Use all the assets available to you. Draw your own conclusions.



Corey Sznajder Rangers data


Alain Vigneault, Turtling in the Third Period of Playoff Games, and Score Effects are Real

A big point of contention against AV hockey, especially in the playoffs, is that AV's Rangers tend to grab a lead, and then turtle. Turtle, of course, meaning going into a shell and allowing a barrage of shot attempts against your goalie. This was highlighted nowhere more than in Game 5 of the most recent Rangers vs Ottawa playoff series. Jimmy Vesey gave the Rangers a 4-3 lead in the third period. After that, the Rangers went on to be out-attempted by Ottawa 22-2, and lost in overtime.

Score-effects are a very real thing, and of course, there are adjustments that exist to take these into account. (Best reading is here from Micah Blake McCurdy). We know that trailing teams get more attempts than the teams that are leading them. It's just one of those hockey things.

With that, it shouldn't be surprising that the Rangers get out-attempted as they do after taking a lead in the third period. However, it does often appear, watching these games, that the Rangers suffer from this at a more alarming rate because of the way Vigneault chooses to defend his leads.

So, as we should with the things that we can, I set out to discover if the Rangers under Vigneault have been damaged more by these score-effects in the third period of playoff games versus their peers as a result of "AV hockey".


  • The data that was used in this analysis was pulled from playoff games from the 2014 playoffs through the 2016 playoffs. This year's playoffs are not included in this data set
  • The data was sourced from Corsica.Hockey's public RData files
  • The data focuses on:
    • All situations of play
    • The only data we are using in the analysis are play-by-play events that occur when the team in question has a lead in the third period at any time


  • All situations data here is a choice that was honestly made out of ease of use for me. 5v5 only might have been a better play here
  • Doesn't include this year's data
  • No study conducted on goals or blown leads
  • No time on ice, only a games played number
  • More that I'm sure I can't think of

With that said, I think the most important thing is the high-level number. This study features 7787 total shot attempts. Of the 7787 shot attempts, 3051 were taken by the leading team, and 4736 were taken by the trailing team, for a leading team CF% of 39.2%.

There were 24 teams that had a lead in the third period of a playoff game from 2014-2016. Here's how they all shake out:

The Rangers end up as the 10th best team in terms of Relative CF% to the rest of the league (Team CF% minus League CF% not including that team). Of course, the -143 in terms of running differential is not the "sexiest" number in the world, but it comes with the territory of the amount of games played in the sample.

(Flaw: Again, better here would be to have TOI in these situations for each team so we could get a per 60 measure. Oh well.)
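For anyone who wants to check the arithmetic, here is a minimal sketch of the CF% and Relative CF% calculations as described above. The league totals are the ones quoted in this post. The team split of 400 for and 543 against is made up for illustration (only the -143 differential comes from the post), and the function names are hypothetical, not taken from the original analysis.

```python
def cf_percent(attempts_for, attempts_against):
    """Corsi-for percentage: share of all shot attempts taken by the team."""
    total = attempts_for + attempts_against
    if total == 0:
        raise ValueError("no shot attempts in sample")
    return 100.0 * attempts_for / total


def relative_cf_percent(team_for, team_against, league_for, league_against):
    """Team CF% minus the CF% of the rest of the league (team removed)."""
    team = cf_percent(team_for, team_against)
    rest = cf_percent(league_for - team_for, league_against - team_against)
    return team - rest


# League totals quoted in the post: attempts by leading teams vs trailing
# teams in the third period of playoff games, 2014 through 2016.
LEAGUE_FOR, LEAGUE_AGAINST = 3051, 4736
print(round(cf_percent(LEAGUE_FOR, LEAGUE_AGAINST), 1))  # 39.2

# HYPOTHETICAL team split. Only the -143 differential is from the post;
# the individual for/against counts are invented for this example.
team_for, team_against = 400, 543
print(team_for - team_against)  # -143
print(relative_cf_percent(team_for, team_against, LEAGUE_FOR, LEAGUE_AGAINST))
```

A positive result means the team held a larger share of attempts while leading than the rest of the league did. Swapping in per-60 rates would need the TOI data mentioned in the flaw above.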

For aesthetics, here's a running differential visualization for the Rangers (each team in the study above can be found at this link).



Are we too hard on AV for this specific item? Maybe. It's a league-wide phenomenon. Score-effects are very, very, real. That didn't need to be proven again. But, the interesting takeaway here, at least for me, is that perhaps for this item, I've been a little too hard on AV. I think that this year's post-season would draw the Rangers a little further down that list, and perhaps this is worth re-visiting after compiling those numbers together when the post-season ends. Not there yet, though.

Post Mortem: Player Projection Model

Well, this was disappointing. The same model showed some pretty good success in predicting the 15-16 season, but totally botched the 16-17 season.

Running the model against the 15-16 season, predicted goals had an r^2 of 0.3037 and predicted primary points had an r^2 of 0.3421. In this iteration, 80 forwards had their goals predicted correctly, 77 forwards had their primary points predicted correctly, and 36 players had both their goals and primary points predicted correctly.

For this season though, well, the model essentially failed. With the same weighting parameters and system, this season didn't seem to cooperate. Goals had an r^2 of 0.1982, and primary points had an r^2 of 0.2952. Of the 298 players evaluated, the model hit on just 51 players for goals, 66 for primary points, and a measly 25 for both.

Issues with the model:

I think it's too lenient. The way the predictions work, if you can recall, is similar to 538's version of CARMELO. The issue, though, is that I don't think I'm getting enough separation to make each player unique. Each player, with the current weighting system, is coming away with too many comparable players. Throw all of these guys into the mix, and the model will end up projecting very close to league average, even for the league's better players.

For example, when running the projections for Sidney Crosby, 87 comes away with 144 player seasons with a positive similarity score. That's too many for Sid, who even I can admit, is in a league of his own. Now, a good sign is that no one has a similarity score over 57 (highest is Datsyuk's aged 29 season, a 97 point campaign for PD), but 144 players, even indexed based on their similarity score, is going to bring Crosby's projections down, and it does, predicting a high of 13.7 5v5 goals for Sid this year.

To counter this, I think I'm going to need to add more constraints to how the similarity scores are calculated.
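To make the "too many comparables" problem concrete, here is a rough sketch of a similarity-weighted projection. This is not the actual model. It assumes the projection is a weighted average of each comparable's next-season output, weighted by similarity score, and every number in it is invented. The `min_similarity` parameter is a hypothetical stand-in for the kind of stricter constraint described above.

```python
def project(comparables, min_similarity=0.0):
    """Similarity-weighted average of comparables' next-season output.

    comparables: list of (similarity_score, next_season_goals) tuples.
    Only comparables with a score above min_similarity are used.
    """
    kept = [(s, g) for s, g in comparables if s > min_similarity]
    total_weight = sum(s for s, _ in kept)
    if total_weight == 0:
        raise ValueError("no comparables above the similarity threshold")
    return sum(s * g for s, g in kept) / total_weight


# INVENTED data: 4 close comparables who scored a lot, plus 140 loose
# comparables who scored at a roughly average rate (144 in total).
close = [(50.0, 25.0)] * 4
loose = [(5.0, 10.0)] * 140
comps = close + loose

print(round(project(comps), 2))           # 13.33
print(project(comps, min_similarity=30))  # 25.0
```

Even with low individual weights, the 140 loose comparables carry more total weight than the 4 close ones, so the projection gets dragged toward the average. Raising the bar for what counts as comparable is one way to get separation back, at the cost of a smaller sample.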

But still, I think the power of this model comes in the way we can view a player's career trends, and not necessarily the actual numbers it produces. That is, which players' career goal and point totals are trending up based on similar players, and which players' career goal and point totals are trending down. Especially for upcoming free agents.

For example, a team may find themselves in the market for a UFA winger this year like Vanek or Stafford. Using these projections, they could get a baseline impression of how each player's career will progress based on the previous careers of similar players...

(The first line in the below images is still the 16-17 season).



It's no surprise that the model is more optimistic about Vanek's future point totals, as Vanek has always been the more offensive player of the two, but it is interesting that the model perceives a huge dropoff when Stafford hits age 34 that it doesn't have for Vanek. As always with signing players over 30 to long-term contracts, buyer beware.

Next Steps:

Continue to tinker. The model was only decent for the 15-16 season and was truly horrid for the 16-17 season. This probably means making the parameters more strict in order to get fewer and more accurate player projections. Which, I think means, adding more parameters.

I have plans on launching a Shiny app for the 17-18 season, but haven't gotten around to that yet.

True Shooting Percentage: How many shots does it take?

If you've ever heard me talk about hockey before, or have read this blog, or my Twitter feed, you know that I'm very skeptical of JT Miller.

via, JT Miller is shooting 12.12% this year during 5v5 play over 66 shots. Last season, JT Miller shot 14.78% during 5v5 play over 115 shots. Among players with at least 60 shots on goal this year, Miller ranks 46th in the league. Not so egregious. Last season, though, Miller ranked 9th in the league. Really, it was the explosion of goals last year for Miller that has caused me to dive into this, as 9th in the entire league does seem a bit high for JT. And it seems high because last year's mark of 14.78% is a shooting percentage that JT didn't even touch when he was playing in the AHL. From the seasons 12-13 through 14-15, JT Miller took 235 shots in the AHL and scored on 29 of them, for an all-situations shooting percentage of 12.34%.

JT Miller's all-situations shooting percentage across last season and this season? 16.81% on 226 shots on goal.

The burning question remains, is JT Miller an elite shooter? Has his game evolved to that level? Or, is he getting lucky? Miller has 38 goals since 15-16, while his Corsica.Hockey expected goal total is a "mere" 23.17 goals, meaning he's outpacing his expected total by roughly 15 goals. In this sample (last two seasons), JT Miller is 13th in the league for goals scored above expected. The 12 names above him? Marchand, P. Kane, Tarasenko, Crosby, Burns, Ovechkin, Stamkos, Panarin, Hoffman, Kucherov, Weber, and Scheifele. I think even Rangers fans can admit that these are names that JT Miller likely does not belong among. Which is no shot at JT Miller the hockey player, but these are the NHL's elite goal scorers.
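The numbers above are easy to reproduce. This small sketch uses only figures quoted in this post. Pairing the 38 goals with the 226 shots is an inference from the 16.81% figure, and the function name is just for illustration.

```python
def shooting_percent(goals, shots):
    """Goals as a percentage of shots on goal."""
    if shots <= 0:
        raise ValueError("need at least one shot on goal")
    return 100.0 * goals / shots


# AHL, 12-13 through 14-15, all situations: 29 goals on 235 shots.
print(round(shooting_percent(29, 235), 2))  # 12.34

# NHL, last season plus this season, all situations: 38 goals on 226 shots.
print(round(shooting_percent(38, 226), 2))  # 16.81

# Goals above expected: actual goals minus Corsica.Hockey expected goals.
print(round(38 - 23.17, 2))  # 14.83
```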

When you couple together the names that JT exists among, and the comparison between his NHL and AHL shooting acumen, one has to wonder if we have enough data on JT to accept that this is his true shooting percentage talent. 

Thus, I've taken the following methodology to determine how many shots it takes for a player to reveal his true shooting percentage.

Stealing a page from baseball, as I often do, from this article which attempted to discover true numbers across varying metrics in baseball:

Following this, what I did was set a few benchmarks (500 shots on goal through 1500 shots on goal) and ran split-half correlations for each player that met these requirements on their odd games and even games shooting percentage. The author of the article above sought out a 0.7 correlation at a minimum to determine feasibility of finding truth in a metric, so I'll do the same. The results are as follows:

[data via custom query in Corsica.Hockey. The sample is only forwards, and 5v5 play to eliminate potential noise from a player receiving PP or PK time one year, but not the next. Data is from 2007-2008 through the 2015-16 season]

Since we were already at a sample size of just 50 at 1100 shots, I decided to just make the jump to 1500 as a maximum test, since we can't really discern any value from 8 players there anyway.

At no point after 500 shots on goal do we see a sample that produces a correlation of 0.7 or greater between odd and even game shooting percentage.

To round this back to the discussion on JT Miller, if we reduce the shot requirement to 225 (JT has 226 over the past two seasons) we get a correlation of 0.517 on a sample of 496 forwards.

This sort of goes against what we already believe. It's hard to imagine that we don't know a player's true shooting percentage after they've put 1000 shots on net in the NHL. And even after running this test, I'm still not certain that we don't. But, I do think it is true that there doesn't seem to be a magic number of shots on net where we can definitively say that, at that point, this player is an x% shooter. Well, at least not with this methodology.
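For anyone who wants to try the odd/even split-half test themselves, here is a minimal sketch of the procedure. It is not the original analysis: the input format (a list of per-game `(shots, goals)` pairs for each player), the function names, and the simulated players at the bottom are all assumptions made for illustration, and the real data came from Corsica.Hockey.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation; returns NaN if either side has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def half_shooting_pct(games):
    """Shooting percentage over a list of (shots, goals) games, or None."""
    shots = sum(s for s, _ in games)
    goals = sum(g for _, g in games)
    return goals / shots if shots else None


def split_half_correlation(players, min_shots):
    """Correlate odd-game and even-game shooting % across qualifying players.

    players: dict of name -> chronological list of (shots, goals) per game.
    Returns (correlation, number_of_players_used).
    """
    odd_pcts, even_pcts = [], []
    for games in players.values():
        if sum(s for s, _ in games) < min_shots:
            continue  # player does not meet the shot benchmark
        odd = half_shooting_pct(games[0::2])   # games 1, 3, 5, ...
        even = half_shooting_pct(games[1::2])  # games 2, 4, 6, ...
        if odd is None or even is None:
            continue
        odd_pcts.append(odd)
        even_pcts.append(even)
    if len(odd_pcts) < 2:
        return float("nan"), len(odd_pcts)
    return pearson_r(odd_pcts, even_pcts), len(odd_pcts)


def simulate_players(n_players, n_games, seed=0):
    """SIMULATED players with a fixed 'true' shooting talent (invented range)."""
    rng = random.Random(seed)
    players = {}
    for i in range(n_players):
        talent = rng.uniform(0.05, 0.12)
        games = []
        for _ in range(n_games):
            shots = rng.randint(0, 4)
            goals = sum(rng.random() < talent for _ in range(shots))
            games.append((shots, goals))
        players[f"player_{i}"] = games
    return players


sim = simulate_players(n_players=200, n_games=300)
r, n = split_half_correlation(sim, min_shots=500)
print(n, round(r, 3))
```

Running it at several values of `min_shots` mirrors the benchmark approach above: the higher the requirement, the fewer players qualify, which is the same sample-size squeeze that showed up at 1100 and 1500 shots.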

Gains, Losses, and the Human Response to Risk

From Michael Lewis's new book, The Undoing Project (highly, highly recommend)

Imagine two scenarios with these choices:

Scenario 1: 
Choice 1: $500 in your pocket
Choice 2: 50% chance of $1000 and a 50% chance at $0

In this scenario, it's extremely likely that you are going to opt for choice 1. There is no risk, you get $500 and can walk away.

Scenario 2:
Choice 1: Lose $500
Choice 2: 50% chance of losing $1000 and 50% chance of losing $0

In this scenario, I bet that most people would opt for Choice 2. The risk of losing more is worth the potential chance to lose nothing.

People respond to risk very differently when it involves losses than when it involves gains. In a sense that with a gain, people are ready to take the sure thing even though they could've risked it for more, and in losses, people would rather gamble and potentially lose more for the chance to lose nothing.
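What makes these two scenarios interesting is that, on paper, the sure thing and the gamble are worth exactly the same in both cases. A quick sketch of the expected value arithmetic, using only the dollar amounts and probabilities from the scenarios above (the function name is just for illustration):

```python
def expected_value(outcomes):
    """Expected value of a list of (probability, dollar_amount) outcomes."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * amount for p, amount in outcomes)


# Scenario 1: sure $500 vs a coin flip for $1000 or $0.
gain_sure = expected_value([(1.0, 500)])
gain_gamble = expected_value([(0.5, 1000), (0.5, 0)])
print(gain_sure, gain_gamble)  # 500.0 500.0

# Scenario 2: sure loss of $500 vs a coin flip to lose $1000 or $0.
loss_sure = expected_value([(1.0, -500)])
loss_gamble = expected_value([(0.5, -1000), (0.5, 0)])
print(loss_sure, loss_gamble)  # -500.0 -500.0
```

Since the expected values match, the flip in preference between the two scenarios comes from how people feel about risk, not from the math.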

How does this approach work with hockey? Well, take your pick of any coach around the NHL. It's highly likely that this coach has a player that he knows what he's going to get night in and night out. It might not be a very good player by objective standards, but again, the coach knows what he's getting. Now, consider the player who is scratched so this known commodity can play. The scratched player is an unknown. A risk of sorts. The coach doesn't know what he's going to get, so he is less inclined to play him.

Right here, we have the sure-fire $500 gain rather than the potential $1000 gain or risk of $0 gain.

If instead, these coaches flipped the script and found themselves losing $500 a night and they had a 50% chance of losing nothing in the pressbox every night, then maybe they'd be more inclined to flip the lineup and roll the dice.