More Stats about Magic and Sports


As anyone who talks to me and doesn’t like basketball knows, I talk about basketball a lot. I was delighted to see Mike Flores make an extended analogy between basketball advanced stats and Magic theory, and overjoyed when he chose a metric (Wins Produced) that summarizes so well what’s wrong with the way people try to conceptualize Magic.

In basketball, each team gets an equal number of possessions. On the offensive half, using stats to look at players is relatively easy—we can break down exactly how well the player shot the ball and from where, how often the player’s passes led to scores, how often the player turned the ball over, and so on. The other half of the game is a bit more problematic—numerically summarizing how well a player contests a shot without getting a block is nearly impossible. This is one of the issues addressed in Michael Lewis’s seminal article about Shane Battier.

The specific reasons that Wins Produced isn’t a good metric aren’t really relevant to Magic. It’s yet another linear-weights metric (just about every big-name blogger has his or her own) that claims to accurately show who the best players are, and no two agree on everything.1 What’s relevant, though, is that it only gives weight to things we can already see—things we already keep track of (and maybe spend too much time on as it is). By only looking at those things, it’s impossible for it to ever show us something we haven’t seen before—we only get out of it what we already had.

Fortunately, Mike Flores doesn’t attempt to connect Wins Produced to Magic by anything other than, “This is another way to think about basketball, so we should have another way to think about Magic.” Except that the way he proposes thinking about Magic (by looking at things from a mana-centric point of view) is one of the ways people look at Magic already. It’s not at all like possessions in basketball because you literally cannot produce anything in basketball without those. The true analogy in Magic would be looking at everything on a per-turn basis: How much did this game change over the past turn? Did it shift in balance from one player to another?

Ideally, each play would update a running score showing each player’s percentage chance of winning from that point, assuming perfect play. That way, we could see exactly which cards affected the game the most and at what times. But, similar to how we can’t satisfactorily measure basketball defense, this kind of evaluation would require a sample size above and beyond any rigorous Magic-playing ever documented. It’s just impossible. And that’s the issue with statistics-based Magic analysis: sports-style advanced stats are just as impossible at the card level, which is why we need invented frameworks to tell us which cards are good.
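
To make the per-turn framing concrete, here is a minimal sketch of the bookkeeping such an evaluation would require. The win_probability oracle is purely hypothetical (and, as argued above, effectively unbuildable); every name in it is mine.

    # Hypothetical sketch of the per-turn evaluation described above.
    # win_probability() stands in for a perfect-play oracle -- the very
    # thing the sample-size problem makes impossible to actually build.

    def win_probability(game_state) -> float:
        """Return the chance that player 1 wins from this state under perfect play."""
        raise NotImplementedError("no such oracle exists; that's the point")

    def per_turn_swings(end_of_turn_states):
        """How much did each turn shift the game in player 1's favor?"""
        probs = [win_probability(state) for state in end_of_turn_states]
        return [after - before for before, after in zip(probs, probs[1:])]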

The problem I have with Magic evaluation on the basis of mana—or anything that looks at things card-by-card (e.g. how good Blightning or Countersquall is relative to other cards)—is context. Cards operate so differently from deck to deck that they may as well be different cards entirely. What does Thoughtseize do? How good is it in Limited? Pretty bad, mostly, but you’d side it in if your opponent had some unbeatable bomb. That’s because, against a normal, aggressive Limited deck, it does such a different thing—instead of taking an unbeatable card, it just costs you 2 life and doesn’t set your opponent back. What about in Constructed formats? Say you’re playing a combo deck with Thoughtseize main. Game 1, it’s useful for stalling your opponent enough that you can combo without any issue. Post-sideboarding, it does something different entirely: It sees what hate your opponent is bringing and neutralizes it (or tells you what the opponent has left over), letting you combo freely or wait until the time is better. And that’s just a spell you cast on your first turn.

It’s not just Thoughtseize, obviously. Llanowar Elves is a different card in an aggressive, beatdown strategy than it is in a ramp strategy, and it’s an entirely different thing in Combo Elves. The examples are nearly infinite. But what’s the point of all this? If cards operate entirely differently depending on context and affect the board differently from deck to deck, there’s no mana-cost (or other) equivalent that will stay the same when looking at a single card. The conversion will be thrown off its axis depending on which deck the card is in. The only time the values will stay relatively the same is when you’re piloting some run-of-the-mill, midrange deck that plays entirely fair the whole way (in other words, it only works if you’re playing a Mike Flores deck).

So if Wins Produced only tells us things we already know, and looking at things on a per-mana basis is relatively similar, what are the cards that won’t show up under this kind of analysis but are secretly really good? What are the Shane Battiers of Magic? While the Preordains and Ponders might not seem that great when lined up next to creatures that give a certain damage output per mana or removal that counteracts that, any card that reduces variance is going to be significantly better than it looks. This is why those cards (along with the similar-in-an-odd-way Green Sun's Zenith) were banned: They made games play out the same, which, as a deck-builder, should be one of your goals. If everything plays out similarly when you draw those cards—if having Green Sun's Zenith in your opener leads you to keep more hands that are otherwise too lacking in [insert phase of game here]—what they’re really doing is adding free wins almost invisibly. If you can take fewer mulligans, with only a marginally smaller win percentage on the hands you keep instead, you’ve eliminated a certain number of auto-losses across all your matchups. The cool and flashy card that everyone wants to play will change one obscure matchup from 10% to 90%, but no one seems to care about the card that shifts every single game 2% in your favor.
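
As a back-of-the-envelope illustration of that last comparison (the 2% matchup frequency below is invented for the example; the other numbers come from the sentence above):

    # Toy comparison: a card that swings one rare matchup versus a card
    # that shifts every game slightly. The matchup frequency is made up.

    obscure_matchup_share = 0.02                           # flashy card's matchup: 2% of games
    flashy_gain = obscure_matchup_share * (0.90 - 0.10)    # 10% -> 90% in that matchup

    consistent_gain = 0.02                                 # +2% in every single game

    print(f"flashy card:      +{flashy_gain:.1%} overall win rate")      # +1.6%
    print(f"consistency card: +{consistent_gain:.1%} overall win rate")  # +2.0%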

Theoretical systems for Magic can’t just summarize some cards in some decks. For them to be valid, they have to work for every game of Magic, and that includes combo decks that do completely bizarre things. The best way to do this is to zoom out the focus of Magic theory a bit—instead of focusing entirely on what happens with certain creatures, the tempo they generate, and the removal (or creatures) the opponent plays, let’s look at how the decks in any one metagame interact. What Magic theory hasn’t told us, up to this point, is that the best answer to your opponent curving out is probably to cast Tendrils of Agony and kill him.

This is something that basketball has taught me as well: Individual player evaluations are subject to ebb and flow as they take on different roles, but a solid team philosophy, built on a few key pieces to implement it, can achieve remarkably consistent results despite high turnover of the lesser parts. Often, the micro view of two players going at it on either end will be subsumed by the clash of the greater team strategy.

Aren’t games fun?

Jesse Mason

@KillGoldfish

killingagoldfish.blogspot.com

 


1 There are a lot of all-in-one player rating systems, and each of them has flaws. For once, I’ll use endnotes for their intended purpose and give some more information for those who really care.

John Hollinger’s Player Efficiency Rating (PER) is among the most useful, despite its misleading name; it penalizes for bad shooting . . . but very little. Similar to Wins Produced, it takes everything in a box score, throws it in a blender, and spits out a number (league average is 15). It’s a good way to see, at a glance, how much a player has produced offensively; however, it also throws in steals and blocks so it can pretend to cover everything. This is the prime example of the previously mentioned linear-weights style of metric: Some dude with a computer invents numbers to summarize how much each thing in a box score is worth and adds them up.
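
To show how mechanically simple that linear-weights recipe is, here is a deliberately stripped-down version. The coefficients are placeholders I made up; PER’s actual formula also folds in pace and league adjustments.

    # Stripped-down linear-weights box-score metric. The weights are
    # placeholders, not PER's actual coefficients.

    WEIGHTS = {
        "points": 1.0, "rebounds": 0.7, "assists": 0.7,
        "steals": 1.2, "blocks": 1.0,
        "missed_shots": -0.7, "turnovers": -1.0, "fouls": -0.4,
    }

    def linear_weights_rating(box_score, minutes):
        """Sum weighted box-score stats, expressed per minute played."""
        raw = sum(weight * box_score.get(stat, 0) for stat, weight in WEIGHTS.items())
        return raw / minutes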

Win Shares (very different from Wins Produced) is built on ORTG and DRTG, two useful measurements with formulas that read like getting hit on the head by anvils. ORTG is usage-independent and speaks purely to how efficiently a player creates points (3-point specialists and low-usage bigs tend to do well in it); it’s expressed in terms of points produced per hundred possessions used. DRTG, like most other approximations of individual defense, is rough, to say the least. It assumes that every player on the floor is equally responsible for the team’s defensive efficiency (i.e. points allowed per hundred possessions), then subtracts the “stops” that player gets. Players who block shots or steal the ball will do well in it. The Battier disciples who contest without blocking are pretty much out of luck. Still, it’s about as good an approximation of defense as anyone has come up with. Throw those two together, multiplying by usage and possessions, and you get Win Shares for both offense and defense.
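
Schematically, the two ratings described above boil down to the sketch below. This is only the shape of the calculation as described in this paragraph; the published formulas are considerably more involved (the anvils mentioned above).

    # Schematic versions of the two ratings as described above; the real
    # formulas are considerably hairier.

    def offensive_rating(points_produced, possessions_used):
        """ORTG: points produced per 100 possessions the player used."""
        return 100 * points_produced / possessions_used

    def defensive_rating(team_points_allowed_per_100, individual_stops_per_100):
        """DRTG: start from the team's rating and credit back the player's stops."""
        return team_points_allowed_per_100 - individual_stops_per_100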

Wins Produced is basically PER with several shots of crazy and ego added for spice. It adds up a bunch of things available in the box score, divides the Pythagorean wins up among players, and pretends that it actually approximates the wins that player has produced. Somewhere along the line, it also adjusts for player position by calculating the player’s production above the average at that position (i.e. shooting guards—who don’t do well in this calculation due to not rebounding, blocking shots, stealing the ball, or assisting—get a huge bump for playing at a level that would be well below average for a center). While the system does a good job of dumping on point-stuffers like Carmelo Anthony, it rewards rebounding and other box-score things so heavily that Kevin Love had more Wins Produced last season than LeBron James. I’m sure that devotees of this system will smugly look at me as if I’m just trying to defend conventional wisdom and throw away anything that challenges my perceptions, but come on. Kevin Love is a player built specially to feast upon a system like this, often turning to box a player out when he could be contesting a shot, whereas LeBron’s tenacious defense goes wholly unrewarded. The system’s harsh punishment of inefficient shooting can go a bit too far as well—when a play has broken down, a team will often need someone to hoist up a contested jump shot, and that’s usually the team’s best player. The fact that a player shoots 40% on long 2s with a guy in his face and the buzzer sounding shouldn’t be held against him.
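
For reference, the team-level “Pythagorean wins” being divided up is just the estimate sketched below (NBA analysts typically use an exponent somewhere around 14 to 16.5); the controversial part of Wins Produced is how that team total gets allocated to individual players.

    # Team-level Pythagorean expected wins -- the total that Wins Produced
    # then divides up among individual players.

    def pythagorean_wins(points_for, points_against, games=82, exponent=14.0):
        """Estimate season wins from points scored and allowed."""
        ratio = points_for ** exponent
        return games * ratio / (ratio + points_against ** exponent)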

Plus/minus (+/−) is the most basic of the all-in-one stats, and it now appears in box scores. It’s straightforward: It’s the sum total of the points scored by the player’s team above (or below) the other team’s while the player was on the floor. Often, it’s used in comparison to off-court numbers calculated the same way. While people will often dump on it and roll their eyes, it has its uses: It’s good at finding players who are doing things that don’t get captured by other metrics and at sanity-checking the results of other systems. The hazards are the small sample sizes for bench players and the fact that it’s at the mercy of the other players on the court; starters on bad teams will get pretty crappy +/− numbers due to facing other teams’ starters who outmatch them, whereas some bench players will get star-caliber numbers due to bad opposition. Oklahoma City’s starters, for example, often look pretty bad, simply because James Harden comes off the bench to replace them and blows away the opposing scrubbos he’s matched up against. Where it’s useful, though, is in checking players like Kevin Love. So far this season, his team has done marginally better (of negligible significance) offensively with him on the bench, and a mind-blowing 15 points per hundred possessions better defensively when he’s not in. Some of this is attributable to the fact that Rubio is a terrific perimeter defender who only recently became a starter, and some of it is due to the fact that Love is not a good defender.
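
Raw plus/minus and the on/off comparison really are as simple to compute from play-by-play data as the description suggests; here is a minimal sketch (the stint format is invented for the example).

    # Minimal raw plus/minus and on/off split. Each stint is a tuple of
    # (players_on_court, team_points, opponent_points) -- a format made up
    # purely for this sketch.

    def plus_minus(stints, player):
        """Net points while the player was on the floor."""
        return sum(tp - op for on_court, tp, op in stints if player in on_court)

    def on_off_margins(stints, player):
        """Average point margin per stint with the player on vs. off the floor."""
        on = [tp - op for on_court, tp, op in stints if player in on_court]
        off = [tp - op for on_court, tp, op in stints if player not in on_court]
        average = lambda xs: sum(xs) / len(xs) if xs else 0.0
        return average(on), average(off)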

So, why these long paragraphs down here at the bottom of the page? Because no one system is perfect, and multiple statistics should be checked against each other (along with more basic statistics, like eFG%) to get a feel for how a player really performed. There is no wondrous, all-consuming formula in basketball, and there probably never will be. It doesn’t give me high hopes for similar thinking to appear in Magic, a game that receives one ten-thousandth of the statistical attention.
