A Chess Formula Is Taking Over the World

In October 2003, Mark Zuckerberg created his first viral site: not Facebook, but FaceMash. Then a college sophomore, he hacked into Harvard’s online dorm directories, gathered a massive collection of students’ headshots, and used them to create a website on which Harvard students could rate classmates by their attractiveness, literally and figuratively head-to-head. The site, a mean-spirited prank recounted in the opening scene of The Social Network, got so much traction so quickly that Harvard shut down his internet access within hours. The math that powered FaceMash—and, by extension, set Zuckerberg on the path to building the world’s dominant social-media empire—was reportedly, of all things, a formula for ranking chess players: the Elo system.

Fundamentally, what an Elo rating does is predict the outcome of chess matches by assigning every player a number that fluctuates based purely on performance. If you beat a slightly higher-ranked player, your rating goes up a little, but if you beat a much higher-ranked player, your rating goes up a lot (and theirs, conversely, goes down a lot). The higher the rating, the more matches you should win.
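
For readers who want to see the arithmetic, here is a minimal sketch of the standard Elo update in Python. The function names, the K-factor of 32, and the 400-point scale are illustrative assumptions (common conventions, not values taken from this article), and real federations tune these details differently.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Predicted score for player A against player B (1 = win, 0.5 = draw, 0 = loss)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return both players' new ratings after one game.

    score_a is A's actual result: 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    k (assumed to be 32 here) controls how far a single game can move a rating.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# A 1500-rated player beats someone slightly stronger, then someone much stronger.
small_upset, _ = update(1500, 1550, score_a=1.0)  # gains about 18 points
big_upset, _ = update(1500, 1900, score_a=1.0)    # gains about 29 points
print(round(small_upset - 1500, 1), round(big_upset - 1500, 1))
```

The bigger the gap between what the ratings predicted and what actually happened, the bigger the adjustment, which is why an upset over a much stronger opponent pays more than a narrow one.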

That is what Elo was designed for, at least. FaceMash and Zuckerberg aside, people have deployed Elo ratings for many sports—soccer, football, basketball—and for domains as varied as dating, finance, and primatology. If something can be turned into a competition, it has probably been Elo-ed. Somehow, a simple chess algorithm has become an all-purpose tool for rating everything. In other words, when it comes to the preferred way to rate things, Elo ratings have the highest Elo rating.

The simplest way to rank chess players, or players in any competitive game, really, is by wins and losses. But that metric is obviously flawed: For one thing, a mediocre player could amass an undefeated record by beating up on newbies while a grand master wins some and loses some against other grand masters. For another, a simple win-loss tally indicates more about how good a player has been than about how good a player is now. Even before Elo, chess had a rating system that was more complex than just wins and losses, but in the mid-1950s, a 13-year-old chess prodigy named Bobby Fischer broke it. He had gotten so good so fast that the rankings—which didn’t sufficiently account for the quality of a player’s opposition—couldn’t keep up. Apparently in response, the U.S. Chess Federation convened a committee to correct these deficiencies, and in 1960 adopted a system devised by a Hungarian American chess master and physics professor named Arpad Elo. The International Chess Federation followed suit a decade later.

More than 50 years later, Elo’s is still the go-to ranking system. It has been modified over time, and different chess governing bodies use slightly different versions (some, for example, are more or less “swingy” in response to wins and losses), but all of them are still close variations on the original. Elo has become the most important number in chess. “Whenever anyone finds out you play chess, the immediate question is always, ‘What’s your rating?’” Nate Solon, a chess master and data scientist who writes a weekly chess newsletter, told me.

But Elo ratings don’t inherently have anything to do with chess. They’re based on a simple mathematical formula that works just as well for any one-on-one, zero-sum competition—which is to say, pretty much all sports. In 1997, a statistician named Bob Runyan adapted the formula to rank national soccer teams—a project so successful that FIFA eventually adopted an Elo system for its official rankings. Not long after, the statistician Jeff Sagarin applied Elo to rank NFL teams outside their official league standings. Things really took off when the new ESPN-owned version of Nate Silver’s 538 launched in 2014 and began making Elo ratings for many different sports. Some sports proved trickier than others. NBA basketball in particular exposed some of the system’s shortcomings, Neil Paine, a stats-focused sportswriter who used to work at 538, told me. It consistently underrated heavyweight teams, for example, in large part because it struggled to account for the meaninglessness of much of the regular season and the fact that either team might not be trying all that hard to win a given game. The system assumed uniform motivation across every team and every game.

Pretty much anything, it turns out, can be framed as a one-on-one, zero-sum game. You may well have been evaluated by an Elo rating without even knowing it. Elo ratings can be used to grade student assessments and inspect fabric. They can be used to rank venture-capital firms and prioritize different kinds of health-care training. Until a few years ago, Tinder used Elo scores to rate users by desirability and show them potential matches with similar ratings. Computer scientists have begun keeping an Elo-based leaderboard of large language models. Primatologists use Elo ratings to model social-dominance behaviors. At least one person has used them to decide which of their T-shirts to chuck.

The allure of Elo is clear: People are obsessed with data and statistics and ranking things, and Elo provides a sense of quantitative rigor, of objective meritocracy. “The good thing about it in chess is that you have this single number that captures your ability pretty accurately,” Solon told me. Of course, on some level, you’d want something similar in other aspects of life. “But then the dark side of that is that it can determine your standing within the chess world and even your self-worth … It’s sort of a curse for a lot of players because they’re just fixated on that number.” The great thing about Elo ratings is that you know exactly where you stand relative to everyone else, and the terrible thing about Elo ratings is that you know exactly where you stand relative to everyone else.

In truth, though, Elo doesn’t guarantee anything. The rankings are only as good or meritocratic as the underlying competitions. There’s nothing magic about them: However sophisticated your formula, if your inputs are junk, your outputs will be too. Last summer, someone built a website called Elo Everything, which does exactly what you’d think it would. When you visit the site, it serves up two things and asks, “Which do you rank higher?” A few example face-offs include the U.S. government versus spiders, testosterone versus crispiness, and the One Ring from Lord of the Rings versus the death of Adolf Hitler. Your selection affects the Elo score of the two things in contention, and that in turn affects the overall leaderboard. Currently atop the standings are: (1) The universe, (2) water, (3) knowledge, (4) information, and (5) love. Language, matter, and the “female body shape” were, as of this afternoon, locked in a three-way tie for 24th.
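
A site like this could plausibly be built from very little code. The sketch below is a guess at the general shape, not Elo Everything’s actual implementation: the starting rating of 1500, the step size of 32, and the names `record_vote` and `leaderboard` are all assumptions made for illustration.

```python
from collections import defaultdict

K = 32.0        # assumed step size per vote
START = 1500.0  # assumed rating for anything not yet voted on

# Every item starts at the same rating until votes say otherwise.
ratings = defaultdict(lambda: START)


def record_vote(winner: str, loser: str) -> None:
    """Apply one head-to-head vote using the standard Elo update."""
    expected_win = 1.0 / (1.0 + 10.0 ** ((ratings[loser] - ratings[winner]) / 400.0))
    delta = K * (1.0 - expected_win)
    ratings[winner] += delta
    ratings[loser] -= delta


def leaderboard():
    """Items sorted from highest rating to lowest."""
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Three hypothetical votes.
record_vote("water", "spiders")
record_vote("the universe", "water")
record_vote("the universe", "spiders")

for name, rating in leaderboard():
    print(f"{name}: {rating:.0f}")
```

Nothing in the update checks whether a comparison makes sense. The formula adjusts the numbers the same way whether the contest is two grand masters or water versus spiders.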

Elo himself understood the limitations of his invention. In his conception, its function was quite narrow: “It is a measuring tool, not a device of reward or punishment,” he once remarked. “It is a means to compare performances, assess relative strength, not a carrot waved before a rabbit, or a piece of candy given to a child for good behavior.” Inevitably, that is what it has become.

The Atlantic
