By David Roher
In case you haven’t been hanging around the benighted corners of the political internet lately, there’s an idiotic backlash afoot against Nate Silver, the proprietor of the FiveThirtyEight blog who made his name as one of the sharpest baseball analysts around.
With the election just a few days away, analysis based on state poll aggregation—Silver’s included—suggests that Barack Obama is a heavy favorite against Mitt Romney. The president holds a slight but strong lead in key electoral states. This doesn’t sit well with many political pundits, who insist that the outcome is anyone’s guess and headed down to the wire. Many of these people have directed their anger toward Silver, whose New York Times-hosted blog has predicted a strong probability of an Obama victory since June. They insist he is biased or sloppy in his methodology, even though they seem unaware of how he makes his predictions and of statistical analysis in general. They say—and I’m not kidding—he’s too gay for this sort of work.
In retrospect, we should’ve seen it coming. It was only a matter of time before the war on expertise spilled over into the cells of Nate Silver’s spreadsheets. In fact, in some ways it had already. Turns out that nothing could have prepared Silver better for the slings and arrows of a surly and willfully obtuse pundit class than working on the fringes of sportswriting over the past decade.
Silver became well-known among baseball statisticians in 2003, when he debuted his PECOTA projection system on Baseball Prospectus. While other projection systems already existed, PECOTA had two key strengths: It based its player projections on the past performances of similar players, and it gave a probability distribution for each player to reflect the uncertainty of the measurement. For example, if a hitter battled an injury through the season and his production suffered, he’d hit his 20th-percentile PECOTA projection rather than miss the mark entirely. Some players are harder to project than others, and Silver’s methodology let the reader in on his relative confidence. None of this was unique to baseball; scouts have always relied on player archetypes, and front offices have always understood that production tends to fall within an expected range rather than on a straight line. All Silver was doing was taking that analysis out of the realm of the gut.
If he had debuted his system a few years earlier, only the nerds would have noticed. However, the debut coincided with the release of Michael Lewis’s Moneyball, which detailed the quantitative exploits of the Oakland Athletics front office to a wide audience. Bill James had been developing advanced baseball statistics, or “sabermetrics,” for nearly three decades before Lewis’s book, but 2003 was the first time his work made it to every front office in Major League Baseball.
Moneyball exaggerated the contrast between stats and scouts for dramatic and comic effect: No team, before or since, has won with the numbers alone, the A’s included. But many baseball lifers rallied against the entire sabermetric movement as a result of this misconception. The most famous example was Joe Morgan, a Hall of Famer-turned-ESPN broadcaster who hated the idea that someone like Nate Silver had the gall to predict a player’s performance.
(The funny part is, Morgan might have been the most sabermetrically inclined player ever. The inferior statistics of the time, like batting average and RBIs, didn’t capture what made him so valuable, but he drew over 1800 walks, stole 689 bases at an 81 percent success rate, and was an excellent fielder at a premium position. James called him a "perfect" baseball player. In a lot of ways, his greatness as a ballplayer wasn’t appreciated until the numbers crowd began loudly singing his praises.)
As sabermetrics continued to gain popularity among officials and fans, many sportswriters felt threatened by systems like Silver’s. The beauty of the sport lies in its unpredictability, and any attempt to quantify the future seemed both arrogant and dangerous. They often ridiculed PECOTA’s projections, particularly the ones they felt went against common sense, like this one from 2007:
After winning a World Series and more games the last two seasons than any team in baseball except the New York Yankees, the White Sox should have earned a little respect.
Well, maybe from real baseball people, but not in the surreal world of computers.
Baseball Prospectus, considered the new-age statistical bible, projects the White Sox to finish with a 72-90 record this season.
"Well, we’re screwed now," team captain Paul Konerko said with a sarcastic laugh. "I guess we’ll just have to battle through."
The White Sox finished with a 72-90 record that season.
Still, Silver and PECOTA have been wrong thousands of times, often spectacularly so. He didn’t, and never intended to, make baseball more predictable. He had a coin flip’s chance of guessing the outcome of a single pitch or game. However, by adding up all those coin flips, he was able to see macroscopic patterns emerge out of microscopic randomness. What made his predictions so much better than a hack’s idle speculation was that they looked better as a whole. There is no human bias in PECOTA, and most of its error is due to random variance.
One of the most important concepts in statistics is the Central Limit Theorem, which states in part that as the number of independent observations increases, their average settles into a bell curve around the true value, so the random noise is more likely to cancel itself out. For instance, if you recorded the average height of every group of 30 people you saw, those averages would form a bell curve much tighter than the spread of the individual heights. So, while Silver might have missed one player’s home run total by 50 percent, he might have missed that player’s team’s home run total by only 5 percent.
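To see that cancellation at work, here is a minimal simulation sketch. Everything in it is an assumption made for illustration, not anything taken from PECOTA: a roster of 25 hitters, each projected for 20 home runs, each missing his projection by independent bell-curve noise. It compares the typical percentage miss for one player with the typical percentage miss for the roster total.

```python
import random
import statistics

def simulate(num_players=25, projection=20.0, noise_sd=8.0, seasons=2000, seed=538):
    """Compare the typical percentage miss for one player vs. a whole roster.

    All parameters are illustrative assumptions, not PECOTA inputs.
    """
    rng = random.Random(seed)
    player_misses = []
    team_misses = []
    projected_total = projection * num_players
    for _ in range(seasons):
        # Each player's actual total is his projection plus independent noise,
        # floored at zero because nobody hits negative home runs.
        actual = [max(0.0, rng.gauss(projection, noise_sd)) for _ in range(num_players)]
        # Percentage miss for a single player (the first one on the roster).
        player_misses.append(abs(actual[0] - projection) / projection)
        # Percentage miss for the roster's combined total.
        team_misses.append(abs(sum(actual) - projected_total) / projected_total)
    return statistics.mean(player_misses), statistics.mean(team_misses)

player_miss, team_miss = simulate()
print(f"typical miss, one player: {player_miss:.1%}")
print(f"typical miss, whole team: {team_miss:.1%}")
```

With independent errors, the team's percentage miss should shrink roughly with the square root of the roster size, which is why a forecaster can be badly wrong about individuals and still close on the aggregate.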
PECOTA wasn’t any sort of dark magic. PECOTA was in essence a smart analyst who knows his history, draws defensible conclusions, chips away at uncertainty, and never claims to know more than he actually does. A good pundit, in other words. And that’s what made it and its ilk such a threat to the sports world’s pundit class. Statistics can’t replace good writing, but it can expose the bad, and sabermetrics represented a direct threat to the bad writers who had gotten away with being bad for far too long. These were the writers who used the same old false narratives to reach the same old misguided conclusions. (They used stats, too, incidentally, just the wrong stats—the noisy old metrics like RBIs and batting average.) Sportswriting isn’t a monolith, and many writers like Joe Posnanski combined their experience and access with the new methodology. But a lot of those pundits made their money on that margin of uncertainty in sports, yammering about heart and grit and all that ineffable crap that was never so ineffable that a hack couldn’t write 500 words about it for the early edition. And so they remained in the dark, stubbornly entrenched, missing out on a new way to analyze the game they were paid to follow.
In 2008, Nate Silver shifted his focus from baseball to politics. He began blogging about the presidential race, and eventually started his own site, FiveThirtyEight. Unlike most political writing, Silver’s was objective and rigorous: He used the same skills he honed with PECOTA to make an election forecasting system. Where PECOTA used players’ similarity to boost the sample size, FiveThirtyEight used states’ similarity from demographic data to fill in the blanks when polling data was sparse. He also introduced measures such as poll weights based on party bias in a particular election cycle, sample size, and time lapsed.
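For a rough sense of what weighting polls by those factors can look like, here is a small sketch. The formula is an assumption chosen for illustration (a square-root sample-size factor, a 14-day half-life for recency, and a per-pollster lean subtracted before averaging); it is not FiveThirtyEight's actual algorithm, and the two polls are made up.

```python
import math
from dataclasses import dataclass

@dataclass
class Poll:
    margin: float      # candidate's lead, in percentage points
    sample_size: int   # number of respondents
    days_old: int      # days since the poll was taken
    house_lean: float  # hypothetical estimate of the pollster's partisan lean, in points

def poll_weight(poll, half_life_days=14.0):
    # Assumed rule: bigger samples count more (square root of sample size),
    # and a poll's influence halves every `half_life_days`.
    size_factor = math.sqrt(poll.sample_size)
    recency_factor = 0.5 ** (poll.days_old / half_life_days)
    return size_factor * recency_factor

def weighted_margin(polls):
    # Remove each pollster's assumed lean, then take the weighted average.
    total_weight = sum(poll_weight(p) for p in polls)
    return sum(poll_weight(p) * (p.margin - p.house_lean) for p in polls) / total_weight

# Two invented polls: a small fresh one and a larger one that is a week older.
polls = [
    Poll(margin=1.0, sample_size=400, days_old=1, house_lean=0.0),
    Poll(margin=5.0, sample_size=900, days_old=8, house_lean=0.0),
]

for p in polls:
    print(f"margin {p.margin:+.1f}, weight {poll_weight(p):.2f}")
print(f"weighted margin: {weighted_margin(polls):+.2f}")
```

Under these assumed rules, the older poll can still outweigh the newer one if its sample is large enough, and the weights never look at which candidate a poll favors.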
His debut was solid: Silver’s model missed only one state in Obama’s victory, and it correctly predicted every Senate race. His results greatly increased his popularity in the news media, and two years later his blog was acquired by The New York Times. Silver fared well in the midterm elections too, forecasting six new Senate seats for the Republicans (they got seven) and 54 new seats in the House (they got 62). He also correctly predicted 36 out of 37 gubernatorial races. However, in his experimental attempt to predict the 2010 UK General Election, he predicted that the Liberal Democrats would double their seat total to 120 when it actually shrank by 5, to 57.
His predictions have gotten more attention in 2012 than they had in previous elections. He started forecasting the presidential election in June, and he’s rated Barack Obama’s chances at greater than 60 percent since then. As Election Day approaches, we’ve started to see the same criticisms of his work that sportswriters offered up years earlier. Last week, his colleague David Brooks wrote critically (if whimsically) about poll data:
If there’s one thing we know, it’s that even experts with fancy computer models are terrible at predicting human behavior. Financial firms with zillions of dollars have spent decades trying to create models that will help them pick stocks, and they have gloriously failed.
Brooks’s criticism was general, against his own impulse to put each individual poll under a microscope. But other detractors are focusing on Silver himself, particularly conservative writers and pundits who refuse to accept that Mitt Romney’s election is unlikely. The most laughable of these is Dean Chambers, who forecasts the election on UnSkewed Polls by adjusting each poll so that Romney is in the lead. Chambers thinks that Silver’s slight stature is evidence of his bias:
Nate Silver is a man of very small stature, a thin and effeminate man with a soft-sounding voice that sounds almost exactly like the “Mr. New Castrati” voice used by Rush Limbaugh on his program. In fact, Silver could easily be the poster child for the New Castrati in both image and sound.
Chambers’s gay-baiting is just a more transparent version of the nerd-baiting favored by the likes of Murray Chass and Dan Shaughnessy, who never tire of ridiculing sabermetricians as limp-wristed geeks living in their mothers’ basements. Most of the other conservative criticism of Silver isn’t as extreme or fraudulent, but it demonstrates the same bias. The National Review’s Josh Jordan tried to nitpick “Nate Silver’s Flawed Model”:
While there is nothing wrong with trying to make sense of the polls, it should be noted that Nate Silver is openly rooting for Obama, and it shows in the way he forecasts the election.
On September 30, leading into the debates, Silver gave Obama an 85 percent chance and predicted an Electoral College count of 320–218. Today, the margins have narrowed — but Silver still gives Obama a 67 percent chance and an Electoral College lead of 288–250, which has led many to wonder if he has observed the same movement to Romney over the past three weeks as everyone else has. Given the fact that an incumbent president is stuck at 47 percent nationwide, the odds might not be in Obama’s favor, and they certainly aren’t in his favor by a 67–33 margin.
Jordan uses words like “fact” and “certainly” to say that Silver is biased and too sure of himself. He appears to be confusing Obama’s national poll numbers with the probability that he’ll win the election, two figures that aren’t even in the same units. Later, he cherry-picks poll weights to prove that Silver is cherry-picking poll weights:
But look at some of the weights applied to the individual polls in Silver’s model. The most current Public Policy Polling survey, released Saturday, has Obama up only one point, 49–48. That poll is given a weighting under Silver’s model of .95201. The PPP poll taken last weekend had Obama up five, 51–46. This poll is a week older but has a weighting of 1.15569.
Josh Jordan wants to see Nate Silver’s bias so badly that he projects it onto every datum. The FiveThirtyEight model sets its weights algorithmically without regard to individual results, and this particular weight differential was more likely due to the difference between the two polls’ sample size.
Jordan doesn’t seem like much of an authority on sampling; you wonder why he doesn’t just adopt a professional’s line of criticism. Sam Wang, a biophysicist and neuroscientist, runs the criminally underrated Princeton Election Consortium. Wang’s model nailed the electoral vote count in 2004, and missed it in 2008 by just one, making his track record even better than Silver’s. Here’s his criticism of FiveThirtyEight:
Basically, I think he does messy analysis that throws out information, adds unnecessary factors, with effects such as blurring the odds as well as time resolution.
However, the bottom line for you is this. For purposes of public consumption he is fine. He has good intuitions and is more concerned with getting things right than with anything else.
Wang’s model also uses no poll weighting, which seems to be Jordan’s beef with Silver. So why wouldn’t Jordan cite Wang in his criticism? It might have to do with Wang’s forecasting Obama’s re-election chances at 98 percent. Judged by the chance each model leaves for a Romney win, he’s over 10 times as confident as FiveThirtyEight.
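The "10 times" comparison only works if confidence is measured in odds or in the chance left for an upset, not in raw percentages. A quick sketch of that arithmetic follows; the 73.6 percent figure for FiveThirtyEight is the one Joe Scarborough is quoted citing elsewhere in this piece, and treating it as the comparison point is an assumption made here.

```python
def odds(probability):
    """Convert a win probability into odds in favor (0.75 becomes 3.0, i.e. 3-to-1)."""
    return probability / (1.0 - probability)

wang = 0.98     # Wang's stated chance of an Obama win
silver = 0.736  # assumed comparison point: the FiveThirtyEight figure quoted by Scarborough

# How much longer Wang's odds on Obama are than Silver's.
odds_ratio = odds(wang) / odds(silver)
# How much more room Silver leaves for a Romney win than Wang does.
upset_ratio = (1.0 - silver) / (1.0 - wang)

print(f"odds ratio: {odds_ratio:.1f}")
print(f"upset-chance ratio: {upset_ratio:.1f}")
```

Either way of measuring it clears the 10-to-1 bar with these inputs, even though 98 percent is nowhere near 10 times 73.6 percent.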
Noticing that nearly every poll aggregator forecasts an Obama victory, some of Jordan’s peers have become even more adamant that any poll-based prediction is a fool’s errand:
@jaycosttws when do we break it to them that averaging polls is junk?— Jennifer Rubin (@JRubinBlogger) October 29, 2012
Unless, as economist Justin Wolfers suggested, the Washington Post has found a way around the Central Limit Theorem, poll averaging isn’t the thing that’s junk here.
Meanwhile, MSNBC host Joe Scarborough slammed Silver’s projection simply for not being 50-50:
"Nate Silver says this is a 73.6 percent chance that the president is going to win? Nobody in that campaign thinks they have a 73 percent chance — they think they have a 50.1 percent chance of winning. And you talk to the Romney people, it’s the same thing," Scarborough said. "Both sides understand that it is close, and it could go either way. And anybody that thinks that this race is anything but a tossup right now is such an ideologue, they should be kept away from typewriters, computers, laptops and microphones for the next 10 days, because they’re jokes."
Scarborough’s comments illustrate the central and most pernicious bias in political media: not toward one candidate or another, but toward a toss-up. Forecasters like Silver and Wang strive for precision in addition to accuracy. If accuracy is how close the average dart is to the bullseye, precision is how close each dart was to the others. We don’t yet know whether they’ve been accurate, but we can already safely say that they’ve been precise, as their predictions heading into November are essentially the same as they were months ago.
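The dartboard distinction can be made concrete with a few lines of code. This is only an illustrative sketch with invented dart positions: accuracy is measured as the distance from the average dart to the bullseye, and precision as the average distance from each dart to the center of its own group.

```python
import math
import statistics

def accuracy_error(darts, bullseye=(0.0, 0.0)):
    """Distance from the average dart position to the bullseye (lower is more accurate)."""
    mean_x = statistics.mean(x for x, _ in darts)
    mean_y = statistics.mean(y for _, y in darts)
    return math.hypot(mean_x - bullseye[0], mean_y - bullseye[1])

def precision_spread(darts):
    """Average distance from each dart to the group's own center (lower is more precise)."""
    mean_x = statistics.mean(x for x, _ in darts)
    mean_y = statistics.mean(y for _, y in darts)
    return statistics.mean(math.hypot(x - mean_x, y - mean_y) for x, y in darts)

# Invented throws: one tight cluster far from the bullseye,
# one loose scatter that happens to average out on it.
tight_but_off = [(3.0, 3.1), (3.1, 2.9), (2.9, 3.0)]
scattered_but_centered = [(-4.0, 0.0), (4.0, 0.0), (0.0, 0.0)]

for name, darts in [("tight but off", tight_but_off),
                    ("scattered but centered", scattered_but_centered)]:
    print(f"{name}: accuracy error {accuracy_error(darts):.2f}, "
          f"spread {precision_spread(darts):.2f}")
```

A forecast that barely moves for months is the tight cluster. Whether that cluster sits on the bullseye is the part nobody can know until the votes are counted.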
The political media hate precision: No one tunes in to a boring horse race. The volatility of day-to-day polling allows them to explain how the contest (in which, till recently, no actual votes had yet been cast) has been lost and won and lost again with each news cycle—an endless series of decisive revelations and foundational truths about the candidates or the public. If the narrative had followed Silver’s and Wang’s graphs, there would have been little to no hubbub over Bain’s outsourcing, “You didn’t build that,” the 47 percent, or the first debate. And what fun would that be? Both the Romney and Obama camps are happy to play into the toss-up narrative, as Obama needs his presumed majority to actually go to the polls on election day, and Romney wants to give his base confidence and hope. It’s the rare thing that everyone can agree on this year. (That and coal. Everybody fucking loves coal.)
The baseless criticisms also illustrate how many political pundits proudly display their quantitative ignorance. In the Scarborough article, Politico's Dylan Byers offered this breathtaking analysis:
What matters for Silver is that the president wins and that he ends up with a total number of electoral votes somewhere in the ballpark of whatever Silver predicts on the afternoon of Nov. 6. And even then, you won’t know if he actually had a 50.1 percent chance or a 74.6 percent chance of getting there.
Maybe you won’t. Forecasts should be judged on their processes, not their results. If Mitt Romney wins the election, Silver won’t necessarily have been wrong (for one thing, he’ll look a lot better than Wang). He’ll only merit criticism if the results show that he made improper inferences, such as a skewed voter turnout model or a flawed weighting algorithm.
Most of the irrational Silver criticism has come from the right, but so has a lot of the rational criticism. Conservatives have been among the minority to point out the potential flaws underlying Silver’s methods and assumptions. At the Daily Caller, Sean Davis created a lightweight version of Silver’s model in one day, which closely matches FiveThirtyEight’s predictions. Davis doesn’t think Silver has a liberal bias; rather, he believes that Silver’s and Wang’s models rely too heavily on, and don’t add a significant amount of information to, state polling averages. Davis and pundits like Jay Cost are making quantitative arguments about the methods of the polling companies themselves. This election might be a referendum on the predictive value of state polling averages.
In fact, we’ve reached the point in our screwed-up political media culture where the polling companies and forecasters—not the pundits, not the spokespeople, and certainly not the candidates—are the only people being evaluated rigorously on the substance of their arguments. If Nate Silver and Sam Wang screw up, their popularity will suffer as a result, and they’ll have to reconsider their models. Meanwhile, if Brooks, Jordan, Scarborough, Rubin, or Byers make another poor argument, they’ll continue to collect their paychecks as if nothing had happened. Likewise, the Curse of the Bambino stopped working long ago, and yet Dan Shaughnessy is still getting book deals.
Just like their colleagues in the sports section, the political pundits see the wrong kind of uncertainty in Nate Silver. They associate statistics with mathematical proof, as if a confidence interval were the same thing as the Pythagorean Theorem. Silver isn’t more sure of himself than his detractors, but he’s more rigorous about demonstrating his uncertainty. He’s bad news for the worst members of the punditry, who obscure the truth so their own ignorance looks better by comparison and who make their money on the margin of uncertainty, too.
As successful as Silver has been in both sports and politics, his greatest achievement so far might be attracting this particular flavor of criticism. As Wang said, his methodology is up for debate, but his goals of accuracy and precision, and the general science of forecasting, aren’t. The hardest thing Silver works at is eliminating bias, so when baseball fans accuse him of being biased against their team or political lackeys accuse him of being biased against their candidate, it’s the accusers themselves who are revealing their own insurmountable bias. In their attempt to put Silver on trial, they only convict themselves.