Published on 12:00 AM, November 14, 2016

Tech Feature

How data science rocked the US elections

11/9. It was an average morning. Most Bangladeshis woke up and got ready for the busy day ahead. And that is when the news broke. Donald J. Trump, the presidential candidate of the Republican Party, was leading the vote count. Slowly and steadily, as the day went by, it became increasingly clear that he was going to be the next leader of the free world (!). Defying all major forecasts, Trump won states that most media outlets thought would lean towards Hillary. So, what went wrong? Did the number crunchers get the data wrong? Or was it there all along, and they failed to see it?

Nowadays, in developed countries, elections are a field day for data science. Although election data science remains a fledgling branch of hardcore, big-data-fuelled science, it is already used extensively to stay ahead in the game.

How does data science work in elections? It's quite straightforward actually! Pollsters, over a period of time, collect data via phone calls, social media, surveys and even social big data from third parties. These reams of data are then fed into machines that try to understand the context and surface patterns and critical insights.
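To make the idea concrete, here is a minimal sketch in Python of one very simple way collected polls could be combined: an average weighted by sample size. The `Poll` class, the source labels and every number below are made up for illustration. Real forecasters use far more elaborate models, and nothing here represents the method of any particular outlet.

```python
from dataclasses import dataclass


@dataclass
class Poll:
    source: str        # how the responses were gathered (hypothetical labels)
    sample_size: int   # number of respondents
    share_a: float     # fraction backing candidate A
    share_b: float     # fraction backing candidate B


def weighted_average(polls):
    """Combine polls into one estimate, weighting each by its sample size."""
    total = sum(p.sample_size for p in polls)
    if total == 0:
        raise ValueError("no respondents to average over")
    avg_a = sum(p.share_a * p.sample_size for p in polls) / total
    avg_b = sum(p.share_b * p.sample_size for p in polls) / total
    return avg_a, avg_b


# Made-up numbers, for illustration only.
polls = [
    Poll("phone", 1000, 0.48, 0.44),
    Poll("online survey", 2000, 0.45, 0.46),
    Poll("social media panel", 1000, 0.47, 0.45),
]

avg_a, avg_b = weighted_average(polls)
print(f"Candidate A: {avg_a:.1%}, Candidate B: {avg_b:.1%}, "
      f"lead: {avg_a - avg_b:+.1%}")
```

Weighting by sample size is only one possible choice. It treats every respondent as equally trustworthy, which is exactly the kind of assumption that can go wrong when some groups of voters are harder to reach than others.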

Since the very start of the campaigns of Hillary Clinton and Donald Trump, data scientists have been crunching numbers based on public opinion, media coverage, scoops and scandals. Right up to the final week, most of the major news outlets, such as CNN, the New York Times and MSNBC, put Clinton's chances of winning the election at 70-95%. But the whole world was shocked when the final results emerged. So, what went wrong?

Truth be told, election data science comes with its own trade-offs; after all, it's a numbers game. Most people fail to understand that data science is merely a blunt tool. The truth is always hidden in the raw data, but without context the data cannot be analysed properly. And there are countless real-world examples of data science failing. Google Flu Trends, Google's flu-tracking engine, overestimated the number of flu patients in the USA by not understanding the concept of patient zero. Facebook was, in the recent past, heavily criticised for taking down the picture of a naked nine-year-old girl running away from a napalm attack. Facebook's algorithm flagged it as inappropriate content instead of comprehending the underlying message: the fury of the Vietnam War.

Another reason that data scientists from Silicon Valley have pointed to is insufficient data on US elections. Of all the data that pollsters collect during election campaigns, only a small fraction is truly actionable and reliable. As in any scientific experiment, when a new variable is introduced, the equation changes. In the US elections, we saw new variables being dumped in every now and then. Be it Hillary Clinton's private email server, the Trump University scandal, Clinton falling sick during her campaign or Melania Trump's GQ magazine photo shoot, every new scoop swayed the voters in various ways. According to Hillary Clinton's campaign manager, the fatal blow to the campaign was the announcement made by James Comey, Director of the FBI, less than two weeks before Election Day regarding the private email server that the FBI was investigating.