Big Data

2016 Presidential Election: Did Big Data Just Get Lazy?

By Bill Schmarzo, CTO, Dell EMC Services (aka “Dean of Big Data”) | November 14, 2016

“Oh, somewhere in this favoured land the sun is shining bright,

The band is playing somewhere, and somewhere hearts are light;

And somewhere men are laughing, and somewhere children shout,

But there is no joy in Mudville—mighty Casey has struck out.”

– Casey At The Bat

Mighty Casey Has Struck Out

Nate Silver, the face of big data for many of us, was (and still is) a living legend in the world of big data.  He almost single-handedly made big data relevant by putting it into our daily conversations.  In the past two Presidential Elections (2008 and 2012), not only did he correctly predict which candidate would win, but he also correctly predicted the results of all 50 states.  Perfection.  He truly was Mighty Casey.

Unfortunately, Nate didn’t fare so well this past Presidential Election.  Not only did he miss predicting the winner, but he also missed on 9 states.  And he didn’t just miss the swing states; he also missed states that he had predicted to be “solidly” in Clinton’s camp.  In fact, on the morning of the Presidential Election, Nate gave Hillary Clinton a 95.4% chance of victory with the below state-by-state predictions (see Figure 1).


Figure 1:  Nate Silver’s Presidential Predictions the Day of the Election (Nov 8)

Note: I apologize, but Figure 1 should be attributed to August 8, not November 8.  Please see the comment below for more details correcting my error.

As is stated in any mutual fund disclaimer, “Past performance does not guarantee future results.”  And unfortunately, Nate’s past success did not guarantee success in this election as we look at the final results of the 2016 Presidential Election (see Figure 2).


Figure 2:  Actual Presidential Election Results

As Yogi Berra famously said about making predictions:

“It’s tough to make predictions, especially about the future.”

And Nate missed his Presidential Election predictions on 9 states:

  • Florida
  • Georgia
  • Pennsylvania
  • Ohio
  • North Carolina
  • Michigan
  • Iowa
  • Wisconsin
  • Arizona

That would be like Stephen Curry, Golden State’s outstanding professional basketball player, making 50 out of 50 free throws two games in a row, and then suddenly missing 9 free throws in his next game.  We’d be stunned because we had gotten used to perfection in an art (developing predictions) that is fraught with failure.

But it wasn’t just Nate Silver and his company FiveThirtyEight that got it wrong.  Even Trump’s own data science team (Cambridge Analytica) didn’t think, on the day before the election, that Trump was going to win:

“On Monday [the day before the election], [Trump’s data science team from] Cambridge Analytica gave Mr. Trump less than a 30% chance of winning.”

Yeah, making predictions about the future is very tough.  And while I cannot be certain as to exactly what happened to Nate’s predictions, I have a theory.

Moneyball: Analytics Half-life

My favorite data science book is probably not even considered a data science book by most data scientists.  I recommend to my University of San Francisco students that, if they want to understand the basics of data science, they read the book “Moneyball: The Art of Winning an Unfair Game.”  That book does a great job of explaining basic data science concepts, such as the idea that data science is about identifying those variables and metrics that might be better predictors of performance.

Billy Beane (the general manager behind the “Moneyball” approach) gave the Oakland A’s a winning formula by leveraging data science to build a more cost-effective, high-performing baseball franchise.  As a result, the Oakland A’s consistently outperformed better-financed competitors (like the New York Yankees) in reducing “cost per win” while remaining competitive on the playing field (see Figure 3).


Figure 3:  Power of Moneyball

Unfortunately, the competitive advantages and market differentiation offered by analytics are fleeting; they have a “half-life” (see Figure 4).


Figure 4:  Half-life Calculation (Wikipedia)

Other baseball teams quickly copied the Moneyball analytic concepts, and the Oakland A’s soon lost their competitive advantage.  Theo Epstein took what he learned from the Oakland A’s to help the Boston Red Sox win their first World Series title in 86 years in 2004, and then subsequently helped the Chicago Cubs win their first World Series title in 108 years this year!
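As an illustrative sketch (the numbers are hypothetical, not from any actual study), the half-life formula in Figure 4 can be applied directly to a model’s competitive edge:

```python
def remaining_effectiveness(initial, elapsed, half_life):
    """Half-life decay, per Figure 4: N(t) = N0 * (1/2) ** (t / half_life)."""
    return initial * 0.5 ** (elapsed / half_life)

# Hypothetical: if a model's edge halves every 2 seasons (as competitors
# copy it), after 4 seasons only a quarter of the original edge remains.
print(remaining_effectiveness(1.0, 4, 2))  # 0.25
```

The exact half-life of any given model is, of course, something the organization has to measure for itself; the point is that the decay is real and should be planned for.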

Presidential Election 2016: Did The Data Scientists Get Lazy?

Predictive models that work well today will lose their effectiveness over time because the world is constantly changing.  The Presidential Election predictive models that worked so well in 2008 and 2012 eventually lost their edge because sentiments (desires, tendencies, inclinations, behaviors, interests, passions, affiliations, associations, etc.) are constantly changing, driven by widening wealth gaps, corporate acts of malfeasance, government scandals, the rising burden of student debt, the growing ranks of the under-employed, the increasing number of workers who have stopped looking for work, wage stagnation, a mounting populace divide on social media, the legalization of marijuana, rising media sensationalism, climate change, terrorism, the Cubs winning the World Series, etc.

Consequently, analytic models need to constantly explore new data sources, new proxies, new data enrichment techniques, and new analytic algorithms in looking for those variables and metrics that might be better predictors of voter sentiment.

For example, one of our data scientists in Miami (Luciano Tozato) performed video analytics to count the size of the crowds at local Trump and Clinton rallies.  He discovered consistent under-counting of Trump’s crowds and over-counting of Clinton’s crowds (likely because the local media was too lazy to do a more accurate count).  And while this variable alone won’t predict the winner, it’s one of many new variables that data scientists might want to consider when trying to predict voter sentiment (see Figure 5).


Figure 5:  Using Video Analytics to Count Size of Crowds
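Once you have independent crowd estimates (from video analytics or any other source), quantifying the media’s counting bias is simple arithmetic.  This is a hypothetical sketch with made-up numbers, not Luciano’s actual analysis:

```python
def counting_bias(reported, estimated):
    """Mean relative error of media-reported crowd sizes versus independent
    estimates. Negative values indicate systematic under-counting."""
    errors = [(r - e) / e for r, e in zip(reported, estimated)]
    return sum(errors) / len(errors)

# Hypothetical numbers, for illustration only: two rallies where the
# reported counts were half the independently estimated attendance.
print(counting_bias(reported=[3_000, 4_500], estimated=[6_000, 9_000]))  # -0.5
```

A consistently negative (or positive) bias across many rallies is itself a signal worth feeding into a sentiment model.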

Predicting Is Not Enough; You Must Also Prescribe

While predicting how the election is going to turn out is interesting, ultimately the political parties needed to turn those predictions into prescriptive actions.  For example, while it’s useful to predict which constituents are the swing voters in the important swing states, it’s more important to be able to prescribe (and test and refine) actions, messages, venues and channels to reach and persuade those swing voters.

For example, “Predicting Voter Turnout by County by State” (there are roughly 3,100 counties and county equivalents in the United States) would be a critical use case for any successful campaign. To create this prediction, the data scientist would start with descriptive analytics to establish a historical baseline of what has happened in the past:

“How many voters voted in each county in the past election?”
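A first cut at that descriptive baseline might look something like the following sketch (the records and numbers here are hypothetical placeholders, not actual election data):

```python
from collections import defaultdict

# Hypothetical vote records: (county, state, election_year, votes_cast).
records = [
    ("Wayne", "MI", 2012, 815_000),
    ("Wayne", "MI", 2016, 780_000),
    ("Oakland", "MI", 2012, 660_000),
]

def turnout_by_county(records, year):
    """Descriptive baseline: total votes cast per (county, state) in a year."""
    totals = defaultdict(int)
    for county, state, election_year, votes in records:
        if election_year == year:
            totals[(county, state)] += votes
    return dict(totals)

print(turnout_by_county(records, 2012))
```

With that baseline per county per election, year-over-year deltas fall out immediately.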

To answer this question, another of our data scientists, Anil Inamdar (in his blog “This is how Trump won: Enthusiasm of Luddites vs. Complacency of Democrats”), did some research and came across the following insight:

Wayne County in Michigan had 78,824 fewer voters registered as Democrat in 2016 when compared to 2012

Democrat Presidential candidate Hillary Clinton lost Michigan by roughly 12,000 votes[1] (see Table 1).


Table 1: Analysis of Voter Turnout in Wayne County in Michigan
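The arithmetic behind that insight is stark: the registration drop in that one county was several times the entire statewide losing margin.

```python
# Figures quoted in the text above; the statewide margin is approximate.
wayne_registration_drop = 78_824  # fewer registered Democrats, 2016 vs. 2012
michigan_margin = 12_000          # approximate votes by which Clinton lost MI

# How many times over the one-county drop covers the statewide margin.
print(wayne_registration_drop / michigan_margin)  # roughly 6.6
```

In other words, recovering even a fraction of the Wayne County decline would have flipped the state.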

The question that the Democrat team should have been asking is:

“How many registered Democrats are going to vote by county in this election?” 

Or maybe an even more focused question:

“How many registered Democrats are going to vote in the ‘most important’ counties in each of the key swing states in this election?” 

Once the Democrat team had that prediction at the county/state level, the campaign should have developed prescriptive analytics to tell it what to do to increase Democrat voter turnout:  Whom to target?  What message to target them with?  How best to reach them?  In which newspapers and on which TV channels to advertise?  When to advertise?  How much to spend on advertising?  When to reach them?  How often to reach them?  What venues to hold rallies in?  What relationships, associations or affiliations to leverage to reach them?
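One simple way to operationalize those prescriptive questions is to rank counties by their predicted turnout shortfall against the historical baseline, so that outreach spend goes where the most votes are at risk.  This is a hypothetical sketch with made-up figures:

```python
def rank_counties_for_outreach(baseline, predicted):
    """Rank counties by predicted turnout shortfall versus the historical
    baseline, largest shortfall first, to prioritize outreach resources."""
    shortfalls = {county: baseline[county] - predicted.get(county, 0)
                  for county in baseline}
    return sorted(shortfalls.items(), key=lambda item: item[1], reverse=True)

# Hypothetical baseline (prior election) and predicted turnout figures.
baseline = {"Wayne, MI": 815_000, "Macomb, MI": 420_000}
predicted = {"Wayne, MI": 740_000, "Macomb, MI": 400_000}
print(rank_counties_for_outreach(baseline, predicted))
# Wayne tops the list with a 75,000-vote predicted shortfall
```

The ranking is only the starting point; the campaign would still have to test and refine which messages, channels and venues actually close each county’s gap.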

2016 Presidential Election Big Data Lessons

So what are the take-aways from the 2016 Presidential election?

  • The effectiveness of analytic models decays over time – they experience a “half-life” – because the world is constantly changing. The competitive advantage and differentiation created by analytics are fleeting and easily copied by others.  So organizations need to treat their analytic assets as living creatures, in constant need of nurturing, care and refinement.
  • Analytic models need to constantly explore new data sources, new proxies, new data enrichment techniques, and new analytic algorithms in looking for those variables and metrics that might be better predictors of populace sentiment.
  • Start by understanding your important use cases (e.g., Predicting Voter Turnout by County), and then transition the organization from the descriptive questions about what happened (which form the historical baseline), to the predictive questions about what is likely to happen, to the prescriptive questions about what the organization needs to do to be successful with the use case.

Otherwise you might end up living in Mudville with the Mighty Casey.

[1] For 2016 results: http://www.nytimes.com/elections/results/president. For 2012 results: https://en.wikipedia.org/wiki/United_States_presidential_election_in_Michigan,_2012

About Bill Schmarzo

CTO, Dell EMC Services (aka “Dean of Big Data”)

Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business” and “Big Data MBA: Driving Business Strategies with Data Science”, is responsible for setting strategy and defining the Big Data service offerings for Dell EMC’s Big Data Practice. As a CTO within Dell EMC’s 2,000+ person consulting organization, he works with organizations to identify where and how to start their big data journeys. He’s written white papers, is an avid blogger and is a frequent speaker on the use of Big Data and data science to power an organization’s key business initiatives. He is a University of San Francisco School of Management (SOM) Executive Fellow where he teaches the “Big Data MBA” course. Bill also just completed a research paper on “Determining The Economic Value of Data”. Onalytica recently ranked Bill as #4 Big Data Influencer worldwide.

Bill has over three decades of experience in data warehousing, BI and analytics. Bill authored the Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements. Bill serves on the City of San Jose’s Technology Innovation Board, and on the faculties of The Data Warehouse Institute and Strata.

Previously, Bill was vice president of Analytics at Yahoo where he was responsible for the development of Yahoo’s Advertiser and Website analytics products, including the delivery of “actionable insights” through a holistic user experience. Before that, Bill oversaw the Analytic Applications business unit at Business Objects, including the development, marketing and sales of their industry-defining analytic applications.

Bill holds a Master of Business Administration from the University of Iowa and a Bachelor of Science degree in Mathematics, Computer Science and Business Administration from Coe College.


One thought on “2016 Presidential Election: Did Big Data Just Get Lazy?”

  1. The FiveThirtyEight election forecast was frozen on the morning of November 8, and it gave Clinton a 71.4% chance of victory (https://projects.fivethirtyeight.com/2016-election-forecast/). Georgia, Iowa, and Ohio were predicted correctly in the final forecast.

    The photo above is from the “now-cast” from August 8 (https://www.reddit.com/r/EnoughTrumpSpam/comments/4wrxpj/fivethirtyeight_nowcast_sinks_trump_chance_of/). The now-cast was always designed to be much more volatile; neither the polls-only nor the polls-plus forecasts ever gave Clinton more than an 88% chance of victory.

    • Chris, thanks for clarifying and you are absolutely correct. The iPhone cache monster got me and I incorrectly associated that image with November 8 when it was in reality August 8. Thanks for taking the time to correct and give us the facts. I certainly do not want to be accused of posting fake news!