2016 has been a year of political upheaval. Tides of populism and nationalism across the Western world have swept Britain out of the European Union; swept America into the arms of an outsider businessman aiming to ‘drain the swamp’ of Washington corruption; and, most recently, ousted the Italian Prime Minister from his post, threatening economic and political stability throughout the EU.
Traditional political pollsters have been little help so far in predicting and explaining this phenomenon. The polls almost uniformly called Britain’s June referendum for Remain. Across the Pond, some polling aggregators had Hillary Clinton with a 99% chance of winning; even the most conservative aggregator, Nate Silver’s famous 538 crystal ball, had Clinton to win (probability of 70%).
But Qriously has been making strides in better poll-based forecasts. We started by successfully predicting Brexit (in fact, we were the only independent firm to publicly call the outcome for Brexit on the day of the vote), and backed this up by predicting Trump victories in a number of key rust-belt states (including Michigan) and warning of his narrow but realistic path to victory. 2016 ended on a high for us, as we made our most accurate prediction yet: calling the Italian referendum for “No”, at 60-40, just days before the vote. (The final result was 59-41.) While we were prevented from publishing this prediction due to the Italian 2-week poll blackout, we provided this data, along with our simulations of the outcome, to a number of private clients (including hedge funds).
This blog post outlines how we did it, where we can improve, and where you can expect to see our predictions next.
Qriously conducted three surveys to measure Italian voting intentions: one from Nov 22 – Nov 24 (Wave 1); one from Nov 26 – Nov 28 (Wave 2); and one from Nov 30 – Dec 2 (Wave 3).
The actual referendum was held on Sunday December 4.
All polls involved interviewing over 3,500 Italians aged 18 or over, including over 2,000 likely voters.
This figure shows our prediction changing over time.
During our initial wave of data collection (Nov 22 – Nov 24), we saw NO leading, although not conclusively; but during our second wave (Nov 26 – Nov 28), we clearly saw NO pull away to a strong lead, which was then consolidated during the final wave (Nov 30 – Dec 2). The surge for NO coincided with real-life events: an estimated 50,000 people marched in Rome to protest against Prime Minister Renzi’s proposed constitutional reforms.
While polls published prior to the blackout also showed NO ahead, none had NO leading by the margin that we saw (even in our first poll), and certainly few would have predicted the landslide that NO eventually achieved.
Our Predictive Model
This figure shows the data we actually collected in our survey. We asked likely voters how they planned to vote, but as the figure shows, a very large proportion of voters were still undecided with just days to go until the referendum; in fact, a third of voters would not commit to either side (compared to just 9-12% of voters who were undecided about their Brexit vote, and 16-20% of swing state voters in the US who hadn’t yet picked a side).
This presented a major challenge for us, and anyone else seeking to predict the outcome.
To solve this issue, we developed a model which aimed to assign each respondent – even undecided voters – a score assessing how likely they were to vote ‘Yes’ or ‘No’. The model was data-based: we observed trends in the data associated with respondents who had given us a clear indication of whether they intended to vote YES or NO, and then examined whether our undecided voters showed any of the same patterns. For example, YES voters tended to support the incumbent Democratic Party and feel positively about Prime Minister Renzi; NO voters tended to support opposition parties, be more Eurosceptic, and express negative opinions about Prime Minister Renzi and his reforms.
Most undecided voters fell somewhere on the spectrum between a prototypical NO and a prototypical YES voter. We used our model to give each respondent a probability of voting NO or YES; and after reallocating each respondent into the appropriate category, we were able to generate the predictions shown above (including our final, successful 60-40 prediction). The model was highly accurate on our existing sample; when we back-tested it on respondents who had told us directly whether they planned to vote YES or NO, we achieved an 84% accuracy rate (i.e. we inferred the ‘correct’ vote state for 84% of respondents).
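For readers who want to see the mechanics, here is a minimal sketch of this kind of approach. It is not our production model: the choice of logistic regression, the three yes/no features, and every respondent in it are assumptions invented purely for illustration. The sketch fits a model on respondents who stated a preference, back-tests it on those same respondents, then scores the undecided ones and reallocates each to the more likely side.

```python
import math

def predict_no(weights, features):
    """Probability of a NO vote under a logistic model (weights[0] is the bias)."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, steps=2000, learning_rate=0.5):
    """Fit logistic-regression weights by plain batch gradient descent."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(steps):
        gradient = [0.0] * len(weights)
        for features, label in zip(rows, labels):
            error = predict_no(weights, features) - label
            gradient[0] += error
            for j, x in enumerate(features):
                gradient[j + 1] += error * x
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, gradient)]
    return weights

# Hypothetical decided respondents (invented for illustration).
# Features: (supports governing party, positive about Renzi, Eurosceptic).
# Label: 1 = stated NO, 0 = stated YES.
decided_rows = [(1, 1, 0), (1, 1, 0), (1, 0, 0), (0, 1, 0),
                (0, 0, 1), (0, 0, 1), (0, 0, 0), (1, 0, 1)]
decided_labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Hypothetical undecided respondents, described with the same features.
undecided_rows = [(0, 0, 1), (1, 1, 0), (0, 1, 1), (0, 0, 0), (1, 0, 0)]

weights = fit_logistic(decided_rows, decided_labels)

# Back-test: how often does the model recover the stated vote of decided respondents?
hits = sum((predict_no(weights, row) >= 0.5) == bool(label)
           for row, label in zip(decided_rows, decided_labels))
backtest_accuracy = hits / len(decided_rows)

# Score each undecided respondent, then reallocate to the more likely side.
undecided_scores = [predict_no(weights, row) for row in undecided_rows]
reallocated_no = sum(score >= 0.5 for score in undecided_scores)

# Combine stated and reallocated votes into a single NO share.
total_no = sum(decided_labels) + reallocated_no
no_share = total_no / (len(decided_rows) + len(undecided_rows))

print(f"Back-test accuracy on decided respondents: {backtest_accuracy:.0%}")
print(f"NO share after reallocating undecided voters: {no_share:.0%}")
```

Note that this toy back-test scores the model on the same respondents it was fitted to, so it will flatter the model; holding out a portion of decided respondents would give a fairer estimate.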
This was the first time Qriously has attempted to apply a data-driven model to predict the behavior of undecided voters, but given its success, we intend to repeat the procedure for all political polling in the future.
Areas of Improvement: Turnout
The figure above shows our final-wave results for the question we used to estimate turnout – ‘How likely are you to vote?’.
Typically, when political scientists attempt to calculate turnout from polling figures like this, they rely only on ‘very likely’ voters (assuming that anyone who says they are ‘somewhat likely’ to vote will not bother to turn out on the day). Even then, scientists often apply a reduction factor to this figure, assuming that a lot of ‘very likely’ voters will run into problems on polling day (e.g. long lines, family emergencies, getting stuck at work, and so on) which will prevent them from actually voting.
We did the same thing, and assumed the maximum turnout possible was 59% (the ‘very likely’ figure given above). This figure was our final turnout estimate. Of course, the actual turnout ended up being significantly higher (65%), eclipsing the number we gave.
The lesson for us here is two-fold: firstly, that turnout is very difficult to predict, and an engaged population will turn out to vote on issues that matter to them (even if previous referenda had much lower turnout); secondly, that we need to develop different turnout models in individual countries. While some countries, like the U.S., often present barriers to voting – such as a limited number of polling stations, resulting in long queues to vote – others make it easy.
This means that, in a country like Italy, almost all of the ‘very likely’ voters will actually vote – and a good proportion of the ‘somewhat likely’ voters will as well. In our future polling in European countries, we’ll take this lesson to heart and focus on including all likely voters in our turnout calculation.
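The difference between the two turnout approaches can be sketched in a few lines. This is an illustration under stated assumptions, not our actual turnout model: the function name, the ‘somewhat likely’ share and the follow-through rates are hypothetical placeholders, and only the 59% ‘very likely’ figure and the 65% actual turnout come from the results discussed above.

```python
def estimate_turnout(very_likely, somewhat_likely=0.0,
                     very_likely_rate=1.0, somewhat_likely_rate=0.0):
    """Expected turnout as a weighted sum of self-reported likelihood groups.

    All shares and follow-through rates are fractions between 0 and 1.
    """
    for value in (very_likely, somewhat_likely,
                  very_likely_rate, somewhat_likely_rate):
        if not 0.0 <= value <= 1.0:
            raise ValueError("shares and rates must be between 0 and 1")
    return very_likely * very_likely_rate + somewhat_likely * somewhat_likely_rate

VERY_LIKELY = 0.59      # 'very likely' share from our final wave
SOMEWHAT_LIKELY = 0.20  # hypothetical placeholder, not a survey figure
ACTUAL_TURNOUT = 0.65   # actual referendum turnout

# Conservative approach: count only 'very likely' voters.
conservative = estimate_turnout(VERY_LIKELY)

# Inclusive approach: also count a share of 'somewhat likely' voters.
# The 0.5 follow-through rate is an assumption chosen for illustration.
inclusive = estimate_turnout(VERY_LIKELY, SOMEWHAT_LIKELY,
                             somewhat_likely_rate=0.5)

print(f"Conservative estimate: {conservative:.0%}")
print(f"Inclusive estimate:    {inclusive:.0%}")
print(f"Actual turnout:        {ACTUAL_TURNOUT:.0%}")
```

In practice the follow-through rates would need to be calibrated per country against past elections, which is exactly the kind of country-specific modelling described above.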
Next Steps For Qriously
Given our success in recent political events, we fully intend to be front and centre of the political polling landscape in 2017. You can expect to see us covering the upcoming French, German and Dutch elections – all cases where the tide of populism may continue to rise – and we’ll be involved in several smaller political events as well (the Presidential Election in Singapore, the Constitutional Referendum in Turkey, and the General Election in New Zealand, for example). Stay tuned for more!