Seriously Flocked Up [C+M]
After a long hiatus we are returning to the blog! This time we will look at the credibility of the polling data in the run-up to the last UK general election (2015).
Although in the run-up to the general election it was widely reported that the outcome was "the most unpredictable in a generation", this doesn't mean that the pollsters were uncertain about the proportions of people voting one way or another; rather, it reflected the fact that widely differing outcomes could result when those votes were translated into constituency seats. In fact, as we will argue in this blog post, the pollsters were largely in agreement with one another and were actually remarkably sure of the result. Of course, they were horrendously wrong, and in this article we will put forward some arguments as to how they could be so sure of the outcome, yet so wrong.
Pollsters aren't governmental organisations, or charities, or altruistic societies with a keen interest in political outcomes. Instead, they are companies that may be commissioned by partisan newspapers trying to write a story, or by third parties with a vested interest in the outcome (for instance, political parties); occasionally polls are used by pollsters themselves as a marketing tool to raise their profile and garner future business. Indeed, they are self-admittedly selective about which polls they do or do not publish, and will give advance access to the results to those commissioning them. As a result, what we as the general public observe is a delayed, warped and potentially biased reality. Pollsters are not even neutral in their impact: their portrayal of reality changes public perception and can alter the focus of debate leading up to polling day.
So, is there any quantitative evidence of manipulation (in addition to the qualitative evidence above)? To study this we will look at the raw data publicly available prior to the election (available as an Excel document linked from this blog, and excluding large unreliable online SurveyMonkey polls and the retrospective Survation poll) and will focus on the Conservative / Labour polling split (the two main political parties in the UK).
In the wake of the disastrous prediction, the British Polling Council (BPC - an association of market research companies, NOT an autonomous organisation) commissioned a report by Patrick Sturgis (Southampton) into what went wrong. In an interesting Guardian article some of the primary explanations put forward by the polling agencies themselves are explored (including variable turnout among different voting groups, and busy voters who were more likely to vote one way), but Sturgis remarks that "it only accounted for a small amount of the total polling error". Instead, the "surprising feature of the 2015 election" was "the lack of variability across the polls" - but what does that mean? What causes a lack of variability? How can we assess it?
A convincing explanation is 'herding': a process by which pollsters ensure their polls are in line with one another. This may be because they are unsure of their own methodology, or (as mentioned at the beginning) because it boosts their credibility as a pollster. Indeed, Survation admitted to participating in herding and having no faith in their convictions: Damian Lyons Lowe (Survation CEO) reported that "the results seemed so 'out of line' with all the polling conducted by ourselves and our peers – what poll commentators would term an 'outlier' – that I 'chickened out' of publishing the figures".
The following graph simply plots the reported Labour / Conservative split of the polls in the week leading up to the general election (following the final Question Time leaders' debate), the solid black line being the average of the polls. What is immediately evident is that many of the polls agree precisely with one another, and that all of the polls lie within 2 percentage points of the average.
Now we cannot discern whether any one poll is a statistical outlier, but we can attempt to investigate Patrick Sturgis' comment that there was a "lack of variability across the polls" by looking at them as a whole. Using the methodology we developed in our earlier blog post (assuming that the underlying proportion of voters voting Labour or Conservative remains constant, and that the pollsters are able to unbiasedly sample from the population), we can compute intervals in which we would expect 50% of the polls (the red dotted band) and 95% of the polls (the dotted blue line) to lie (see figure). Relaxing either (or both) of our assumptions makes the intervals wider!
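The coverage bands can be sketched in a few lines of Python. This is an illustrative sketch only: it uses the normal approximation to the binomial, and the sample size (1,000 respondents) and 50/50 two-party split are assumed round numbers, not the actual figures from the polls in our spreadsheet.

```python
from statistics import NormalDist

def poll_band(p, n, coverage):
    """Band in which we'd expect a fraction `coverage` of polls to fall,
    assuming each poll is an unbiased simple random sample of size n from
    a fixed population with true support p (normal approximation)."""
    sigma = (p * (1 - p) / n) ** 0.5          # standard error of a single poll
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return (p - z * sigma, p + z * sigma)

# Illustrative numbers only: a typical poll of ~1,000 respondents and an
# assumed 50/50 Conservative/Labour split of the two-party vote.
p, n = 0.5, 1000
lo50, hi50 = poll_band(p, n, 0.50)   # the "red dotted" band
lo95, hi95 = poll_band(p, n, 0.95)   # the "blue dotted" band
print(f"50% band: {lo50:.3f} to {hi50:.3f}")   # roughly +/- 1.1 points
print(f"95% band: {lo95:.3f} to {hi95:.3f}")   # roughly +/- 3.1 points
```

Note that under these assumptions the 95% band alone is about 3 percentage points either side of the average, so polls clustering within 2 points of it is already suspiciously tight.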
What is clear is that far fewer than 50% of the polls (i.e. 12) lie outside the red dotted band - in fact only 4 do. The chance of so small a number (even under our conservative assumptions) is around 1 in 800 - somewhere between the chance of being born with 11 fingers or toes (1 in 500) and that of asteroid 1950 DA hitting Earth on 16th March 2880 (1 in 4000)! It is safe to say that Sturgis may be on to something!
If the pollsters were not herding, we would more reasonably expect variability looking like that in the following figure.
Interestingly, if we look at the polls in the week before (i.e. 7-14 days before the election), the polls are a lot more variable and more in line with what we would expect.
Do you think this is a coincidence? We don't. Rather than guessing what we want to hear and parroting it back to us, pollsters should give us the whole truth.
Here's the raw data: