In India, exit polls are very, very tricky. So today, when you see the results for Bihar, empathize with the challenges of our polling professionals.
In India, exit polls have been extremely accurate at identifying the leading party or winner, but only about 40% of the time have they forecast the number of seats accurately. Below, I explore possible reasons for seat-forecasting errors in the Indian context:
1. Choosing the right sample

The fundamental basis of every exit/post poll is choosing the right sample. In exit/post-poll surveys, the polling station is the critical sampling unit, as that is the lowest level at which historical data is provided by the Election Commission. Pollsters choose polling stations that are most likely to be representative of the constituency and, in turn, of the trends in the state. Selecting a polling station is extraordinarily tedious in India.
Take the example of Narkatiaganj AC in Bihar. Polling Station no. 84 in 2010 became Polling Station no. 82 in 2015, and Polling Station no. 197 became Polling Station no. 222. The first task for the pollster, therefore, is to align these polling stations by their addresses.
The second issue is how a pollster handles new polling stations. Between 2010 and 2015, 26 new polling stations were added in Narkatiaganj. Should the pollster use polling data from 2010, which is an assembly election but carries the numbering issues above, or from the 2014 Lok Sabha election, where the data is recent but the polling patterns could differ from an assembly election?
The tediousness of the task and the trade-off between availability and relevance sometimes cause pollsters to choose unrepresentative polling stations for the election. These errors are not huge, but they get magnified when the poll is too close to call.
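The address-matching step can be sketched as follows. This is a minimal illustration with invented station numbers and addresses, using fuzzy string matching as a crude stand-in for the careful manual alignment pollsters actually do:

```python
# Align renumbered polling stations across two elections by address
# similarity. All station numbers and addresses below are invented.
from difflib import SequenceMatcher

stations_2010 = {84: "Middle School Bhangaha, Room 1", 197: "Primary School Semari"}
stations_2015 = {82: "Middle School Bhangaha Room-1", 222: "Pry. School Semari",
                 310: "New Panchayat Bhavan Parsauni"}  # 310: new, no 2010 match

def best_match(addr, candidates):
    """Return the 2015 station number whose address best matches addr."""
    return max(candidates,
               key=lambda k: SequenceMatcher(None, addr.lower(),
                                             candidates[k].lower()).ratio())

# Map each 2010 station to its most similar 2015 counterpart
mapping = {old: best_match(addr, stations_2015)
           for old, addr in stations_2010.items()}
```

In practice a similarity threshold would also be needed, so that a 2010 station with no surviving counterpart is flagged for manual review rather than force-matched.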
2. Correcting for non-response

The second big challenge is correcting for non-response. Surveys are done in two ways in India: an exit poll (meet voters outside the polling station) or a post-poll survey (visit voters at home after polling is complete). Both methods suffer from non-response error, which essentially means that the people who refused to participate in the survey are different from the people who did.
The two simplest ways to account for this are to note some demographics of the voters who refuse to participate, and to correct the sample using publicly available data. Both are fraught with challenges on account of ever-changing voting patterns and the quality of publicly available data.
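The demographic correction amounts to post-stratification weighting: each respondent is weighted by (population share of their cell) divided by (sample share of their cell), so under-represented groups count for more. A minimal sketch with entirely made-up population shares and responses:

```python
# Post-stratification weighting sketch. The cells, shares and votes are
# invented; the point is the mechanics, not the numbers.
from collections import Counter

# Known population shares per (gender, age-band) cell, e.g. from rolls/census
population_share = {
    ("male", "18-35"): 0.28, ("male", "36+"): 0.27,
    ("female", "18-35"): 0.22, ("female", "36+"): 0.23,
}

# Survey respondents as (cell, stated vote); women are under-represented here
responses = (
    [(("male", "18-35"), "A")] * 30 + [(("male", "18-35"), "B")] * 10 +
    [(("male", "36+"), "A")] * 20 + [(("male", "36+"), "B")] * 20 +
    [(("female", "18-35"), "B")] * 10 +
    [(("female", "36+"), "A")] * 5 + [(("female", "36+"), "B")] * 5
)

n = len(responses)
sample_count = Counter(cell for cell, _ in responses)

# Weight = population share of the cell / sample share of the cell
weight = {cell: population_share[cell] / (sample_count[cell] / n)
          for cell in population_share}

raw = Counter(vote for _, vote in responses)
weighted = Counter()
for cell, vote in responses:
    weighted[vote] += weight[cell]

raw_share_A = raw["A"] / n                              # unweighted share
weighted_share_A = weighted["A"] / sum(weighted.values())  # corrected share
```

In this toy example the raw data shows party A ahead (55%), but after reweighting for the refusers' demographics A drops to 46%, which is exactly how non-response can flip a too-close-to-call forecast.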
3. Accounting for new voters

This is one of the trickiest challenges for pollsters. Say a sampled polling station throws up 25% more new voters, perhaps due to some intervention at the local level. Even if this is particular to that polling station, there is no way a pollster can selectively drop the data. Including it is also a problem, particularly when the election is extremely competitive in terms of vote share.
The second problem is how a pollster calculates swing amongst these voters, given the absence of previous voting information. Pollsters have found a variety of ways to correct for this, including using past voting correlations between new voters and existing voters, applying sophisticated probabilistic methods, and many other innovative workarounds. However, some (usually small) errors remain, and these can affect the final forecasts.
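To see why the missing baseline matters, here is a minimal sketch with invented booth numbers, contrasting the naive swing over the whole sample with the swing among voters who actually have a past vote to compare against:

```python
# Swing calculation at one hypothetical booth. All numbers are made up.
past_share_A = 0.48               # party A's booth share last election
existing = {"n": 300, "A": 150}   # existing voters sampled today, 150 for A
new = {"n": 100, "A": 65}         # first-time voters sampled today, 65 for A

# Combined current share of A in today's sample
current_A = (existing["A"] + new["A"]) / (existing["n"] + new["n"])

# Naive swing treats the whole sample against the old baseline...
naive_swing = current_A - past_share_A

# ...but the swing among voters with an actual baseline is smaller.
# The gap between the two is the new-voter effect the pollster must
# model separately (e.g. via correlations with existing voters).
existing_swing = existing["A"] / existing["n"] - past_share_A
```

Here the naive swing is +5.75 points while the baseline-anchored swing is only +2 points; in a competitive election, a gap that size is easily the difference between the right and wrong seat count.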
4. Weighting the data

Once survey data is collected, the pollster aligns it using a variety of publicly available information (age, gender, religion, etc.). However, in India, one of the biggest determinants of voting is caste, and caste information is not available at the polling-station level.
Here again, pollsters have found workarounds, including sourcing caste data from other surveys or past surveys, or even sidestepping the problem by using past voting patterns as a correction tool. This ensures a fairly high degree of correction, but not a complete one.
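One standard way to fold in caste data sourced from a separate survey is raking (iterative proportional fitting): the sample is reweighted until it simultaneously matches the known caste marginals and, say, the gender marginals from the rolls. A minimal sketch with invented figures; the technique, not the numbers, is the point:

```python
# Raking (iterative proportional fitting) sketch. The sample, caste
# targets and gender targets below are all hypothetical.
from collections import Counter

# Respondents tagged (caste, gender); 100 people in total
sample = ([("upper", "m")] * 30 + [("upper", "f")] * 20 +
          [("obc", "m")] * 25 + [("obc", "f")] * 10 +
          [("other", "m")] * 10 + [("other", "f")] * 5)

caste_target = {"upper": 0.30, "obc": 0.45, "other": 0.25}   # from a caste survey
gender_target = {"m": 0.52, "f": 0.48}                       # from the rolls

n = len(sample)
weights = [1.0] * n

# Alternately rescale weights to match each marginal until both agree
for _ in range(50):
    for attr, target in ((0, caste_target), (1, gender_target)):
        totals = Counter()
        for person, wt in zip(sample, weights):
            totals[person[attr]] += wt
        weights = [wt * target[person[attr]] * n / totals[person[attr]]
                   for person, wt in zip(sample, weights)]

# Weighted caste distribution now matches the target marginals
caste_weighted = Counter()
for person, wt in zip(sample, weights):
    caste_weighted[person[0]] += wt
```

The catch the text points to is upstream of the algorithm: if the caste marginals themselves come from an imperfect survey, raking faithfully reproduces that imperfection in the weighted estimates.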
5. Correcting for dishonest responses and dishonest interviewers
The last but crucial challenge is correcting for dishonest responses or even dishonest interviewers. This is particularly challenging if the election is held on a single day (Delhi), or if the last phase is the most crucial and extremely competitive. There are a variety of ways to identify and purge such data, but replacing purged data accurately is a huge challenge due to paucity of time or, in the case of earlier phases, voter memory and response issues.
6. The in-house forecasting framework

Last but not least, every pollster uses an in-house empirical framework to finally calculate the vote shares and seat shares. Many of these frameworks have not been tested over the long run (given the emergence of many new polling agencies without long track records), or are not in a position to fully accommodate recent changes in alliances (Bihar 2015, for example), and this often leads the framework to estimate vote shares and seat shares inaccurately.
While some of these frameworks are getting better by the day, they will need the support of reasonable stability in the environment (number of voters, polling-station issues, etc.).
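As a flavour of what such a framework does at its very simplest, here is a uniform-swing sketch: apply the statewide vote-share movement to every seat's past result and count winners. The seats and shares are invented, and real frameworks layer many corrections on top of this:

```python
# Uniform-swing seat projection over five hypothetical seats.
past = {  # party A's vote share per seat, last election (invented)
    "seat1": 0.55, "seat2": 0.51, "seat3": 0.49, "seat4": 0.45, "seat5": 0.40,
}
past_state_A = sum(past.values()) / len(past)   # 48% statewide last time
polled_state_A = 0.51                           # today's surveyed share
swing = polled_state_A - past_state_A           # +3 points

# Apply the same swing everywhere and count seats where A crosses 50%.
# Note seat4 lands at a projected 48%: a 2-point vote-share miss flips
# it, which is how small vote errors become large seat errors.
projected_A_seats = sum(1 for share in past.values() if share + swing > 0.5)
```

This two-party, stable-alliance assumption is also why a realignment like Bihar 2015 is so hard on these frameworks: the "past share" baseline for a brand-new alliance simply does not exist.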
The following table illustrates the challenges Indian pollsters face compared with the very successful British pollsters, and the significant differences between the two environments.
| | Bihar 2015 | United Kingdom 2015 |
| --- | --- | --- |
| (Likely) voting population | 37 million | 31 million |
| Likely sample sizes (average) | 30,000 | 22,000+ (estimated) |
| Past exit-poll data available by polling booth | Unknown | Since 2001 and 2005 |
| Increase in voters since 2005 (millions) | 13 m (estimated) | 3.6 m |
| (Likely) vote share of leading two parties/alliances, 2005 vs 2015 | 67% (2005) vs 85% (2015, likely) | 67.6% (2005) vs 65.1% (2015) |
| Accuracy of seat and vote forecasts | ? | Very accurate |
Given the above circumstances, it is quite creditable that our pollsters get the direction right (90%+ of the time) and even the seat forecast right (40% of the time).
Today, when you watch the results, please don't fret about sample sizes; empathize with the challenges of our polling professionals.