A slew of recent papers have tried to estimate excess deaths during the Covid-19 period so far. While their approaches differ, they share a common trait: a significant upward bias.
Authors and methods consistently err on the side of exaggeration. I demonstrate this by example, dissecting one of the better approaches, for India’s first Covid-19 wave.
It is premature to debate second-wave estimates while both the wave and the data are incomplete.
One paper uses a questionable method of copy-pasting IFR (infection fatality rate) from other countries to India, without basis. Another, even while using death statistics from CRS (civil registration system), produces an estimate that is a panoramic sweep from 0.1 to 2.3 million excess deaths during the first wave (equivalent to my claiming a Zomato buy-price of Rs 10 to Rs 230).
I ignore both these papers to focus on a third paper.
What does this paper claim about the first wave?
Using CRS registered data for six states for chosen months within 2020 (‘peak period of first wave’), the paper estimates 218,000 excess deaths versus a baseline (a 22 per cent increase) for these states.
The following table is as shown in this paper.
A few methodological aspects are noteworthy. Only six states are included despite data being available for at least double the number of states.
The periods are highly customised, with four different durations employed across six states (July-October, June-November, July-December, August-March).
The baseline is mostly the average of 2018/2019 without trend adjustment.
Note that the trend for these six states is a non-trivial 4.5 per cent increase per year in registered deaths across 2013-19.
Lastly, the above table shows a direct ratio of excess deaths to reported Covid-19 deaths, although studies in the West show that only 70-75 per cent of excess deaths are directly due to Covid-19.
The simplest way to spot pitfalls is to apply an identical approach to a previous non-Covid year.
I take the same approach and data, for the same states and time periods, and apply them to 2019 (see the following table). When 2019 is compared to 2018 in this manner, the exercise yields 73,000 excess deaths, or an 8 per cent increase over 2018.
Significant excess deaths well before Covid-19 demonstrate that a decent part of the above 218,000 figure is an artefact of method, not Covid-19.
In fact, comparing 2020 to 2019 (which makes more sense than the 2018-19 average) reduces excess deaths from 218,000 to 175,000.
If deaths grew by 73,000 in a non-Covid-19 year and by 175,000 in a Covid-19 year, only the difference of 102,000 can (potentially) be attributed to Covid-19.
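The placebo comparison above reduces to simple arithmetic. A minimal sketch in Python, using only the figures quoted in this article (the variable names are mine, chosen for illustration):

```python
# Figures quoted in the article for the six states and chosen months.
excess_2020_vs_avg_2018_19 = 218_000  # the paper's headline estimate
excess_2020_vs_2019 = 175_000         # same method, 2019 alone as baseline
excess_2019_vs_2018 = 73_000          # placebo: a non-Covid year

# Net out the growth that showed up even without Covid-19.
attributable_upper = excess_2020_vs_2019 - excess_2019_vs_2018
share_of_headline = attributable_upper / excess_2020_vs_avg_2018_19

print(attributable_upper)            # 102000
print(round(share_of_headline, 2))   # 0.47
```

On these numbers, less than half of the headline figure survives a single placebo check.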
To reveal how fickle this method is, let’s look at Kerala.
Here, applying the same method to 2018 over 2017 yields 8,600 excess deaths, way higher than the paper’s estimate of 4,500 for 2020.
Much as Kerala seems to have a prolonged Covid-19 wave, it surely didn’t start in 2018.
Kerala’s 4,500 ‘excess’ deaths estimate is noise, not signal. Incorporating this reduces true excess deaths during the ‘peak period of the first wave’ to less than 100,000, way below the stated 218,000.
More generally, with noisy data and imperfect methodology, it is better to ignore small deviations as inevitable fluctuations around trend, rather than assign any meaning to every blip.
The above analysis has reduced first-wave excess deaths in these six states from 22 per cent of 2019 deaths to around 10 per cent. The paper’s estimate is inflated by over 100 per cent.
What I have demonstrated above is an illustration of improper baseline and inadequate trend adjustment. Distortions can be severe for India, where the all-India trend growth rate is 3 per cent a year.
2018 and 2019 saw registered deaths grow 8-10 per cent a year, well before Covid-19. The above six states saw deaths grow 8 per cent during the chosen period of 2019.
Against this uptrend, calculating excess deaths as the simplistic difference between one period and another is erroneous.
To illustrate this point, imagine applying this paper’s method across all of India for prior years. Excess deaths for 2019 are calculated by deducting the average of 2017 and 2018 from 2019 deaths, and so on for earlier years.
The resulting ‘excess’ deaths are shown in the following table.
Nearly a million excess deaths per year can be conjured, without any Covid.
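Why an untrended baseline manufactures ‘excess’ deaths can be seen with a purely synthetic series. The sketch below assumes registered deaths grow at a constant annual rate and that the baseline is the mean of the two prior years; the function name and the stylised series are illustrative assumptions, not data from the paper.

```python
def spurious_excess_share(growth: float) -> float:
    """Apparent 'excess' as a share of baseline when deaths grow at a
    constant annual rate and the baseline is the mean of the two prior years."""
    year_minus_2 = 1.0                       # normalised, hypothetical level
    year_minus_1 = year_minus_2 * (1 + growth)
    current = year_minus_1 * (1 + growth)
    baseline = (year_minus_1 + year_minus_2) / 2
    return current / baseline - 1

for g in (0.03, 0.045, 0.08):
    print(f"trend {g:.1%} -> apparent excess {spurious_excess_share(g):.1%}")
```

Under these assumptions the apparent excess is roughly one and a half times the annual trend rate, because the two-year average sits about a year and a half behind the current year.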
Cherry-picking problem
The choice of customised time-periods for each state is also a problem. The choice is made by eyeballing monthly deaths and picking contiguous months with high deaths.
To select high-death months and conclude that deaths are high is circular reasoning.
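The circularity can be illustrated with a small simulation. The sketch below draws two synthetic years of monthly deaths from the same distribution, so there is no true excess, and then picks the four contiguous months with the largest gap. All numbers (mean 1,000, standard deviation 50, four-month window) are hypothetical, chosen only to show the selection effect.

```python
import random

def window_excess(current, baseline, width):
    """Return (largest, average) 'excess' over all contiguous windows."""
    diffs = [c - b for c, b in zip(current, baseline)]
    sums = [sum(diffs[i:i + width]) for i in range(len(diffs) - width + 1)]
    return max(sums), sum(sums) / len(sums)

random.seed(0)  # fixed seed so the run is repeatable
trials = 2_000
picked_total = 0.0
typical_total = 0.0
for _ in range(trials):
    # Two synthetic years from the SAME distribution: no true excess.
    baseline = [random.gauss(1000, 50) for _ in range(12)]
    current = [random.gauss(1000, 50) for _ in range(12)]
    picked, typical = window_excess(current, baseline, width=4)
    picked_total += picked
    typical_total += typical

print("mean excess, hand-picked window:", round(picked_total / trials))
print("mean excess, average window:    ", round(typical_total / trials))
```

The hand-picked window shows a positive ‘excess’ on average even though none exists by construction, while the average window hovers around zero.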
If the implicit premise is that official data is unreliable, we can’t be sure of the exact start and end dates for ‘peak Covid’, nor can we choose it subjectively.
If the Covid-19 first wave started in early-2020 and tapered off in early-2021, objectivity requires including data pertaining to the entire period, to see how deaths trended.
Researchers find this inconvenient as they have to include the lockdown period, which clearly witnessed Covid-19, but had depressed registered deaths.
Excluding the lockdown period equates to an extreme assumption that zero lockdown-period deaths were registered with a lag after the lockdown ended.
Both belated registration and Covid-19 can lead to excess deaths in the post-lockdown period. With a majority of Indians now having access to formal finance, it would be odd for families to forgo death registration, which is essential for transferring assets to the next of kin.
The opposite extreme, that 100 per cent of lockdown-period deaths were registered with a lag, may not hold either. But, without estimating both extremes, we have only an upper bound, not a range.
If I uniformly consider all of 2020 for these six states, registered deaths grew 7.4 per cent over 2019. Excess deaths correspond to 2.9 per cent of 2019 registered deaths (i.e. 7.4 per cent minus 4.5 per cent trend).
With over two million registered deaths in 2019, this implies that 2020 witnessed about 60,000 excess deaths across the six states. The lower bound is thus 60,000 over all of 2020; the upper bound is 100,000 over the chosen period (roughly half of 2020).
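The trend-adjusted calculation for the full year can be sketched as follows. The growth and trend rates are those quoted above; the 2019 death count is an assumption, set at the two-million floor because the article states only that it was "over two million".

```python
growth_2020 = 0.074       # registered deaths, all of 2020 over 2019 (article)
trend = 0.045             # annual trend for these six states, 2013-19 (article)
deaths_2019 = 2_000_000   # ASSUMPTION: floor of "over two million"

excess_share = growth_2020 - trend          # growth beyond the trend
excess_deaths = excess_share * deaths_2019

print(f"{excess_share:.1%}")    # 2.9%
print(round(excess_deaths))     # 58000
```

At the two-million floor the result is about 58,000; a 2019 count slightly above two million gives the 60,000 cited above.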
This range of estimates is 30-45 per cent of what is estimated in the paper. Even from this lower range, we cannot directly extrapolate to pan-India figures.
A larger sample of 14 states witnessed a lower deviation from trend than these six states, implying that the rest of India did better on a per capita basis.
Using only one side of the distribution
Cherry-picking can take other forms too. The paper includes the following para (pertaining to a slightly different analysis of CRS data):
Any parsing of data will produce a range of outcomes. Some favourable to the hypothesis, some unfavourable. Some on one side of the distribution, some on the other. Retaining convenient data-points while cancelling inconvenient ones isn’t sound statistics.
Cherry-picking one side of the distribution is a sure path to biased estimations. Presupposing some outcomes as implausible (or selectively blaming data quality) while wholeheartedly accepting others is inconsistent.
If I made the case to invest in a company only based on good years, I’d face a serious career risk.
How about other approaches employed in this paper?
I focused on the most reliable CRS-based approach of the three employed in this paper.
The other two data sources — National Health Mission and consumer survey — are both admittedly less reliable.
Even within the CRS method, I ignore a similar estimation using city-level data since cities witnessed way higher increases in deaths than even their home states (e.g. Mumbai, Chennai, Kolkata, Bangalore in 2020), making them unsuitable for extrapolation.
I also ignore ratios calculated using UN-estimated deaths for 2019, following the golden rule of basing estimates only on data, not other people’s estimates.
What does this mean for pan-India excess deaths in 2020?
Across its three methods, the report estimates roughly 0.6 million excess deaths during India’s first Covid-19 wave. In the light of the above analysis, the true figure could be less than half of that.
Further, only a subset of excess deaths is attributable to Covid-19. Given 0.16 million reported Covid-19 deaths, the undercount factor for the first wave is not far from the ‘innate’ undercount factor seen across countries (>1.5x even in Western countries with better systems).
I have not addressed India’s second Covid-19 wave as it is premature for precise estimates. That does not imply complete ignorance. Available evidence provides a clear directional sense.
The spike seen in registered deaths across states during April-May-June 2021 is severe and unprecedented. This is well above what can be accounted for by data unreliability or trend adjustment.
While touted estimates have the same methodological problems and exaggeration biases, excess deaths and Covid-19 undercount seem higher in the second wave.
Deriving an accurate number for deaths due to Covid-19 is crucial, both to understand what happened and to prepare for what is to come. However, failing to do this reliably and objectively is counterproductive. A consistent upward bias harms understanding and credibility.