A few days ago I presented a simple model for the outbreak of the Coronavirus, and the main conclusion was that the numbers reported by the Chinese authorities seem sensible and reasonable. That does not make them correct of course, but it should make us wary of outlandish conspiracy theories. Here is an update that fixes a flaw of the models presented, incorporates the latest figures and questions an old result circulating on the web.
Even the most elaborate of my simple models in the previous blogpost still had a basic flaw: patients that died or were cured were still counted as infections. That is of course not realistic. However if mortality is low and the disease spreads fast, then this flaw is not very important in terms of forecasting future levels of infection. Currently just over 40,000 people are infected and just under 1000 people have died. Whether we explicitly take that loss of people into account in calculating how many people will be infected tomorrow might make a difference of 2% or 3%. But I wouldn’t believe a simple model to be that accurate anyway.
But it is a flaw we can correct. If we have n[t] infected people at time t and patients who die take about t’ days to do so, then in every period t the number of infected people will diminish by a number proportional to n[t-t’], as these are the ones that will now (on average) die. The same would be true for the patients who heal, if we assume that after healing they are no longer infectious. So let us amend the model of the last post into the following form
Here n'[t] is the rate at which the number of infected persons changes. The number g represents the basic probability for an infected person to infect someone else. The number nmax is the total number of people that could be infected at some point, while a is the number of people per day that get ‘added’ to that group due to the leakiness of quarantine measures. The new term contains the number ‘mu’, which represents the fraction of infected patients that stop being infectious, either because they die or because they are cured. The time ‘tau’ is the lag caused by the fact that infected people take time to heal or die. So in order to calculate what happens in this model we need estimates of these numbers, and we need the number of infected people at some initial time t=0.
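To make the roles of these numbers concrete, here is a minimal sketch of how a model of this kind can be stepped forward day by day. It is only a sketch under stated assumptions: it assumes a logistic growth term g·n·(1 − n/(nmax + a·t)) and a delayed removal term mu·n[t − tau], with mu treated as a daily rate. That functional form, the simple Euler step, the function name simulate and the starting value of 100 infected at t=0 are illustrative choices, not necessarily the exact ones behind the graphs in this post.

```python
def simulate(g, nmax, a, mu, tau, n0, days, dt=0.1):
    """Euler-step n'(t) = g*n*(1 - n/(nmax + a*t)) - mu*n(t - tau).

    The functional form is an assumption of this sketch.
    Returns the list of n values at t = 0, dt, 2*dt, ..., days.
    """
    steps = int(round(days / dt))
    lag = int(round(tau / dt))  # the delay tau expressed in time steps
    n = [float(n0)]
    for k in range(steps):
        pool = nmax + a * (k * dt)  # at-risk pool, growing through the leak
        delayed = n[k - lag] if k >= lag else 0.0  # nobody removed before t = tau
        growth = g * n[k] * (1.0 - n[k] / pool)
        n.append(max(n[k] + dt * (growth - mu * delayed), 0.0))
    return n


# Hypothetical starting value n0 = 100; the other numbers are taken from the text.
without_removal = simulate(g=0.37, nmax=25000, a=900, mu=0.0, tau=6, n0=100, days=30)
with_removal = simulate(g=0.37, nmax=25000, a=900, mu=0.03, tau=6, n0=100, days=30)
print(round(without_removal[-1]), round(with_removal[-1]))
```

Setting mu to zero recovers a model that ignores deaths and cures, so comparing the two runs shows how much the removal term matters for a given choice of mu and tau.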
In my calculations in the previous post I chose January the 18th of 2020 as t=0, and I will keep doing so here. In our previous post we found a value for g of 0.37, which translates into a roughly 45% probability of an infectious person infecting someone else, each day. Many current estimates hold that each patient was infecting about 2.5 other people (on average), which would suggest that patients typically are infectious for about 6 days. More recent research has revealed that the median incubation period is about 3 days but with a long tail. Cases with incubation times of 14 or even 24 days have been reported. But 6 puts it in the right ball-park.
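The step from g to a daily probability, and from there to the number of infectious days, can be checked in a few lines. The sketch below assumes that the conversion is exp(g) − 1, which is one reading that reproduces the "roughly 45%" quoted above, and that the number of infectious days is simply the 2.5 secondary infections divided by that daily probability. The function names are made up for this illustration.

```python
import math


def daily_infection_probability(g):
    """Daily growth fraction implied by a continuous rate g (assumed: exp(g) - 1)."""
    return math.exp(g) - 1.0


def infectious_days(secondary_infections, daily_probability):
    """Days needed to infect that many people at the given daily probability."""
    return secondary_infections / daily_probability


p = daily_infection_probability(0.37)
print(round(p, 3))                        # close to the 45% quoted in the text
print(round(infectious_days(2.5, p), 1))  # between 5 and 6 days
```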
What does quarantine do?
So with g known, what can we say about nmax? Quarantine measures started to become significant roughly from January 23rd, and we started seeing their effect in the data around a week later. The official Chinese figures indicate that around that date they had around 8000 people in quarantine or self-quarantine. If we assume that quarantine is 100% effective except for people living in the same household as those infected, and we use that the average Chinese household consists of roughly 3.5 people, then this gives us 28,000 potentially infected people.
For my models I chose the number 25,000 to be on the cautious side. But I added a “leak” of about 900 people a day. What this means in terms of human behaviour is the following: we are assuming that out of these 25,000 quarantined people in households with one quarantined person, every day 1 in 88 people breaks that quarantine and brings another household of 3.5 people into the “at risk” population. This daily compliance rate of just under 99% of the population may seem very high. But this is an exceptional circumstance in a rather authoritarian society. It doesn’t seem unreasonable to me. How do people ‘break’ that quarantine? Well, they pop over to their neighbours in the compound, for example, or visit their brother/sister or aunt/uncle who lives in the same street, or look in on how their parents are doing in the same town. Nothing insidious and nothing exceptional.
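The household arithmetic behind these numbers is easy to reproduce. The sketch below is a back-of-the-envelope check only: it assumes every quarantine breaker exposes one full extra household of 3.5 people, and the function names are invented for this illustration. Under that naive assumption the product comes out at roughly 1000 people a day, the same order of magnitude as the leak of about 900 used in the model.

```python
HOUSEHOLD_SIZE = 3.5  # average household size used in the text


def at_risk_pool(quarantined, household_size=HOUSEHOLD_SIZE):
    """People potentially exposed if quarantine only leaks within households."""
    return quarantined * household_size


def daily_compliance(one_in):
    """Fraction of people who keep quarantine on a given day."""
    return 1.0 - 1.0 / one_in


def daily_leak(pool, one_in, household_size=HOUSEHOLD_SIZE):
    """New at-risk people per day if 1 in `one_in` exposes a whole household."""
    return pool / one_in * household_size


print(at_risk_pool(8000))                # 8000 quarantined people times 3.5
print(round(daily_compliance(88), 4))    # just under 99%
print(round(daily_leak(25000, 88)))      # roughly 1000 a day under this assumption
```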
Improved model fitting
If we try to reproduce the data with a model of the type described above then we get the following graphs.
The blue dots are the official Chinese figures for the number of confirmed cases. The blue line is the line which ignores the fact that people who die or who are healed no longer infect others. I did have to update my estimates of the numbers defining the model. The probability of infection needed to be lowered from 45% to just 41% but the leak was upgraded from 900 to 1050 a day since 18/01/2020. With a mortality of 2% this blue curve also fits the data on the numbers of deaths accurately.
The orange-brown curve incorporates a mortality + cure ratio of 3%. As a result the probability of infection would need to be a little higher (42%) and the leak also a little larger (1300 a day) in order to explain the official data. Finally, the green line results from a 6% mortality + cure ratio, a probability of infection of 43% and a leak that amounts to 1400 people a day.
Rumours and old research
Now what you see here is interesting: if you assume a higher mortality you need to also assume a higher chance of transmission of the infection and a larger leak. A modelling paper produced by researchers at Imperial College around January 20th is circulating online and in that paper these researchers propose a mortality of around 10% possibly even as high as 20%. If I put that estimate into my model I have to assume a chance of transmission that jumps to 52% and a leak that exceeds 2000 people a day in order to reproduce the official figures for the number of confirmed infections. However the number of deaths this then predicts is about 10 times the official figure. What we see here is the perfect recipe for click-baity news and rumours.
Why does this paper circulate, and why is it so frequently cited? Well, because it suggests that the Chinese authorities are hiding 90% of the casualties, and that the disease is far more infectious and far more lethal than people assume. Cover-ups, pandemics and death … the ultimate clickbait in the age of #2019NCoV.
Based on my own estimates, however, I think reality is far more prosaic. A chance of transmission that aligns well with other published data, a quarantine leak that seems very conservative indeed and a mortality that does not require the wholesale disappearance of thousands of bodies together form a perfectly reasonable description of the official Chinese figures. The leak is slightly larger than I thought in my previous post, the infectiousness is slightly lower and the mortality pans out to what the published aggregate numbers are suggesting.
All in all I still see little reason to actually doubt the Chinese official figures. They seem to be in the right ballpark. I see plenty of reasons to doubt the circulating rumours, including those which are sourced from ‘old’ research that seems well beyond its use-by-date. In an outbreak a few days can make a big difference. The data used in the Imperial College paper circulating online is pretty much all from the very first few weeks of the outbreak.
My updated model still predicts that in about a month from now we would be looking at between 75,000 and 80,000 confirmed cases for the Hubei outbreak.
Post-script 10/02/2020 22h22
The Imperial College report mentioned above has by now been updated and is available here. Obviously the researchers of Prof. Neil Ferguson’s group are the experts here. Nevertheless I do not find their results convincing, despite the fact that they are deploying vastly more elaborate methods than I do. I have written a second update of my blogpost including the confirmed-case numbers up until February 11th, confirming the estimates above.
My calculations are at best an “order of magnitude check” rather than a precise description. But let me try to put in words why I find their results so far unconvincing.
 If their mortality estimate of 18% is correct then (I think) we should have seen a gradual shift over time, since 18/01/2020, in the aggregate mortality figures that can easily be extracted from the data provided by China. Yet this percentage stays remarkably close to 2.1%.
 They find wildly different mortality rates for the three main groups of infected patients: (a) those in Hubei, (b) those in China outside Hubei and (c) those outside China. Now that may be statistically significant but, in my view, it makes very little biological sense.
 The data sources they list seem to suggest that their data-set concerning Hubei is extremely small and entirely from before January 21st. In my analysis I entirely discard the period before January 19th, simply because the data seems incredibly patchy and unreliable. Given that early casualties would most likely be persons with weak immune systems or general health problems, it would seem that such a data selection over-estimates the actual fatality.