The #Coronavirus outbreak of 2019/2020 is currently dominating the news cycle. Let's have a look at how the official data from the Chinese Ministry of Health compare to three relatively simple models of such a single-centre outbreak, and ask ourselves what this tells us.

**Simple exponentials**

The simplest view of the outbreak of an infectious virus is to assume that during each time period *t* the infected population *n[t]* generates new infections amounting to a fixed fraction *b* of itself. Such a scenario can be described by a simple, so-called *differential equation*,
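In symbols, assuming the standard exponential-growth form (with *n₀* the number of infected at *t = 0*), the equation reads:

$$\frac{dn}{dt} = b\,n(t), \qquad n(0) = n_0 ,$$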

which is actually very easy to solve to find the growing population of infected patients at all future times, once you know their number at some starting time *t = 0*. First-year students in many undergraduate science and engineering degrees will know how to do this, so it's not 'magic'.
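As a minimal sketch (the starting value `n0` below is illustrative, not the official count on 18/01/2020), the well-known solution *n(t) = n₀·exp(b·t)* can be evaluated directly:

```python
import math

def n_exp(t, n0, b):
    """Closed-form solution of dn/dt = b*n: exponential growth from n0 at t = 0."""
    return n0 * math.exp(b * t)

# Illustrative starting value, not the official count on 18/01/2020
n0 = 100.0
for b in (0.46, 0.30):
    print(f"b = {b}:", [round(n_exp(t, n0, b)) for t in (0, 5, 10, 15)])
```

Even this toy calculation shows how sensitive the outcome is to the rate *b*: a few days of growth at 46% versus 30% already produces very different case counts.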

If you do this for the data of the #ncov2019 Coronavirus outbreak from Wuhan in early 2020, using only the data from 18/01/2020 onwards, you get the following graphs

The blue curve shows the actual data released by the Chinese Ministry of Health (MoH). The green curve is a solution of the above equation for the case where each infected patient has a 46% chance of infecting a new patient. The orange line is the solution where that chance of causing a new infection is 30%. Now you immediately recognise two interesting facts here:

- Although the 46%-chance line seems to fit the early data well, the actual rate of new infections has been *significantly lower* since roughly 29/01/2020;
- Although the 30%-chance line gets today's (07/02/2020) number of infections right, it is clearly much too steep now to be a good description.

Two conclusions seem very reasonable to draw from this. First of all, the rate of infections is slowing relative to the simplistic model equation above. Secondly, the simplistic model fails to capture something that is actually important in explaining these graphs.

**Deaths and confirmed cases**

Before we try to improve the simplistic model, let us take a closer look at some of the data that are now available. First, let us compare the number of confirmed cases of #ncov2019 with the number of confirmed deaths. If we multiply the confirmed deaths by 48 and plot them in the same graph as the number of confirmed cases, we find the following figure.

The blue line is again the number of confirmed cases and the orange line is now 48 times the number of confirmed deaths. They follow each other very closely, don't they? This suggests that the mortality rate is around 1/48, which is around 2.1%. You might recognise this as the figure widely quoted in the media. But what the graph tells you is that this figure has been remarkably constant over the past 20 days. This makes that number more credible every day it stays like that.

**Improving the model: maximum population**

Now one very basic thing that the model above does not take into account is the fact that the total population that *potentially could be infected* is not infinitely large. There is some maximum *nmax* which will not be exceeded, *ever*. Pessimists would probably say that this maximum is simply the total population of the Earth. But that is overly pessimistic.

This *nmax* is actually a very important variable of the model, for the following reason: our efforts to contain such outbreaks usually consist of two elements:

- Quarantine efforts;
- Vaccination efforts;

Both of these serve to keep the maximum population that could be infected as small as possible, either by preventing spread to the wider population and separating the infected population from the uninfected population, or by vaccinating the uninfected so as to make them 'resistant' against infection with the virus. Vaccinations are currently unavailable for #2019ncov, so all the efforts have gone into quarantine measures. We can include this attempt at keeping the maximum possible number of infections low in our model. The new equation then looks like this
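Assuming the standard logistic form, the equation with a maximum population *nmax* reads:

$$\frac{dn}{dt} = b\,n(t)\left(1 - \frac{n(t)}{n_{\max}}\right).$$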

This is again such a differential equation, albeit one that is a little bit more difficult to solve. In addition to having an estimate of the chance of infection-transmission, we now also need an idea of the maximum size of the population that *could* be infected at some point in the far future.

If we look at the comparison with data I made earlier, the simplistic model seemed a good description *until around a week after the introduction of large-scale quarantine* measures. So let us assume that the deviation of the data from the modelled trend with the simplistic model is entirely due to those quarantine measures. Then we would choose the 46% infection-probability and try to find some values of the maximum population that give a reasonable fit.

The blue again represents the data from the Chinese MoH, whilst the green line is the model with a maximum infected population of 40,000 cases and the orange line is for a maximum of 37,000 infections. These two graphs suggest that the slowing down of the spread could very well have been caused by the effective isolation of the infected population to a group of between 37,000 and 40,000 individuals.
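These curves can be sketched with the closed-form logistic solution; the starting value `n0` below is illustrative, and the two caps are the values just mentioned:

```python
import math

def n_logistic(t, n0, b, n_max):
    """Closed-form solution of the logistic equation dn/dt = b*n*(1 - n/n_max)."""
    return n_max / (1.0 + (n_max / n0 - 1.0) * math.exp(-b * t))

# Illustrative starting value of 100 cases; compare the two caps considered above
for n_max in (37000.0, 40000.0):
    print(n_max, [round(n_logistic(t, 100.0, 0.46, n_max)) for t in (0, 10, 20)])
```

Notice that the early part of both curves is indistinguishable from the pure exponential; the cap only bends the curve over once the infected population becomes a sizeable fraction of *nmax*.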

However, we again see that there is trouble with the fit between our improved model and the actual official data. Our model still *overestimates* the number of confirmed cases between 12 and 18 days since 18/01/2020, yet the 37,000-40,000 quarantine range starts to *underestimate* the number of confirmed cases in recent days. Can we do better than this?

**Leaky quarantine**

As a final version of the model let us assume that the quarantine is ‘leaky’, i.e. that the maximum number of possible infections slowly grows at a rate of *L* people per day. Of course this need not be the actual cause of why our earlier model failed, but it is perhaps the least spectacular and most likely cause. No quarantine is ever complete, no matter how hard a state tries.

So the new model equation now becomes
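Assuming the cap simply grows linearly at the leak rate *L*, the equation reads:

$$\frac{dn}{dt} = b\,n(t)\left(1 - \frac{n(t)}{n_{\max}(t)}\right), \qquad n_{\max}(t) = n_{\max,0} + L\,t ,$$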

and if I solve this equation assuming that the original quarantine cap was 25,000 people, the probability of infection is 46% and the quarantine leak is 900 people a day, then this results in the following curve
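As a numerical sketch of this calculation (a simple forward-Euler integration; the starting value `n0` is illustrative, the other parameters are the ones just quoted):

```python
def leaky_quarantine(days, n0=100.0, b=0.46, cap0=25000.0, leak=900.0, dt=0.01):
    """Integrate dn/dt = b*n*(1 - n/(cap0 + leak*t)) with forward Euler.

    Returns the infected count at the end of each day.
    """
    n, t = n0, 0.0
    daily = []
    steps = int(round(1.0 / dt))
    for _ in range(days):
        for _ in range(steps):
            cap = cap0 + leak * t  # the quarantine cap grows by 'leak' people/day
            n += dt * b * n * (1.0 - n / cap)
            t += dt
        daily.append(n)
    return daily

curve = leaky_quarantine(50)
print(round(curve[-1]))
```

After the initial exponential phase the curve hugs the moving cap of 25,000 + 900·*t* from below, which is what makes the leak, rather than the virus itself, the dominant driver of growth at late times.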

The dots are the actual published figures and the line is the calculation based on the model with leaky quarantine. The fit looks decent, although it is still clear that this model, too, does not capture everything correctly. But to me it seems reasonable that, instead of pursuing all kinds of outrageous conspiracy theories about what is happening, the data look to be in agreement with a situation that combines (1) a strong and largely successful quarantine effort since mid-January with (2) a quarantine that has a small but noticeable 'leak'. Nothing insidious happening.

The fact that a *reasonable* model fits neatly with these official estimates gives me a lot more confidence in those estimates. If there were some kind of ideological bureaucrat meddling with the case-numbers then I suspect the meddling would hardly produce data that seem to have a reasonable interpretation in terms of precisely the characteristics you expect to be there anyway. I have no reason to believe they are using this model to calculate the data they publish … but of course they could.

**Looking ahead**

Now of course we don't really know how reliable the published data are. At the moment I am fairly relaxed with the assumption that they are a reasonable reflection of what actually happens. A 2.1% mortality rate sounds like roughly what could be expected, a 46% chance of transmission sounds reasonable, a quarantine effort targeting about 25,000 actual prospective patients sounds like something a city authority could manage, and finally a leak in that effort that adds about 900 people a day to the potentially infected sounds equally not unreasonable.

Following the future trend of the reported data should accomplish two things. The *least* important thing is that it would allow me to check whether this model stays 'on track'. A far more important thing is that modelling the data reported by authorities allows a "common sense" check on whether they are reporting the 'real thing'. So far, from my considerations, I have the impression the authorities are reporting 'the real thing' to the extent that *they* can actually see it. They cannot report deaths they do not see, or infections they do not register. That is true for Chinese authorities, but equally for authorities anywhere else in the world.

Finally, with many grains of salt, let us take a look at what the last model says for, say, the next month when we will have reached 50 days since January 18th 2020. Where do we end up by then if this model is ‘correct’? Computing the graph gives

What we see here is the following: after about 25 days the infectious spreading of ncov2019 is no longer dominated by the inherent infectiousness of the virus, but by the leak in the quarantine. The 'leaky quarantine' assumed here will yield 70,000 confirmed cases by March 9th 2020. If the mortality estimate is correct, this would mean 1,470 deaths by March 9th 2020. This model suggests that, as long as there is no vaccine available, our only tool is to sustain the quarantine effort and to try to stop any leaks as much as reasonably possible.

As horrid as these numbers are … they do show that the current efforts are working and are ‘paying off’. Anyone comparing this with seasonal flu in an attempt to suggest the efforts are overblown and we could do without them is callously playing with fire. You don’t even want to see the numbers that we get if we return to the simplistic model with its unbridled expansion of the disease. They are horrifying beyond anything a seasonal flu ever does. Stop comparing them!

A second update including data published until 11/02/2020 is available here.

**A third update after the case-definition revision can be found here.**
