Since January 18th I have been modelling the Wuhan/China #covid19 outbreak. Until 18 February I wrote a couple of posts formulating a very basic model and checking how well it did. You can find those here, here, here, here and here; overall the fit was good. Now, a month later, let's see how the model does as the Wuhan outbreak dies down.

Evolution of the confirmed infections

I want to keep this final update short. I have written enough about it and perhaps too many people are writing about it as well. But I do think it is good to close this series with a final update of the model and a final look at how that model does and what this means.


In the model I developed in this series we were estimating the number of infections n[t] at time t working with the following assumptions:

  • there is a basic constant growth-rate of infections, g;
  • there is a maximum number of people who can catch the infection due to the quarantine measures, nmax;
  • there is a 'leak' in the quarantine that adds a people per day to the quarantined population, until a 'switch-off' time T;
  • there is a rate mu at which infected patients leave the population, either by death, recovery or geographic separation, and this removal happens with a lag of tau days.

When we put these effects together in a model we get a so-called first-order differential equation that looks like this


and which we can use to solve for the number of infected people at any time after some initial time t=0 at which we know the number of infected people. In my modelling here the time t=0 is January the 18th 2020.
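Assembling the four assumptions above, one consistent reading of the equation is the following; this is my reconstruction from the text and the author's replies in the comments, not necessarily the exact form shown in the (missing) figure:

```latex
\frac{dn}{dt} = g\, n(t)\left(1 - \frac{n(t)}{n_{\max} + a\,\min(t,\,T)}\right) - \mu\, n(t - \tau)
```

Here the carrying capacity nmax + a·min(t, T) grows by a per day until the switch-off time T (smoothed with a tanh step in practice, see the comments below), and the last term removes infected people at rate mu with a lag of tau days.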

What’s new?

In the last post I noticed that our model was starting to overestimate the number of infections. The most likely cause, I argued, was that the 'leak' in the quarantine would actually stop at some point. Let us ask ourselves why, behaviourally, this could happen.

First of all, let us recall why the leak happens in the first place. When you ask people to engage in social-distancing, this effectively means that fewer people will be 'in reach' of infected people to cause spreading. If this social-distancing has isolation-like properties, then it adds the aspect that you effectively limit the number of people who could get infected to a 'fixed' number. If people isolate in their households, then the maximum number of infections will be all those who share a household with at least one infected person. If there is no 'travel' between households, such as social visits or meetings in the street or the shopping mall, then this number is nmax. The constant a determines how many people have to be added to nmax per day because people ignore the isolation and social-distancing measures.

The current debate in the UK is about this number. The UK government believes that the social-distancing measures and isolation measures cannot be sustained because people will become ‘behaviourally fatigued’ and will start violating these measures thus adding to the spreading. You will see that according to this model and according to the Chinese data this is simply wrong!

The model above makes a different behavioural assumption: suppose people breach the quarantine from the start, because they either don't take it seriously or don't believe it will work. If the quarantine shows some measure of success, however, people might re-evaluate their assessment of the situation, and the buy-in of the population might actually increase because they see the positive effect. In that case the leak would slow down or perhaps even stop. Here I will make the assumption that the leaking stops completely, i.e. people go from partial compliance to total compliance because they can see that it works.

Comparison with the data

So what do the data say? Taking the official figures from 18/01/2020 till 14/03/2020, we make the following assumptions about our parameters:


All three scenarios assume that 100% compliance emerged after 30 days of social-distancing and isolation measures, that the rate of behavioural switching is rapid (this is the parameter b), meaning that people do not slowly change their minds but stick to their beliefs until they make a quick switch, and that the original quarantine population was 32,000 people. The scenarios differ in:

  • the lag tau with which people 'leave' the population of infectious people, being 1 or 3 days;
  • the rate of infection g;
  • the daily increase a of the quarantine population due to the leak, ranging from 1,400 to 1,800 people a day;
  • the rate mu at which people are removed from the infectious population by death, recovery or hospital isolation: 0%, 3% or 6%.

What the leak values mean is that during the initial 30 days of the social-distancing and isolation measures there is roughly a 6% leak in compliance. The actual non-compliance is much higher, but not every act of non-compliance effectively adds people to the quarantine population. I have discussed in other posts how the removal rate would connect to the mortality-rate.

If we solve our equation for these three scenarios and compare it with the actual Chinese data we get the following graph.


In the first 14 days you see the typical exponential growth phase. Even though there is a quarantine in place, the number of people who could potentially be infected is still much larger than the actual number, and hence the disease seems to rampage uninterrupted. From day 14 to 21 the exponential slope flattens into an almost linear one. This is the first sign that the virus struggles to find new victims as it reaches the population-limit it can attack under the quarantine conditions. Then, roughly until day 30, there is almost linear growth; here the quarantine leak dominates the spread. The exponential rampage of the virus is halted, but the behaviour of people keeps adding infections to the population. However, people notice that the speed of infection slows, and as a result their confidence in the measures increases and their compliance increases; in this model as a 'sudden realisation' at day 30. That greatly reduces the growth of the infection and, in the model world, almost halts it 5 days later.

We see that the three scenarios follow the data quite well; in fact, although the three scenarios are different, they are all equally good (or bad), and probably all better than you can expect from such a simple model. The real-world data show some slow residual growth after 40 days, and there can be many explanations for this: tiny leaks in the quarantine, 'hidden' cases that only come to light now the disease is taking on a controlled character, and many more possible effects.
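The kind of scenario solve described here can be sketched as a forward-Euler integration of the delay logistic equation. The parameter names, the exact equation form, and the particular values below are my reconstruction from the post, not the author's actual code:

```python
import numpy as np

def smooth_step(x, b=10.0):
    """Smoothed step built from tanh (as described in the comments):
    ~1 for x >> 0, ~0 for x << 0, switching over a range set by b."""
    return 0.5 * (1.0 + np.tanh(b * x))

def solve_model(n0, g, n_max, a, T, mu, tau, days, dt=0.05):
    """Forward-Euler integration of
        dn/dt = g*n(t)*(1 - n(t)/cap(t)) - mu*n(t - tau),
    where cap(t) = n_max + a*min(t, T), smoothed with a tanh step."""
    steps = round(days / dt)
    lag = round(tau / dt)
    n = np.empty(steps + 1)
    n[0] = n0
    for i in range(steps):
        t = i * dt
        # effective quarantine population: grows by a per day until the
        # behavioural 'switch-off' time T, then stays constant at n_max + a*T
        cap = n_max + a * (t * smooth_step(T - t) + T * smooth_step(t - T))
        # before tau days have elapsed, approximate the history by n(0)
        n_lagged = n[max(i - lag, 0)]
        dn = g * n[i] * (1.0 - n[i] / cap) - mu * n_lagged
        n[i + 1] = max(n[i] + dt * dn, 0.0)
    return n

# roughly one of the three scenarios (values approximate; n0 is a placeholder
# for the confirmed count on 18 January, not the actual figure)
curve = solve_model(n0=100, g=0.33, n_max=32_000, a=1600, T=30,
                    mu=0.03, tau=3, days=56)
```

The qualitative phases described below (exponential growth, leak-dominated linear growth, then the halt after the compliance switch) all come out of this single equation as the cap term takes over from the growth term.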

So what’s the take-away of this?

Well, here you see a mathematical model that assumes increasing compliance with social-distancing and isolation measures as a response to observable success of these measures. The model fits the data. Does the model prove that this is what happens? No of course not! The narrative I provide with the model is an interpretation of the model and of what it says. Determining whether this interpretation is the right one would require a much more careful analysis that would also involve lots of observational data from observers on the ground such as the WHO mission to China. However what they say about their site-visits is largely consistent with my narrative here.

Where I am more confident is in the following statement: if there were a significant amount of measure-fatigue within the 53 days covered here, and if people's behaviour went from compliance to significant non-compliance, then the above curve would look entirely different. In past posts I have studied what happens when the leaks persist, and those scenarios would see double the number of infected people after 53 days than we have here. Increasing leaks due to increased non-compliance would fit the data even less. In fact, the assumption of initial full compliance would also fail to fit the data. So although this is a ‘crappy’ and simplistic model, I don’t see how we can fit growing non-compliance due to ‘measure-fatigue’, or initial full compliance, to the data. I am sure it is possible to construct a more complicated and sophisticated model that can do so … but really, isn’t that just making a model complicated so that it fits the narrative you want to find?


Now let us look at the discussions of mortality-rate. I will proceed as I did in the last two posts: compare the cumulative number of cases reported to the cumulative number of deaths reported. By the time the epidemic has reached its end-point, the ratio of these two numbers is the actual mortality-rate; but while the epidemic is still raging, the lag between infection and death of patients skews the perceived mortality-rate.

So what I do in the graph below is plot three sets of data-points of confirmed cases (cumulative) vs confirmed deaths (cumulative) for three different choices of lag: 0 days (blue), 7 days (orange) and 14 days (green). Here is what we get:


The three straight lines are three ‘best-fit’ linear functions from which we can estimate the actual mortality-rate. We find values between 2.9% and 3.5%, which is entirely consistent with WHO estimates. As I argued in past posts, some research groups have been claiming outlandish mortality-rates of around 15%, which seems utter nonsense to me no matter how sophisticated their methods are. Other people are claiming death-rates as low as 0.1% to 0.5%, which, it seems to me, are based on assumptions that the number of infected patients is underestimated by a factor of 10 or 30. I don’t believe those either, for the following reason: if in my simplistic model I had to fit in 10 or 30 times the number of possible infections, then this would dramatically increase my nmax, and that would blow the graphs completely off the chart and nowhere near the data. I have said it before in posts and I will say it again here: I think the data from China are pretty reasonable!
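The lagged ‘best-fit’ estimate described here can be sketched as follows; the function name and the toy data are mine, purely for illustration (the real analysis uses the official cumulative series):

```python
import numpy as np

def lagged_mortality_rate(cases, deaths, lag_days):
    """Fit deaths(t) ~ m * cases(t - lag) through the origin; the slope m
    estimates the mortality-rate under the assumed infection-to-death lag.
    Both inputs are cumulative daily series."""
    cases, deaths = np.asarray(cases, float), np.asarray(deaths, float)
    if lag_days > 0:
        x, y = cases[:-lag_days], deaths[lag_days:]
    else:
        x, y = cases, deaths
    # least-squares slope for a line through the origin: m = sum(xy)/sum(x^2)
    return float(np.dot(x, y) / np.dot(x, x))

# synthetic sanity check: deaths are exactly 3% of cases 7 days earlier
t = np.arange(60)
cases = 80_000 / (1 + np.exp(-(t - 25) / 5))   # logistic toy curve
deaths = np.zeros_like(cases)
deaths[7:] = 0.03 * cases[:-7]
print(round(lagged_mortality_rate(cases, deaths, lag_days=7), 3))  # → 0.03
```

With the right lag the points fall on a straight line through the origin, which is exactly what the three fitted lines in the graph exploit.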

Again, this conclusion is also consistent with the findings of the WHO fact-finding mission to China in February. They saw no evidence of wild under-testing of patients, nor evidence of grand cover-ups. With a lack of evidence for conspiratorial ideas, and a simplistic model suggesting that the data provided by China fit a reasonable narrative … well, I know what I choose to believe for now.

Case-load development

Now I can use my model solutions to plot a graph of how the case-load would have developed over the past few weeks. It is a graph of the sort that was also shown by Bruce Aylward in his reporting on the fact-finding mission to China. My three scenarios produce the following curves:


What I like about this is not only that it fits the picture the WHO presented rather well, but that it neatly reproduces the asymmetry in the curve found by the WHO observers in the Chinese data. The peak reaches about 5,000 new cases a day. But what is very nice to see is how the behavioural change towards full compliance in the model pans out in this graph: it is actually the main driver of the asymmetry. Had the leak persisted, or grown, then the asymmetry would have looked entirely different and been much more pronounced. The behavioural aspects that create and change the nature of the leak dominate the righthand-side of the curve, and the width of the asymmetry (in this model) would be a measure of the lag between the epidemic maxing out and the people observing that success and changing their behaviour.
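A case-load (new cases per day) curve like the one above is just the day-to-day difference of the cumulative model solution; a minimal sketch, with an illustrative toy series:

```python
import numpy as np

def daily_new_cases(cumulative):
    """Daily new cases from a cumulative series: the first difference,
    clipped at zero so downward reporting corrections don't show up
    as negative case counts."""
    return np.maximum(np.diff(np.asarray(cumulative)), 0)

print(daily_new_cases([0, 10, 35, 80, 95, 98]).tolist())  # → [10, 25, 45, 15, 3]
```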

My take-away from this is: you cannot ‘flatten’ the curve symmetrically (as is shown in so many graphics and news items). The lefthand-side, pre-maximum, is driven by pure epidemiology, not behaviour. The righthand-side is driven by behaviour rather than epidemiology. The behavioural assumptions of the UK government, and their symmetric (and sustained) flattening of the curve, make absolutely no sense to me. I do see a behavioural change in my simplistic model, but it is one towards compliance, because people can see the rates of infection go down and respond. If there is a channel for ‘measure fatigue’ then it won’t open in the three weeks following the turn-around in the way the infection grows, but much, much later.

Why no more posts

I am done now with posts on modelling #covid19. The main reason: I know something about modelling, but I know too little about epidemiology to construct more realistic models, and that is also not what I am most interested in. I am no expert in epidemiological modelling either … so these observations and modelling ‘games’ here are my way of checking the reasonability of claims made by policy-makers and more competent researchers in this area. I find the claims made by the WHO utterly reasonable, and my simplistic exercise here gives me further confidence in the WHO, as well as in the role of the Chinese government once they faced up to the Wuhan outbreak.

My simple-minded approach has always been at variance, in terms of outcomes, with the results published by the groups who have the UK government’s ear. Perhaps I should ‘submit’ to their expertise rather than try to find reason in the data using a slightly more elaborate ‘back-of-the-envelope’ exercise like this. But the policies their advice leads to affect me as well … very directly. And they do not convince me at all. In fact, I consider them downright unreasonable and dangerous. But hey … I am no expert, just a guy with a laptop and modelling experience. I’d rather trust the WHO and the experts who conducted the fact-finding mission to China … and apparently the governments of other EU countries do so too.

4 responses to “#CoViD19: A final #coronavirus #modelling update”

  1. Christian

    this a very interesting post considering parameters that actually have a meaning. I wonder where do you use “b” in your model equation and if you solve the equation with numerical integration.



  2. Rogue 47

    Hi Christian,

    Thank you for your comment. In the model, the function “a Theta[T-t]” is a function that is basically linear in time, “a t”, until the time T, after which it remains constant at the value “a T”. This represents the behavioural switch: once people stop breaching the quarantine conditions, no further people need to be added to the “quarantine population”.

    I called the function “theta” because it is quite similar to a step-function, which is often denoted by the letter theta. However it is not a true step-function, but a step-like function made using a Tanh-function. The function Tanh[b x] continually changes from -1 to +1, but the change happens in a ‘small’ range around x=0, whereas it is almost constant far to the left or right of x=0. The parameter b determines how ‘small’ this range is. If b is very small but larger than 0, then the range of x across which the change happens is large. If b is large and positive, then the change happens very fast around x=0. In my simulations I get sufficiently good fits with b=10, which is large enough for the change to happen “within a day”.
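    The smoothed step described here can be sketched in a few lines; the rescaling of tanh's (-1, +1) range to (0, 1) is my reading of the description:

```python
import math

def theta(x, b=10.0):
    """Smooth step built from tanh: rescales tanh's (-1, +1) range to (0, 1).
    b controls how sharply the switch happens around x = 0."""
    return 0.5 * (1.0 + math.tanh(b * x))

# with b = 10 the switch from ~0 to ~1 completes within about a unit around x = 0
print(round(theta(-1.0), 4), round(theta(0.0), 4), round(theta(1.0), 4))  # → 0.0 0.5 1.0
```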

    What I do is indeed solve the resulting differential equation numerically, setting as a starting value the number of confirmed cases at the starting date.

    The models used by more sophisticated epidemiological modellers are ‘agent-based’, i.e. those try to model human-behaviour at the level of individual humans, modelling their social interactions etc. That in principle allows for much more detail to be calculated but typically also requires much more data and assumptions as input. This model tries to capture the basic dynamics driving the epidemic: growth-rate, total population that might get affected (“quarantine population”), compliance-behaviour with social-distancing/quarantine measures. But as a result it has several flaws.

    For one, it doesn’t allow detailed calculations of spread; that would be relatively easy to fix in terms of equations, but would come at a computing-power cost. Slightly more subtle but no less important: this simple model is a little “vague” about the distinction between “cumulative confirmed cases”, “active cases” and “contagious spreaders”. For a simplistic model that tries to get “the curve right” this is not much of an issue. However, for a planner who wants to know where to intervene medically this is a big problem.

    That is why in this series I merely presented the model as a ‘sanity check’ on the published numbers but not as an actual prediction tool.


    1. Christian

      Hi, thanks so much for your response! I am however still curious about the theta function and the b parameter that you mention. From what I understood, theta(T - t) would basically be proportional to tanh( b(T - t) ) plus a constant? In that sense, the function a * theta(T - t) would grow over time starting from a value of 0 until the t = T cutoff, and then remain constant. Am I correct in my interpretation of your answer?



      1. Rogue 47

        Yes you are!

