In a previous post, 4 days ago, I presented some common-sense modelling of the #Coronavirus #outbreak, which was simply an attempt to check whether the official numbers make sense. I updated that modelling two days ago, using two further days of data to make some minor adjustments. Now, again two days later, here is a second update. My conclusion: the updated model of two days ago is doing quite well. The official figures still make much more sense to me than the vast array of conspiracy theories and “bad science” out there.

Updated data

The previous update was based on the officially published data on the #COVID19 outbreak between January 18th and February 9th. Today I am analyzing how the model we arrived at performs when we include the data of February 10 and 11. In the previous update, the inclusion of the February 8 & 9 data was taken as an opportunity to add an explicit ‘lagged-mortality’ term to this simplistic model. The final model was


where n[t] is the number of confirmed infections on day t since 18/01/2020, g is the basic rate at which the infection grows, nmax is the maximum number of people the infection might reach if quarantine measures were 100% effective, and ‘a’ is the number of people that ‘leak’ into the quarantine-group every day because the quarantine is not 100% effective. Finally, ‘mu’ is the fraction of patients leaving the group of infected patients, either through death or through recovery, and ‘tau’ is the lag in days between being counted among the infected and the date of death or recovery.


Based on the earlier data we had established the following values for these model parameters in three cases: 1) ignoring mortality (mu = 0), 2) taking an optimistic mortality of 1% and twice that for the rate of recovery (mu = 3%), and 3) taking the widely published mortality of 2% and again taking double that for the rate of recovery (mu = 6%). This gives the following parameter values:

  • If mu = 0: nmax = 25,000 people, g = 0.34, which translates into a 40.5% probability each day of an infectious patient infecting a new patient, and the leak of the quarantine amounts to 1050 people a day;
  • If mu = 0.03: nmax = 25,000 people, g = 0.36, which translates into a 43% probability each day of an infectious patient infecting a new patient, the leak of the quarantine amounts to 1300 people a day, and I used a lag of 1 day;
  • If mu = 0.06: nmax = 25,000 people, g = 0.37, which translates into a 45% probability each day of an infectious patient infecting a new patient, the leak of the quarantine amounts to 1400 people a day, and I used a lag of 3 days.
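
A quick check of the numbers in this list. The daily percentages are consistent with reading g as a continuous growth rate, so that the daily probability is exp(g) - 1, and with mu being the mortality plus a recovery rate of twice the mortality. The sketch below assumes exactly that: the exp(g) - 1 conversion is inferred from the quoted figures rather than taken from the model formula, and the function names are made up for illustration.

```python
import math


def daily_infection_probability(g):
    """ASSUMPTION: g is a continuous growth rate, so the daily figure is exp(g) - 1."""
    return math.exp(g) - 1.0


def removal_fraction(mortality, recovery_multiple=2.0):
    """mu = mortality + recovery, with recovery taken as a multiple of mortality."""
    return mortality * (1.0 + recovery_multiple)


# Growth rates of the three scenarios listed above.
for g in (0.34, 0.36, 0.37):
    print(f"g = {g:.2f} -> {100 * daily_infection_probability(g):.1f}% per day")

# Mortality assumptions of scenarios 2 and 3.
for mortality in (0.01, 0.02):
    print(f"mortality {mortality:.0%} -> mu = {removal_fraction(mortality):.0%}")
```

Under this reading the three growth rates come out at roughly 40.5%, 43.3% and 44.8% a day, which match the rounded figures in the list.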

These three scenarios lead to the following graphs, in which the dots represent the actual reported data until 11/02/2020.


All three of these model options appear to fit the reported data reasonably well. In other words: if mortality & recovery contribute to slowing down the spread of the infection, then we need to assume slightly higher infectiousness and slightly larger quarantine leaks in order to explain the reported data. But there is nothing extremely out-of-the-ordinary going on here, and the reported data seem sensible.


Now some might argue that a lag of 3 days, or even 1 day, between being reported as infected and death or recovery is way too short. So let’s expand this to 6 days and see what happens. And let’s also include a mortality of 18%, which is circulated by some sources. This generates the graphs below.


Two of the lines still fit the data quite nicely. These correspond to a mortality-rate of 2% and a recovery-rate of 4%; to fit the published data we would only need to make small (less than 1%) changes to the quarantine leak and the infectiousness of the disease.

The badly fitting green line is the result of an attempt to make the data comply with a 6% mortality-rate and a 12% recovery-rate, yielding a total of 18% of cases being removed from the pool of infectious people. It is very difficult to make the fit any better than the misfit you see here. It requires a higher infectiousness of the disease as well as a larger leak, simply because too many people die for the disease to remain very effective at spreading. If the mortality-rate alone were 18%, there is no way this would fit the published data. A mortality rate of 18% is only credible if you assume (1) huge underreporting of deaths and (2) much, very much higher rates of transmission. The extremely high death- and transmission-rates required are not reported anywhere. As a result I remain extremely sceptical about the 18% figure and for the moment can only consider it either rumour or “bad science”.


Adding the latest two days’ worth of data to the model & parameter-values found earlier this week required no change to the parameter-values. As a result the “many-grains-of-salt” one-month forecast of this simplistic model remains unchanged: in about a month’s time the aggregate number of confirmed infections will be between 75,000 and 80,000 which, at a 2.1% mortality-rate, would imply around 1600 casualties.
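
The casualty figure is plain arithmetic on the forecast range, which the following minimal sketch reproduces. The function name is made up for illustration; the only inputs are the 75,000 to 80,000 range and the 2.1% mortality-rate quoted above.

```python
def implied_deaths(confirmed_cases, mortality_rate):
    """Deaths implied by a cumulative case count and a flat mortality rate."""
    return confirmed_cases * mortality_rate


# Lower and upper end of the one-month forecast range.
low = implied_deaths(75_000, 0.021)
high = implied_deaths(80_000, 0.021)
print(f"implied casualties: {low:.0f} to {high:.0f}")
```

The two ends of the range give 1575 and 1680, which is where the “around 1600” comes from.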

I don’t expect such a prediction from this model to be very credible, as the model is still rather simplistic and leaves out many details of the spread of such a disease. However, I feel quite confident that high mortality-rate estimates are simply not borne out by the progression of the outbreak, as there is no sign of high mortality hampering the spread.

The sources claiming this high mortality (see the post-script on my previous update) also find wildly different mortality-rates between (a) Hubei province, (b) China outside of Hubei and (c) the world outside of China. I struggle to see how this makes sense physiologically or contextually. It seems to me that a much more likely explanation of these findings is the use of data-samples that are too small and/or affected by a selection-bias.


For the moment my conclusion remains the following: this simplistic model cannot tell whether the data officially provided by China are correct, but what the modelling does suggest is that the publicized numbers fit a very reasonable narrative about the events unfolding. I would take a reasonable narrative any time over spectacular claims in rumours that need backing up by even more spectacular assumptions of mischief and deceit.
