Exploring The Relationship Between Strictness of COVID-19 Policy and Impact of Pandemic


Since the start of the COVID-19 pandemic in 2020, governments around the world have implemented considerably different policies to combat the virus. What drew my interest is that a few countries (e.g. Sweden) did not impose quarantine or lockdown measures, relying instead on the epidemiological concept of “herd immunity”. As the name suggests, herd immunity is achieved when a large proportion of a population “becomes immune to an infectious disease”, so that its spread is significantly contained. With no effective vaccine widely available in 2020, the government of Sweden adopted the “herd immunity” approach, letting the majority of the Swedish population contract the virus and develop antibodies on their own. In this report I explore the effect of this approach on public health and examine how effective quarantine measures are against COVID-19, with the help of Python data analytics.

The research questions I intend to explore in this EDA report are:

  1. Is the herd immunity approach effective? What positive or negative effects did it have on Swedish public health, compared with France and other European countries, which by and large did not adopt this approach?
  2. Building on the first question, are quarantine and lockdown measures effective? Is there a correlation between the strictness of quarantine policy (measured as the number of days of lockdown) and the impact of COVID-19? If there is, how strong is the relationship?

To explore these questions, I first conducted a basic analysis comparing data for Sweden, France, and the European continent to examine whether herd immunity in Sweden was effective. After that, I conducted a correlation analysis between the number of days of lockdown and the impact of COVID-19 across the countries of the world. Below is the detailed process of my EDA project.

Basic Analysis: A Comparison Among Sweden, France, and European Continent

1. Import libraries:

Here are the Python packages I used for this project.

2. Read data into DataFrame:

This CSV file contains the primary data used for this project. It contains the COVID-19 data for all countries from 2020-01-22 to 2020-07-27, a total of 188 days. For convenience, I parse the “Dates” column into datetime objects.
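A minimal sketch of this loading step. Only the “Dates” column name comes from the text above; the in-memory sample file, the other column names, and all values are made-up assumptions that stand in for the real CSV.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV file. Only the "Dates" column name
# is taken from the article; other columns and all values are made up.
sample_csv = io.StringIO(
    "Country,Dates,Confirmed,Deaths,Recovered,Active\n"
    "Sweden,2020-01-22,0,0,0,0\n"
    "Sweden,2020-07-27,1000,50,0,950\n"
    "France,2020-07-27,2000,150,800,1050\n"
)

# parse_dates converts the listed columns to datetime values while reading.
df = pd.read_csv(sample_csv, parse_dates=["Dates"])

# Inclusive number of days between the first and last recorded dates.
n_days = (df["Dates"].max() - df["Dates"].min()).days + 1
print(n_days)
```

Counting both endpoints, the span from 2020-01-22 to 2020-07-27 gives the 188 days mentioned above.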

3. Extract data for comparison analysis:

I extracted data from the DataFrame for Sweden, France, and Europe as a whole. I make this comparison for two reasons. First, France is similar to Sweden in population makeup and experienced a similar COVID-19 impact at the beginning of 2020; the main difference is that France implemented a series of quarantine and lockdown policies throughout the pandemic, while Sweden insisted on the “herd immunity” approach. Second, I included the data for Europe as a reference for the larger geographical region in which both countries are situated. Generally speaking, most countries in Europe, like France, implemented some quarantine measures during 2020.

4. Comparing confirmed and active cases as a fraction of the population

For each geographical region, I divide the confirmed and active cases by the population of the region to compute the fraction of COVID-19 cases in the population. Then I plot it against date.
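The normalization can be sketched as follows. The column names, the case counts, and the population figure are all illustrative assumptions, not values from the data set.

```python
import pandas as pd

# Hypothetical cumulative counts for one region; every value is made up.
region = pd.DataFrame(
    {
        "Dates": pd.to_datetime(["2020-07-25", "2020-07-26", "2020-07-27"]),
        "Confirmed": [1000, 1100, 1200],
        "Active": [600, 650, 700],
    }
).set_index("Dates")

population = 1_000_000  # assumed population of the region

# Dividing by the population makes regions of different sizes comparable.
fractions = region[["Confirmed", "Active"]] / population

# fractions.plot() would draw both series against date (needs matplotlib).
print(fractions)
```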

As shown by the diagrams above, Sweden has a considerably higher fraction of COVID-19 cases in its population than France and Europe. At the same time, possibly due to the implementation of quarantine measures, the trend of COVID-19 growth in France is similar to that in Europe, and the gap between confirmed and active cases is bigger there than in Sweden. That makes me wonder which groups of cases make up the gap between confirmed and active cases in Sweden and France, and why the gap is significantly smaller in Sweden.

5. Comparison of deaths and recoveries from COVID-19 cases

That brings me to the analysis of deaths and recoveries from COVID-19, the two main groups that make up the gap between confirmed and active cases.

I first plot deaths from COVID-19 as a fraction of the population against date, and then as a fraction of confirmed cases against date. Here is the code and the output.
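A minimal sketch of the two ratios, using made-up counts and an assumed population. It also shows why the two measures can rank countries differently: one divides by the whole population, the other only by the cases that were confirmed.

```python
import pandas as pd

# Hypothetical cumulative counts on three consecutive days (made up).
deaths = pd.Series([0, 5, 12], name="Deaths")
confirmed = pd.Series([0, 100, 200], name="Confirmed")
population = 1_000_000  # assumed population

# Deaths as a fraction of the whole population.
deaths_per_capita = deaths / population

# Deaths as a fraction of confirmed cases (case fatality rate).
# On days with no confirmed cases this is 0 / 0, which pandas reports as NaN.
case_fatality = deaths / confirmed

print(deaths_per_capita.tolist())
print(case_fatality.tolist())
```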

Interestingly, as the diagrams above show, while Sweden has a slightly higher fraction of COVID-19 deaths in its population than France, its deaths as a fraction of confirmed cases (i.e. the case fatality rate) are generally lower than in France, though still higher than in Europe as a whole. This suggests that, in controlling the number of deaths among confirmed cases, Sweden performed roughly on par with countries that implemented quarantine and lockdown measures: better than France, though worse than Europe as a whole.

Then I compute the recovered cases as a fraction of confirmed cases against date for the three regions.

Surprisingly, over the period covered by my data, no recovered COVID-19 patients are recorded for Sweden, while France and Europe as a whole record many recoveries. This explains the significantly bigger gap between confirmed and active cases in France and Europe as a whole. My guess for the 0 recoveries in Sweden is that most COVID-19 cases in Sweden are elderly people, who may have other health issues that COVID-19 exacerbates, which makes recovery more difficult. A count of exactly zero could also mean that recoveries were simply not reported for Sweden in this data set, so this guess should be treated with caution.

6. Comparison of growth rates of COVID-19

Another important measure of COVID-19 impact is its growth rate. I compute the rate of increase of COVID-19 confirmed and active cases with pct_change() and plot it against date.
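As a small illustration of what pct_change() returns, here is a sketch on a made-up series of cumulative confirmed cases. Each value is the relative change from the previous row, so the first row has no growth rate.

```python
import pandas as pd

# Hypothetical cumulative confirmed cases on three consecutive days.
confirmed = pd.Series(
    [100, 110, 121],
    index=pd.to_datetime(["2020-07-25", "2020-07-26", "2020-07-27"]),
)

# pct_change() computes (today - yesterday) / yesterday for each row.
growth = confirmed.pct_change()

# growth.plot() would draw the daily growth rate against date.
print(growth)
```

Both computed values here correspond to 10% daily growth; the first entry is NaN because there is no previous day to compare with.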

As shown above, the growth rates of COVID-19 are in general similar among the three regions. The only apparent difference is that the growth rates of confirmed and active cases in Sweden largely overlap, while those for France and Europe clearly diverge. This is also consistent with my previous findings.

Correlation Analysis: The Relationship Between Lockdown Duration and COVID-19 Impact

After the basic analysis of the effect of Sweden’s “herd immunity” approach, I would like to explore the effectiveness of quarantine and lockdown measures.

1. Import data on lockdown dates

Again, I parse the “Start date” and “End date” columns into datetime objects.

2. Data cleaning and calculation of the longest duration of lockdown in each country

By subtracting “Start date” from “End date”, I compute the duration of each lockdown. For simplicity, I keep only the longest lockdown duration for each country in my data.
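A minimal sketch of this step. The “Start date” and “End date” column names come from the text; the “Country” and “Duration” columns, the placeholder countries, and the dates are assumptions made for illustration.

```python
import io
import pandas as pd

# Hypothetical lockdown records; country A has two separate lockdowns.
lockdown_csv = io.StringIO(
    "Country,Start date,End date\n"
    "A,2020-03-01,2020-03-15\n"
    "A,2020-04-01,2020-05-01\n"
    "B,2020-03-10,2020-03-20\n"
)
lockdowns = pd.read_csv(lockdown_csv, parse_dates=["Start date", "End date"])

# Subtracting two datetime columns gives timedeltas; .dt.days extracts whole days.
lockdowns["Duration"] = (lockdowns["End date"] - lockdowns["Start date"]).dt.days

# Keep only the longest lockdown per country.
longest = lockdowns.groupby("Country")["Duration"].max()
print(longest)
```

Keeping the maximum ignores repeated lockdowns; summing the durations per country would be an alternative measure of strictness.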

3. Creating df_lastday and calculating median growth rate

I create a DataFrame df_lastday to collect all data on 2020-07-27, the last day recorded in the original data, including the cumulative confirmed, active, death, and recovered cases. After that, I sum the data for each country on each day and collect the results into a DataFrame df_c, from which I calculate the median growth rate of COVID-19 cases with pct_change() over the period covered by the data. (*I first tried the mean growth rate for each country, but mean() returned inf for many of them. This happens because pct_change() divides by the previous day’s value, so the first day on which cases rise from 0 produces an infinite growth rate, and a single inf makes the mean inf. The median is not affected by a few such values, which is why I use median() to track the growth rate of COVID-19 in each country.)
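The inf results can be reproduced on a small made-up series that starts at zero cases, which is a sketch of the likely cause rather than the original data:

```python
import pandas as pd

# Hypothetical cumulative cases for one country, starting from zero.
cases = pd.Series([0, 0, 2, 4, 6, 9])

# Row by row: NaN (no previous day), NaN (0 / 0), inf (2 / 0), 1.0, 0.5, 0.5
growth = cases.pct_change()

mean_growth = growth.mean()      # dominated by the single inf value
median_growth = growth.median()  # middle of the non-missing values

print(mean_growth, median_growth)
```

An alternative to the median would be to drop the infinite values before averaging, for example by computing growth only from the first day with at least one case.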

4. Add in data on population in each country, used for calculating infection rate in each country.

In df_lastday, I create a column that holds the population of each country. I did this by importing another data set and using a nested for loop to add the data to df_lastday.
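The same lookup can also be written without a nested loop. The sketch below uses Series.map instead of the loop described above; the column names, the “Atlantis” placeholder, and all numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical last-day totals; "Atlantis" has no population record.
df_lastday = pd.DataFrame(
    {"Country": ["Sweden", "France", "Atlantis"], "Confirmed": [100, 200, 5]}
)

# Hypothetical population table from a second data source (made-up values).
df_pop = pd.DataFrame(
    {"Country": ["France", "Sweden"], "Population": [2_000_000, 1_000_000]}
)

# Build a country -> population lookup and apply it to every row.
pop_by_country = df_pop.set_index("Country")["Population"]
df_lastday["Population"] = df_lastday["Country"].map(pop_by_country)

# Countries missing from the lookup get NaN instead of raising an error.
df_lastday["Infection rate"] = df_lastday["Confirmed"] / df_lastday["Population"]
print(df_lastday)
```

Rows left as NaN make mismatched country names easy to spot, which a loop that silently skips unmatched names would hide.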

5. Add in data on lockdown duration

With the lockdown durations for each country computed earlier, I used a nested for loop to add this data to df_lastday.

6. Updating data on Lockdown duration

Since countries like Sweden did not have a quarantine or lockdown policy, they do not appear in the lockdown data, so I looked them up on the Internet, created a list of countries with no lockdown measures, and recorded their Lockdown Duration as 0.
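A sketch of this update, assuming the column is named “Lockdown Duration” as in the text. The list of countries and the duration values are placeholders, not the list actually compiled for this project.

```python
import numpy as np
import pandas as pd

# Hypothetical state of df_lastday after merging lockdown durations:
# NaN means the country was absent from the lockdown data.
df_lastday = pd.DataFrame(
    {
        "Country": ["Sweden", "France", "Elsewhere"],
        "Lockdown Duration": [np.nan, 55.0, np.nan],  # made-up values
    }
)

# Placeholder list of countries known to have had no lockdown.
no_lockdown = ["Sweden"]

# Set the duration to 0 only for the listed countries.
mask = df_lastday["Country"].isin(no_lockdown)
df_lastday.loc[mask, "Lockdown Duration"] = 0

print(df_lastday)
```

Filling only the listed countries keeps “no lockdown” (0) distinct from “no data” (NaN); filling every missing value with 0 would merge the two cases.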

7. Save df_lastday to CSV for further analysis in Tableau

Now I have all the data I need for a correlation analysis. For convenience, I decided to use a more powerful data visualization tool, Tableau, to carry out the analysis, so I save df_lastday to a CSV file.

8. Relationship Between Lockdown Duration and Percentage of Active/Confirmed Cases in Population

Using Tableau, I first plot the percentage of active and confirmed cases for each country against lockdown duration in days. I fit a linear regression to both graphs.
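The strength of such a relationship can also be checked in Python rather than Tableau. The sketch below uses five made-up data points and assumed column names; it computes the Pearson correlation coefficient and a least-squares line.

```python
import numpy as np
import pandas as pd

# Hypothetical per-country values (made up for illustration).
data = pd.DataFrame(
    {
        "Lockdown Duration": [0, 10, 20, 30, 40],
        "Pct Active": [0.5, 0.4, 0.45, 0.2, 0.3],
    }
)

# Pearson correlation: sign gives the direction, magnitude gives the strength.
r = data["Lockdown Duration"].corr(data["Pct Active"])

# Least-squares straight line: Pct Active = slope * duration + intercept.
slope, intercept = np.polyfit(data["Lockdown Duration"], data["Pct Active"], deg=1)

# For a single predictor, r squared equals the R-squared of the linear fit.
print(round(r, 3), round(r ** 2, 3), slope, intercept)
```

A value of r close to 0 corresponds to the weak relationships reported in the steps that follow, while values near -1 or 1 would indicate a strong linear relationship.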

It appears that there is not a strong relationship between lockdown duration and the percentage of the population infected. This is probably because a large number of data points are stacked around 0 percent. However, I also noticed that in the upper graph, if I deselect the data points stacked at the bottom, I obtain a downward-sloping trend between lockdown duration and the percentage of active COVID-19 cases. See below.

9. Relationship Between Lockdown Duration and Percentage of Death Cases in Population and Confirmed Cases

Next, I repeat the process with deaths as a percentage of the population and of confirmed cases, plotted against the lockdown duration in each country. The relationship is still weak.

10. Relationship Between Lockdown Duration and Percentage of Recovered Cases in Population and in Confirmed Cases

Next, I repeat the process for recovered cases as a percentage of the population and of confirmed cases. Again, the relationship is weak.

11. Relationship Between Lockdown Duration and Median Growth Rate of COVID-19

Finally, I explore the relationship between lockdown duration and the median growth rate of COVID-19 in each country. A relatively stronger relationship appears here: countries with longer lockdowns have higher COVID-19 growth rates, which contradicts common sense. The finding became less surprising once I realized that, while lockdown duration has an impact on the spread of COVID-19, the severity of the pandemic in each country also prompts its government to implement more or less strict lockdown measures. In other words, while I assumed that a longer lockdown leads to a smaller COVID-19 impact, the relationship also runs the other way: the more severe the impact of COVID-19, the longer the lockdown that is imposed. Therefore, to be more rigorous, a future analysis should compare countries with a similar severity of COVID-19 impact.

Conclusion and Future Improvements


  1. The “herd immunity” approach adopted by the Swedish government results in approximately the same death rate and growth rate of COVID-19 as in other countries, but compared with France and other European countries, it potentially led to a higher percentage of the population being infected. My guess is that its most serious impact is on the elderly population, for whom recovery from the disease is more difficult.
  2. In my analysis there is not a strong relationship between the number of days in lockdown and the impact of COVID-19. The potential reason is that each variable influences the other. While a longer lockdown might ameliorate the COVID-19 situation, government officials are less likely to implement a lockdown when the COVID-19 impact is small. Therefore, the analysis would be more reliable if it were conducted on countries with similar COVID-19 impacts, which entails much more complicated data screening.

Future Improvements:

  1. Conduct the analysis with more data screening to select the countries with similar COVID-19 impacts.
  2. Since I only found data up to 2020-07-27, further analysis can be carried out when more up-to-date data are available.



