COVID-19 GLOBAL IMPACT
[PYTHON + POWER BI]
🔎 Ready to explore the report?
🚀 Go ahead, it’s all yours!
QUICK REVIEW
Before we dive into the full analysis,
here’s a quick overview.
If you’re here to evaluate my technical skills, or just want a snapshot of the project, you can explore all the core files and documentation below:
If you’re more into data storytelling, no worries — keep scrolling and enjoy the guided tour:
Highlights
Documentation
Numbers Speak
Technical information
📄 Data Sources + First Exploration
🐍 Python ETL Scripts: 5 Unique Formats
🔁 Python Automation for VAERS Folders
📊 DAX Measures + Calculated Columns
🧩 Power BI Modeling + M Code Calendar
🧠 Key Insights VS Official Data



Background, Objective & Tech Stack
❓ Why did I build this project?
As a biotechnologist, I was especially drawn to the topic of COVID-19 and vaccine-related data. It’s a globally relevant issue that also connects directly with my background. The challenge was fascinating: fragmented, inconsistent datasets from dozens of countries, each with different reporting methods.
⚙️ What technologies did I use?
I used Python (Pandas, Jupyter Notebook) for all ETL processes, and Power BI / DAX for data modeling and dashboard creation. I automated the cleaning of repetitive yearly CSVs and built a dual-star schema. This allowed me to integrate different sources (OWID + VAERS) into one unified analytical ecosystem.
🎯 What am I trying to demonstrate?
I wanted to prove I can handle full-cycle analytics: from collecting and transforming raw data to delivering insights through an interactive dashboard. I also aimed to show critical thinking, validate data externally, and contextualize the output. It wasn’t just about “analyzing COVID,” but about demonstrating end-to-end data skills.
🚨 What is the main objective?
The main goal of this project is to extract meaningful insights about the global impact of COVID-19, identifying which countries were most affected, what strategies they adopted, and how they evolved over time. It also aims to analyze the most common and severe symptoms reported post-vaccination, and explore potential patterns linked to specific vaccines or technologies.

Dimensions, Row Meaning, Column Info
Let’s start by understanding the structure and content of each dataset individually. This step is crucial to identify what each row represents, how the data is organized, and which columns are useful for analysis.
DIMENSIONS, UNDERSTANDING ROW MEANING AND COLUMN PROFILING
OWID - Compact.csv
DIMENSIONS:
Total rows = 506,842
Total columns = 60
Columns kept = 37
WHAT DOES EACH ROW MEAN?
Each row corresponds to one day for one country, summarizing all the national-level pandemic indicators on that day.
Column | Description | Keep | Reason |
---|---|---|---|
country | Country name | ✅ | Essential for grouping and comparisons |
code | Country code (ISO Alpha-3) | ❌ | Redundant unless standardization is needed |
continent | Continent to which the country belongs | ✅ | Enables regional aggregation |
date | Date of the data | ✅ | Core time variable for trends |
Column | Description | Keep | Reason |
---|---|---|---|
total_cases | Cumulative confirmed cases | ✅ | Shows the total evolution of the pandemic |
new_cases | Daily new cases | ❌ | Too noisy; replaced with smoothed version |
new_cases_smoothed | 7-day rolling average of new cases | ✅ | Shows real trends, filters daily reporting fluctuations |
total_cases_per_million | Total cases per million people | ✅ | Allows comparisons between countries regardless of population size |
new_cases_per_million | Daily new cases per million | ❌ | Redundant; replaced by smoothed version per million |
new_cases_smoothed_per_million | Smoothed new cases per million | ✅ | Ideal for normalized trend comparisons |
Column | Description | Keep | Reason |
---|---|---|---|
total_deaths | Cumulative deaths | ✅ | Key indicator of pandemic impact |
new_deaths | Daily new deaths | ✅ | Useful for daily monitoring |
new_deaths_smoothed | Smoothed new deaths | ❌ | Redundant if using both daily and per million metrics |
total_deaths_per_million | Deaths per million | ✅ | Enables normalized cross-country comparison |
new_deaths_per_million | Daily deaths per million | ✅ | Adjusted by population |
new_deaths_smoothed_per_million | Smoothed deaths per million | ❌ | Redundant with existing metrics |
Column | Description | Keep | Reason |
---|---|---|---|
excess_mortality | Weekly % of excess mortality | ✅ | Indicates excess deaths beyond expectations |
excess_mortality_cumulative | Cumulative % of excess mortality | ✅ | Shows prolonged deviation over time |
excess_mortality_cumulative_absolute | Absolute number of excess deaths | ✅ | Gives real volume of excess mortality |
excess_mortality_cumulative_per_million | Excess deaths per million | ✅ | Allows cross-national analysis |
Column | Description | Keep | Reason |
---|---|---|---|
hosp_patients | Current hospitalized patients | ❌ | Redundant; prefer normalized version |
hosp_patients_per_million | Hospitalized per million people | ✅ | Enables fair comparison across countries |
weekly_hosp_admissions | Weekly hospital admissions (raw) | ❌ | Not normalized; better to use per million |
weekly_hosp_admissions_per_million | Weekly hospital admissions per million | ✅ | Reflects burden over time |
icu_patients | Current ICU patients | ❌ | Redundant with ICU admissions |
icu_patients_per_million | ICU patients per million | ❌ | Less useful if ICU admissions are used |
weekly_icu_admissions | Weekly ICU admissions (raw) | ❌ | Same reason as above |
weekly_icu_admissions_per_million | ICU admissions per million | ✅ | Captures severe case trends |
Column | Description | Keep | Reason |
---|---|---|---|
total_tests | Total tests performed | ❌ | Hard to interpret without positives |
new_tests | Daily new tests | ❌ | Too volatile |
total_tests_per_thousand | Total tests per 1000 people | ✅ | Shows overall testing effort |
new_tests_per_thousand | New tests per 1000 people | ❌ | Redundant; replaced by smoothed version |
new_tests_smoothed | Smoothed new tests | ✅ | Tracks real testing trends |
new_tests_smoothed_per_thousand | Smoothed new tests per 1000 people | ✅ | Enables normalized comparison |
positive_rate | Percentage of positive tests | ✅ | Key to assessing under-testing |
tests_per_case | Number of tests per case detected | ✅ | Inversely related to positivity rate |
Column | Description | Keep | Reason |
---|---|---|---|
total_vaccinations | Total doses administered | ❌ | Redundant if focusing on individuals |
people_vaccinated | People with at least one dose | ❌ | Dropped for simplification |
people_fully_vaccinated | Fully vaccinated people | ❌ | Dropped for simplification |
total_boosters | Total booster doses | ✅ | Important for booster analysis |
new_vaccinations | Daily new doses | ❌ | Replaced by smoothed version |
new_vaccinations_smoothed | Smoothed new doses | ✅ | Shows true vaccination pace |
total_vaccinations_per_hundred | Doses per 100 people | ❌ | Redundant with individual-based metrics |
people_vaccinated_per_hundred | One dose per 100 people | ❌ | Redundant |
people_fully_vaccinated_per_hundred | Fully vaccinated per 100 people | ✅ | Key indicator of national coverage |
total_boosters_per_hundred | Boosters per 100 people | ❌ | Redundant |
new_vaccinations_smoothed_per_million | Smoothed new doses per million | ❌ | Dropped for simplification |
new_people_vaccinated_smoothed | Smoothed new individuals vaccinated | ❌ | Dropped by design |
new_people_vaccinated_smoothed_per_hundred | Per 100 people | ❌ | Dropped by design |
Column | Description | Keep | Reason |
---|---|---|---|
population | Total population | ✅ | Needed for all per capita calculations |
population_density | Population density | ✅ | Helps understand transmission dynamics |
median_age | Median age | ✅ | Important for assessing mortality risk |
life_expectancy | Life expectancy | ✅ | General health indicator |
gdp_per_capita | GDP per capita (USD) | ✅ | Socioeconomic indicator |
extreme_poverty | % living in extreme poverty | ✅ | Crucial for assessing access to healthcare |
diabetes_prevalence | Diabetes prevalence (%) | ✅ | Known comorbidity for COVID-19 |
handwashing_facilities | % with access to handwashing | ✅ | Indicator of basic hygiene and prevention capacity |
hospital_beds_per_thousand | Hospital beds per 1000 people | ✅ | Measures healthcare capacity |
human_development_index | Human Development Index (HDI) | ✅ | Overall development and wellbeing metric |
OWID - Vaccinations_manufacturer.csv
DIMENSIONS:
Total rows = 888,361
Total columns = 4
Columns kept = 4
WHAT DOES EACH ROW MEAN?
Each row represents the cumulative number of vaccine doses administered for a specific vaccine brand in a specific country on a specific day.
Column | Description | Keep | Reason |
---|---|---|---|
country | Country name | ✅ | Essential for grouping |
vaccine | Vaccine name | ✅ | Key for comparison between vaccines |
date | Date of administration | ✅ | Required for timeline and trends |
total_vaccinations | Cumulative doses administered | ✅ | Core quantitative indicator of usage |
VAERS - Data.csv
DIMENSIONS (2021 example):
Total rows = 768,353
Total columns = 35
Columns kept = 17
WHAT DOES EACH ROW MEAN?
Each row is an individual report submitted to the VAERS system describing a possible adverse event after receiving a COVID-19 vaccine in the United States.
Column | Description | Keep | Reason |
---|---|---|---|
VAERS_ID | Unique report identifier | ✅ | Primary key to link across VAERS files |
RECVDATE | Report received date | ✅ | Useful to analyze reporting trends |
RPT_DATE | Report creation date | ❌ | Often empty or redundant with RECVDATE |
TODAYS_DATE | File creation date (not event date) | ❌ | Irrelevant for analytical purposes |
VAX_DATE | Date of vaccination | ✅ | Needed to calculate time to symptom onset |
ONSET_DATE | Symptom onset date | ✅ | Crucial to assess reaction time |
NUMDAYS | Days between vaccination and symptoms | ✅ | Already calculated; saves preprocessing steps |
DATEDIED | Date of death (if applicable) | ✅ | Required for severity and fatality timeline analysis |
Column | Description | Keep | Reason |
---|---|---|---|
STATE | US state where the event occurred | ✅ | Useful for regional analysis |
AGE_YRS | Patient age in years | ✅ | Key for age-based segmentation |
CAGE_YR | Age in text format (years) | ❌ | Redundant with numeric age (AGE_YRS) |
CAGE_MO | Age in months (for infants) | ❌ | Not needed if analysis excludes infants or already using AGE_YRS |
SEX | Patient gender (M/F/U) | ✅ | Fundamental for demographic breakdowns |
Column | Description | Keep | Reason |
---|---|---|---|
DIED | Indicates death | ✅ | Key indicator of event severity |
L_THREAT | Life-threatening condition | ✅ | Useful for classification of serious cases |
ER_VISIT | Emergency room visit | ✅ | Acts as proxy for acute impact |
ER_ED_VISIT | Emergency visit (duplicate) | ❌ | Redundant with ER_VISIT |
HOSPITAL | Was hospitalized | ✅ | Helps measure clinical burden |
HOSPDAYS | Days spent in hospital | ✅ | Useful for estimating impact severity |
DISABLE | Resulted in disability | ✅ | Captures long-term adverse effects |
RECOVD | Patient recovered | ✅ | Allows tracking of recovery vs. severity |
BIRTH_DEFECT | Birth defect caused | ✅ | Rare but critical for completeness |
Column | Description | Keep | Reason |
---|---|---|---|
SYMPTOM_TEXT | Free text description of symptoms | ❌ | Unstructured and redundant with codified symptom tables |
Column | Description | Keep | Reason |
---|---|---|---|
LAB_DATA | Lab data (free text) | ❌ | Unstructured; difficult to scale |
V_ADMINBY | Who administered the vaccine (e.g., pharmacy) | ❌ | Operational, not analytically useful |
V_FUNDBY | Funding source (e.g., public/private) | ❌ | Contextual info, not needed for clinical analysis |
OTHER_MEDS | Current medication (free text) | ❌ | Not standardized |
CUR_ILL | Current illnesses (free text) | ❌ | Difficult to interpret systematically |
HISTORY | Medical history (free text) | ❌ | Lacks structure for consistent analysis |
PRIOR_VAX | Previous vaccines received | ❌ | Inconsistent and loosely related to COVID vaccine |
SPLTTYPE | Report type (e.g., VAERS internal) | ❌ | Administrative metadata, not useful for analysis |
FORM_VERS | Form version | ❌ | Technical detail with no analytical value |
VAERS - Symptoms
DIMENSIONS (2021 example):
Total rows = 1,030,219
Total columns = 11
Columns kept = 6
WHAT DOES EACH ROW MEAN?
Each row links up to five symptoms to a specific VAERS report, using the same unique `VAERS_ID`. The structure is wide, and each row represents the set of symptoms for that event.
Column | Description | Keep | Reason |
---|---|---|---|
VAERS_ID | Unique report identifier (shared across VAERS) | ✅ | Key for joining with VAERS Data and VAX tables |
SYMPTOM1 | First reported symptom (usually the most severe) | ✅ | Allows analysis of primary adverse reactions |
SYMPTOMVERSION1 | Dictionary version for symptom 1 | ❌ | Internal technical detail, not analytically relevant |
SYMPTOM2 | Second reported symptom | ✅ | Expands scope of adverse event analysis |
SYMPTOMVERSION2 | Dictionary version for symptom 2 | ❌ | Redundant and technical |
SYMPTOM3 | Third reported symptom | ✅ | Helps analyze co-occurring symptoms |
SYMPTOMVERSION3 | Dictionary version for symptom 3 | ❌ | No analytical value |
SYMPTOM4 | Fourth reported symptom | ✅ | Adds clinical context |
SYMPTOMVERSION4 | Dictionary version for symptom 4 | ❌ | Repetitive, not useful |
SYMPTOM5 | Fifth reported symptom | ✅ | Covers up to 5 symptoms per report |
SYMPTOMVERSION5 | Dictionary version for symptom 5 | ❌ | Purely technical, no impact on analysis |
VAERS - VAX
DIMENSIONS (2021 example):
Total rows = 813,441
Total columns = 9
Columns kept = 4
WHAT DOES EACH ROW MEAN?
Each row contains vaccine-related metadata linked to a VAERS report, such as vaccine type, manufacturer, and dose series.
Column | Description | Keep | Reason |
---|---|---|---|
VAERS_ID | Unique report ID | ✅ | Key to merge with SYMPTOMS and DATA tables |
VAX_TYPE | Vaccine type (e.g., COVID19, FLU) | ✅ | Required to filter only COVID-related reports |
VAX_MANU | Manufacturer (Pfizer, Moderna, etc.) | ✅ | Important for brand-level comparison of adverse effects |
VAX_LOT | Vaccine lot number | ❌ | Highly specific, often missing, not suitable for broad analysis |
VAX_DOSE_SERIES | Dose number (1st, 2nd, booster) | ✅ | Enables analysis by dose sequence |
VAX_ROUTE | Route of administration (e.g., IM) | ❌ | Too technical, not useful for impact evaluation |
VAX_SITE | Injection site (e.g., left/right arm) | ❌ | Does not influence severity of reaction |
VAX_NAME | Full commercial vaccine name | ❌ | Redundant with VAX_TYPE and VAX_MANU |

Data Cleaning, Automation, Merge
Before building the reports, I cleaned each dataset using Python + Pandas in Jupyter Notebook. This included filtering columns, fixing dates, handling missing values, and correcting inconsistencies. For VAERS data, I also automated the cleaning process for all yearly folders and then merged them. Full code is provided above.
1️⃣ FIRST CLEANING OF EACH DATASET STRUCTURE [PYTHON]
OWID Databases
Compact.csv
- Column Selection: Dropped 23 columns based on prior evaluation, retaining variables for COVID trends, demographics, testing, vaccination, and mortality.
- Initial Filtering by Country: Removed countries with few or poor data. Selected 64 countries across all continents with meaningful data for comparison.
- Date Conversion and Filtering: Converted `date` to datetime. Excluded:
  - Dates before March 2020 (pre-pandemic noise)
  - Dates beyond June 2025 (future reporting errors)
- Duplicate Check: Verified there were no duplicate rows after filtering.
- Missing Data Review: Identified high-NaN columns. Instead of dropping them, I flagged them for later consideration in visual analysis and comparisons.
- Manual Imputation from Official Sources: Filled `human_development_index`, `life_expectancy`, and `handwashing_facilities` using dictionaries built from UN and World Bank data, mapped by country.
- Country-Continent Consistency Check: Ensured each country was consistently mapped to only one valid continent.
- Outlier Detection and Handling:
  - `tests_per_case` capped at 500 (higher values replaced with NaN)
  - Negative `reproduction_rate` values replaced with NaN
  - Reviewed extreme values across all metrics using `.describe()`
- Categorical Field Verification: Reviewed fields like `continent` for category consistency. Confirmed appropriate cardinality and uniqueness.
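The date, outlier, and imputation steps above can be sketched in Pandas as follows. This is a minimal illustration, not the project's actual script: the function name `clean_owid_compact`, the exact cut-off dates, and the `HDI_BY_COUNTRY` dictionary (filled with a placeholder value for a fictional country) are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Placeholder value for a fictional country, NOT a real UN / World Bank figure.
HDI_BY_COUNTRY = {"Atlantis": 0.80}


def clean_owid_compact(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper covering the date, outlier and imputation steps."""
    out = df.copy()

    # Parse dates and keep March 2020 through June 2025 (assumed bounds).
    out["date"] = pd.to_datetime(out["date"], errors="coerce")
    start, end = pd.Timestamp("2020-03-01"), pd.Timestamp("2025-06-30")
    out = out[out["date"].between(start, end)].copy()

    # Outliers: values above the cap and negative rates become NaN.
    out.loc[out["tests_per_case"] > 500, "tests_per_case"] = np.nan
    out.loc[out["reproduction_rate"] < 0, "reproduction_rate"] = np.nan

    # Fill gaps from a per-country dictionary, keeping existing values.
    out["human_development_index"] = out["human_development_index"].fillna(
        out["country"].map(HDI_BY_COUNTRY)
    )
    return out.reset_index(drop=True)


# Tiny made-up sample to exercise the function.
raw = pd.DataFrame({
    "country": ["Atlantis", "Atlantis", "Atlantis"],
    "date": ["2020-01-15", "2021-05-01", "2021-05-02"],
    "tests_per_case": [10.0, 900.0, 25.0],
    "reproduction_rate": [1.1, -0.5, 0.9],
    "human_development_index": [np.nan, np.nan, 0.75],
})
cleaned = clean_owid_compact(raw)
print(cleaned)
```

Replacing outliers with NaN rather than dropping the rows keeps the remaining indicators for that country and day available for other visuals.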
Vaccinations.csv
- Country Filtering: Filtered rows to keep only the countries selected in the main OWID dataset.
- Date Formatting and Filtering: Converted the `date` column to datetime and filtered rows between March 1, 2020 and June 1, 2025.
- Duplicates Check: Verified that no duplicate rows were present.
- Vaccine Categorization: Created a new column `technology` to classify each vaccine into one of five types: mRNA, viral vector, inactivated virus, protein subunit, or other.
- Missing Data: Confirmed that there were no null values in any column after cleaning.
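The vaccine categorization step can be illustrated with a simple dictionary lookup. The mapping below is a partial, illustrative one (the full dictionary used in the project is not reproduced in this README), and `add_technology` is a hypothetical helper name. Brands not found in the mapping fall into "other".

```python
import pandas as pd

# Illustrative, partial mapping from vaccine brand to platform.
TECHNOLOGY_BY_VACCINE = {
    "Pfizer/BioNTech": "mRNA",
    "Moderna": "mRNA",
    "Oxford/AstraZeneca": "viral vector",
    "Johnson&Johnson": "viral vector",
    "Sputnik V": "viral vector",
    "Sinopharm/Beijing": "inactivated virus",
    "Sinovac": "inactivated virus",
    "Novavax": "protein subunit",
}


def add_technology(df: pd.DataFrame) -> pd.DataFrame:
    """Add a `technology` column; unmapped brands are labelled 'other'."""
    out = df.copy()
    out["technology"] = out["vaccine"].map(TECHNOLOGY_BY_VACCINE).fillna("other")
    return out


# Made-up rows, including a brand that is absent from the mapping.
sample = pd.DataFrame({
    "country": ["A", "A", "B"],
    "vaccine": ["Moderna", "Sinovac", "UnknownBrand"],
})
classified = add_technology(sample)
print(classified)
```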
VAERS Databases
Data.csv (2021)
- Column Filtering: Retained only the 17 relevant columns listed above; dropped the rest.
- Duplicates & Formatting: Removed duplicates. Converted all date fields to datetime format.
- Categorical Standardization: Normalized values in `STATE`, replaced `"U"` with `NaN`, and fixed inconsistent codes.
- Valid Location Filtering: Kept only valid US states and territories (e.g., PR, GU), removing invalid entries.
- Binary Fields Cleanup: Replaced `NaN` in binary fields (`DIED`, `HOSPITAL`, etc.) with `"N"` when absence implied "No".
- Date Range Filtering: Removed records with dates outside a reasonable range (Dec 2020 – Jan 2022).
- Numerical Outlier Handling: Dropped rows with implausible hospital stays (over 365 days).
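A condensed sketch of several of these steps is shown below. It rests on stated assumptions: `clean_vaers_data` is a hypothetical name, the `"U"` replacement is assumed to apply to the `SEX` column, the list of binary flags is a guess at what "etc." covers, and the set of valid states is passed in by the caller. Date parsing and date-range filtering are left out to keep the example short.

```python
import pandas as pd

# Assumed list of Y/blank flag columns; adjust to the real schema.
BINARY_FLAGS = ["DIED", "L_THREAT", "ER_VISIT", "HOSPITAL", "DISABLE"]


def clean_vaers_data(df: pd.DataFrame, valid_states: set) -> pd.DataFrame:
    """Hypothetical helper for the categorical, flag and outlier steps."""
    out = df.drop_duplicates().copy()

    # Normalize state codes, then keep only recognised states/territories.
    out["STATE"] = out["STATE"].str.strip().str.upper()
    out = out[out["STATE"].isin(valid_states)].copy()

    # Treat "U" (unknown) as missing; assumed here to refer to SEX.
    out["SEX"] = out["SEX"].mask(out["SEX"] == "U")

    # A blank flag is read as "No".
    flags = [col for col in BINARY_FLAGS if col in out.columns]
    out[flags] = out[flags].fillna("N")

    # Drop implausible hospital stays; rows with no stay recorded are kept.
    out = out[~(out["HOSPDAYS"] > 365)]
    return out.reset_index(drop=True)


# Made-up reports: one invalid state, one duplicate, one 400-day stay.
raw = pd.DataFrame({
    "VAERS_ID": [1, 2, 3, 3, 4],
    "STATE": [" ca", "XX", "TX", "TX", "NY"],
    "SEX": ["F", "M", "U", "U", "M"],
    "DIED": [None, "Y", None, None, None],
    "HOSPITAL": ["Y", None, None, None, "Y"],
    "HOSPDAYS": [3.0, 2.0, None, None, 400.0],
})
cleaned = clean_vaers_data(raw, valid_states={"CA", "TX", "NY", "PR", "GU"})
print(cleaned)
```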
Symptoms.csv (2021)
- Column Pruning: Technical and non-analytical fields such as `SYMPTOMVERSION1–5` and `ORDER` were removed.
- Duplicate Removal: Removed duplicates.
- Missing Data Analysis: Nulls are expected in `SYMPTOM2–5`, as not all patients report multiple symptoms. `SYMPTOM1` is always present.
- Validation: All columns were confirmed to be in the appropriate format (`int64` for `VAERS_ID`, `object` for symptoms).
- Basic Statistical Review: A `.describe()` analysis was run to check for range anomalies or structural issues.
VAX.csv (2021)
- Column Reduction: Removed irrelevant or overly specific fields that added no analytical value to the vaccine data.
- COVID-19 Filtering: Only rows where `VAX_TYPE` contained "COVID19" were retained. Once filtered, `VAX_TYPE` was dropped as redundant.
- Duplicate and Null Handling: Removed duplicates. Only ~0.37% of rows had missing `VAX_DOSE_SERIES`, which was acceptable and retained.
- Dose Classification: `VAX_DOSE_SERIES` contains values like "1", "2", "3", "7+", or "UNK". Given that "UNK" represents over 15% of the data, these entries were preserved for now.
- Basic Statistical Review: A `.describe()` analysis was run to check for range anomalies or structural issues.
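The filtering logic for the VAX table could look roughly like this. It is a sketch under assumptions: `clean_vaers_vax` is a hypothetical name and the sample rows are invented. It also shows one way to check the share of each dose label, which is how a figure like the "UNK" percentage can be obtained.

```python
import pandas as pd

KEEP = ["VAERS_ID", "VAX_TYPE", "VAX_MANU", "VAX_DOSE_SERIES"]


def clean_vaers_vax(df: pd.DataFrame) -> pd.DataFrame:
    """Keep COVID-19 rows only, then drop the now-constant VAX_TYPE."""
    out = df[KEEP].drop_duplicates()
    is_covid = out["VAX_TYPE"].str.contains("COVID19", na=False)
    return out[is_covid].drop(columns="VAX_TYPE").reset_index(drop=True)


# Invented rows: two differ only by lot number, one is a flu vaccine.
raw = pd.DataFrame({
    "VAERS_ID": [1, 1, 2, 3],
    "VAX_TYPE": ["COVID19", "COVID19", "FLU", "COVID19"],
    "VAX_MANU": ["MODERNA", "MODERNA", "OTHER", "PFIZER"],
    "VAX_LOT": ["A1", "A2", "B1", "C1"],
    "VAX_DOSE_SERIES": ["1", "1", "1", "UNK"],
})
cleaned = clean_vaers_vax(raw)

# Share of each dose label, including missing values.
dose_share = cleaned["VAX_DOSE_SERIES"].value_counts(normalize=True, dropna=False)
print(cleaned)
print(dose_share)
```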
2️⃣ AUTOMATION OF VAERS DATABASES (2020-2025)
To efficiently process the VAERS datasets from 2020 to 2025, which share the same structure but vary in size and content, I created three Python scripts to automate the cleaning of each one. This prevents manual repetition and ensures consistency across years. Each script loads all yearly files from its respective folder and applies the exact same preprocessing steps as done manually for 2021. Code above, as always.
Data.csv
For each yearly `VAERSDATA.csv` file:
- Selected only relevant columns (as listed above).
- Removed duplicates.
- Converted date fields to datetime.
- Standardized categorical fields and filtered valid states/territories.
- Replaced missing values in binary flags with "N" where appropriate.
- Removed rows with implausible dates or hospitalization durations.
Symptoms.csv
For each `VAERSSYMPTOMS.csv` file:
- Kept only `VAERS_ID` and symptom columns.
- Removed duplicates.
- Dropped entries with no primary symptom (`SYMPTOM1` missing).
VAX.csv
For each `VAERSVAX.csv` file:
- Selected essential columns (e.g., `VAERS_ID`, `VAX_TYPE`, `VAX_MANU`, `VAX_DOSE_SERIES`).
- Removed duplicates.
- Filtered only COVID-19-related rows (by `VAX_TYPE`).
- Kept only the first vaccine entry per report (in case of multiple rows per `VAERS_ID`).
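As an illustration of the automation pattern, here is a minimal sketch for the Symptoms files. The folder layout (one subfolder per year), the `clean_` output prefix, the `latin-1` encoding, and the function names are all assumptions for this example, not details taken from the project scripts.

```python
import tempfile
from pathlib import Path

import pandas as pd

SYMPTOM_COLUMNS = ["VAERS_ID"] + [f"SYMPTOM{i}" for i in range(1, 6)]


def clean_symptoms(df: pd.DataFrame) -> pd.DataFrame:
    """Keep ID + symptom columns, drop duplicates and rows without SYMPTOM1."""
    out = df[SYMPTOM_COLUMNS].drop_duplicates()
    return out.dropna(subset=["SYMPTOM1"])


def clean_all_years(root: Path, out_dir: Path) -> list:
    """Clean every *VAERSSYMPTOMS.csv under `root`; write results to `out_dir`.

    Keep `out_dir` outside `root` so cleaned files are not picked up again.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(root).rglob("*VAERSSYMPTOMS.csv")):
        # Encoding is an assumption; adjust if the raw files differ.
        raw = pd.read_csv(path, encoding="latin-1", low_memory=False)
        target = out_dir / f"clean_{path.name}"
        clean_symptoms(raw).to_csv(target, index=False)
        written.append(target)
    return written


# Demo on throwaway files: two yearly folders with identical made-up content.
demo = pd.DataFrame({"VAERS_ID": [1, 1, 2],
                     "SYMPTOM1": ["Headache", "Headache", None]})
for i in range(2, 6):
    demo[f"SYMPTOM{i}"] = None
demo["SYMPTOMVERSION1"] = 23.1

with tempfile.TemporaryDirectory() as tmp:
    raw_root, clean_root = Path(tmp) / "raw", Path(tmp) / "clean"
    for year in (2020, 2021):
        folder = raw_root / str(year)
        folder.mkdir(parents=True)
        demo.to_csv(folder / f"{year}VAERSSYMPTOMS.csv", index=False)

    written = clean_all_years(raw_root, clean_root)
    written_names = [p.name for p in written]
    row_counts = [len(pd.read_csv(p)) for p in written]

print(written_names, row_counts)
```

Keeping the per-file cleaning in one function and the folder traversal in another means the same loop can be reused for the Data and VAX files by swapping the cleaning function and the file pattern.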
3️⃣ MERGE OF ALL THE VAERS DATABASES (2020-2025)
After cleaning all 18 VAERS datasets (3 data types × 6 years), the next step was to merge them into unified files to simplify the Power BI workflow. Instead of working with 18 separate CSVs, I consolidated everything into just 3 (code above).
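The consolidation itself amounts to stacking the yearly files of one type. A minimal sketch, assuming the cleaned files share identical columns; `merge_yearly_csvs` is a hypothetical name, and in-memory buffers stand in for the real CSV paths:

```python
import io

import pandas as pd


def merge_yearly_csvs(sources) -> pd.DataFrame:
    """Read each yearly CSV (path or file-like object) and stack them."""
    frames = [pd.read_csv(source) for source in sources]
    return pd.concat(frames, ignore_index=True)


# In-memory stand-ins for two cleaned yearly files (made-up rows).
csv_2020 = io.StringIO("VAERS_ID,VAX_MANU\n1,MODERNA\n")
csv_2021 = io.StringIO("VAERS_ID,VAX_MANU\n2,PFIZER\n3,MODERNA\n")

merged = merge_yearly_csvs([csv_2020, csv_2021])

# Sanity check: a report ID should not appear in more than one year.
ids_are_unique = merged["VAERS_ID"].is_unique
print(merged.shape, ids_are_unique)
```

The uniqueness check only applies to tables with one row per report; it would not be expected to hold for the Symptoms files, where one report can span several rows.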

Pre-Power BI: Calculated Columns, DAX
Before diving into Power BI with the dashboard design, I always take the time to plan the analytical structure of the report. Once the datasets are cleaned and I have a good understanding of their structure, I sketch out the key questions I want to answer, the potential dashboards I’ll need to build, and the types of visualizations that might best support those insights. See an example here (don’t laugh please):

As part of this planning phase, I create an initial batch of calculated columns and DAX measures for each dataset. While I often add or fine-tune additional measures later during the visual building process, this early stage preparation helps me work much faster and more consistently.
In total, I prepared nearly 80 DAX measures and 10 calculated columns across five core datasets. These calculations cover a wide range of analytical needs: from time intelligence and cumulative indicators, to epidemiological ratios, severity metrics, symptom-based outcomes, ranking logic, and composite scores. As you may know, all the code is available above.
Dataset | Calculated Columns | DAX Measures |
---|---|---|
OWID Compact | 4 | 50 |
Vaccinations Manufacturer | 1 | 6 |
VAERS Data | 4 | 13 |
VAERS Symptoms | 0 (used unpivoting) | 5 |
VAERS VAX | 1 | 5 |
Total | 10 | 79 |

Power BI: Data Modeling, Power Query
Once all datasets were cleaned in Python and exported to `.csv`, they were imported into Power BI for further processing. At this stage, a few final adjustments were made in Power Query. For example, region identification was refined to ensure countries were properly categorized, and decimal formats were adjusted by rounding excessively long decimals or formatting values as percentages.
In the case of the VAERS_SYMPTOMS table, I applied an unpivot transformation to the five symptom columns (`SYMPTOM1–5`) to consolidate them into a single “Symptom” column. This was essential to rank the most frequent symptoms. I also created a custom date table (`Dim_Calendar`) to allow consistent time-based analysis across all datasets. M code written above.
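For readers more familiar with Pandas than Power Query, the unpivot step corresponds roughly to `DataFrame.melt`. The snippet below is an equivalent sketch on made-up rows, not the M code used in the report; the `Position` column name is an assumption.

```python
import pandas as pd

# Wide layout: one row per report, up to five symptom columns (made-up data).
symptoms = pd.DataFrame({
    "VAERS_ID": [101, 102],
    "SYMPTOM1": ["Headache", "Pyrexia"],
    "SYMPTOM2": ["Fatigue", "Headache"],
    "SYMPTOM3": [None, "Chills"],
    "SYMPTOM4": [None, None],
    "SYMPTOM5": [None, None],
})

# Long layout: one row per (report, symptom) pair; empty slots are dropped.
long_symptoms = symptoms.melt(
    id_vars="VAERS_ID",
    value_vars=[f"SYMPTOM{i}" for i in range(1, 6)],
    var_name="Position",
    value_name="Symptom",
).dropna(subset=["Symptom"])

# With a single Symptom column, ranking becomes a simple frequency count.
top_symptoms = long_symptoms["Symptom"].value_counts()
print(top_symptoms)
```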

This calendar table was linked to OWID_Compact[date] and VAERS_Data[ONSET_DATE]. The fact tables are `OWID_Compact` and `VAERS_Data`, each with many entries per day, so the relationships were one-to-many (from `Dim_Calendar`). To keep the data model clean, I hid all technical linking fields (e.g., `country`, `VAERS_ID`, `date`) from the report view except the ones from fact tables. Additional model relationships:
- `OWID_Compact` to `OWID_Vacc` via `country` (inactive relationship to avoid ambiguity)
- `VAERS_Data` to `VAERS_Symptoms` and `VAERS_VAX` via `VAERS_ID`

Key Considerations, Project Boundaries
While analyzing the VAERS database, it is crucial to emphasize that this dataset must never be used to infer direct causality; the VAERS data disclaimer itself warns that reports alone cannot establish it. The presence of adverse events following vaccination does not imply that the vaccine caused them. The VAERS system is designed to detect safety signals and patterns, not to confirm causal relationships between vaccines and health outcomes.

Additionally, there are important limitations in the availability and accuracy of international data:
Underreporting in low-resource countries: The WHO estimates that some African nations underreport COVID-19 deaths by a factor of 8 to 10, due to limited testing, healthcare access, and incomplete data systems.
Political or social suppression: In countries like Russia, data may be intentionally withheld or manipulated for political reasons, which prevents accurate figures from appearing in open-source platforms like OWID.
Lack of monitoring infrastructure: Many regions lack reliable epidemiological systems, making it impossible to track key metrics such as deaths or case rates.
These limitations can significantly distort indicators like “average deaths per million” or country rankings. As a result, some findings may reflect data availability rather than the true impact of the pandemic, and should therefore be interpreted with caution.

This project was designed without a specific business objective in mind, simply because no stakeholder requested a concrete analysis. Instead of answering a predefined question (e.g., “Should we invest in X?” or “How did sales evolve compared to last year?”), the goal was to build a general and flexible reporting system capable of extracting as much relevant information as possible from the datasets. This approach aims to showcase analytical depth and technical execution, while maintaining clarity and design integrity—avoiding meaningless charts or overwhelming dashboards.

Project Results, Official Cross-checking
This section presents the main findings from the OWID and VAERS datasets, along with comparisons to official sources such as WHO or NCBI. Verifying results with external data is key to detecting biases, validating patterns, and ensuring that conclusions are grounded in reality.
1️⃣ Mortality and Case Comparison by Country | OWID
Some findings are expected, such as total deaths being highest in the U.S., Brazil, and India due to population size. Comparing them with total reported cases, where China, the U.S., and India lead, helps identify countries where high case counts didn’t translate into equally high death counts, such as France.
Cross-checking: Correct ✅ [Our World in Data global COVID-19 death counts]


The Case Fatality Rate metric reveals expectedly higher values in continents like Africa and South America, due to well-known issues such as underreporting, limited healthcare infrastructure, and low testing capacity. However, what stands out is the unexpectedly low fatality rate in many Asian countries, especially India, which ranks 58th out of 64 in hospital capacity, yet reports relatively few COVID deaths.
Cross-checking: Correct ✅ [Our World in Data reported case fatality rates]
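For reference, the Case Fatality Rate is conventionally computed as cumulative deaths divided by cumulative confirmed cases, expressed as a percentage. The sketch below assumes that definition and the OWID column names; the exact DAX measure used in the report may differ, and the figures are invented.

```python
import pandas as pd


def case_fatality_rate(df: pd.DataFrame) -> pd.Series:
    """CFR (%) per country from cumulative totals; undefined when cases are 0."""
    latest = df.groupby("country")[["total_cases", "total_deaths"]].max()
    cfr = latest["total_deaths"] / latest["total_cases"] * 100
    return cfr.where(latest["total_cases"] > 0).round(2)


# Invented cumulative figures for two fictional countries.
sample = pd.DataFrame({
    "country": ["A", "A", "B"],
    "date": ["2021-01-01", "2021-01-02", "2021-01-01"],
    "total_cases": [500, 1000, 0],
    "total_deaths": [5, 20, 0],
})
cfr = case_fatality_rate(sample)
print(cfr)
```

Because the denominator is confirmed cases, low testing capacity shrinks it and pushes the ratio up, which is one reason the metric has to be read alongside testing indicators.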


This discrepancy strongly suggests systemic underreporting or data inconsistency, rather than actual low mortality. In contrast, countries like Peru (3.44), Bulgaria (2.96), and Hungary (2.64) report high daily deaths per million, which may reflect more transparent and honest reporting.
Cross-checking: Correct ✅ [Our World in Data Mortality]
2️⃣ Infection Spread and Restriction Levels | OWID
Countries like France, Germany, and Austria had the highest new cases per million, showing a strong spread of the virus in Europe, though not necessarily a corresponding death toll.
Cross-checking: Correct ✅ [Our World in Data Cases]


Regarding stringency of restrictions, countries like China, Iran, and India maintained consistently high levels of control throughout the pandemic, sometimes even beyond the critical phases.
Cross-checking: Correct ✅ [Oxford Stringency Index]
3️⃣ Vaccination and Testing Efficiency | OWID
Surprisingly, Chile had the highest number of fully vaccinated people per 100 inhabitants in 2025, followed closely by China, Malta, Vietnam, and Peru—indicating not just strong campaigns but also high compliance and completion.
Cross-checking: Correct ✅ [Our World in Data Vaccinated]


As for testing efficiency, the number of tests per confirmed case indicates how extensively a country tests its population relative to actual infections. A higher value reflects broader, more proactive testing, even for mild or asymptomatic individuals. In this metric, China clearly leads, followed by New Zealand and Denmark. On the other end, countries like Brazil, Ecuador, and Mexico show some of the lowest test-per-case ratios. Not all countries are represented in the dataset, so these may not be the worst globally.
Cross-checking: Correct ✅ [Our World in Data Testing]
4️⃣ Temporal Evolution of the Pandemic | OWID
The deadliest period was the first quarter of 2021, accounting for nearly 47% of total global deaths (2.74 million), likely due to gatherings around Christmas and the emergence of new variants.
Cross-checking: Correct ✅ [WHO Covid-19 Timeline]


Interestingly, 2022 recorded over twice as many reported cases as 2021 (394 million vs. 158 million), yet mortality rates dropped significantly. This trend reflects the impact of widespread vaccination in 2021, which helped protect against severe disease, even though it didn’t fully prevent infections. In other words, more people got infected, but far fewer died—a clear indicator of vaccine effectiveness on a global scale.
Cross-checking: Clarification ❌ It’s true 2022 saw over twice the cases of 2021, but this surge was due to the Omicron variant and relaxation of restrictions, not because prior vaccination increased infections [OpenVAERS Omicron]
5️⃣ Vaccine Types and Usage by Country | OWID
Globally, Pfizer/BioNTech was by far the most used vaccine (1.5 billion doses, ~65%), followed by Moderna and Oxford/AstraZeneca. The most used vaccines (Pfizer and Moderna) are mRNA-based, largely due to production simplicity and early rollout.
Cross-checking: Incomplete ❌ Pfizer/BioNTech was widely used, but global vaccine distribution included massive use of Sinovac, Sinopharm, AstraZeneca, and others, so Pfizer’s share was <65% [Global manufacturer dose counts]


Usage patterns varied by country:
- The U.S. used almost exclusively Pfizer, Moderna, and Johnson & Johnson
- Spain also used AstraZeneca alongside Pfizer
- South American countries (Argentina, Peru, Chile, Ecuador) relied heavily on Sinopharm/Beijing and Sputnik V, using inactivated virus or viral vector platforms.
Cross-checking: Correct ✅ [WHO, National Vaccine rollout records]
1️⃣ General Overview and Demographics | VAERS
Disclaimer: These results do not attempt to establish any causal link between vaccines and adverse effects. This is purely a BI exercise based on voluntary reports submitted in the U.S. from 2020 to mid-2025. Interpret with caution.
From 2020 to June 2025, over 807,000 VAERS reports were filed, including 7,660 deaths (0.95%), 5.94% hospitalizations, and 1.67% cases resulting in disability. Notably, 67% of reports came from women, yet they accounted for only 43.8% of total deaths.
Cross-checking: Partially incorrect ❌ Official sources also show that most reports came from women, which aligns with the data, but the totals obtained here for deaths (7,660) and reports (807k) are much lower than the CDC/OpenVAERS official totals (1.5 million+ reports and ~16–38k deaths) [OpenVAERS/CDC official statistics]

Age analysis shows expected trends: older individuals had higher mortality, hospitalization, and disability rates. Surprisingly, hospital stays across age groups ranged from 4 to 6.3 days, more uniform than expected. Also, the time from vaccination to death and symptom onset to death increased with age, possibly reflecting different physiological responses between age groups.
Cross-checking: Correct ✅ [VAERS demographic analysis]

2️⃣ State-Level Differences in the U.S. | VAERS
When analyzing U.S. states, some showed much higher mortality rates: Kentucky (4.05%), South Dakota (3.54%), and Tennessee (2.79%)—likely due to poorer hospitalization management or systemic issues.
Cross-checking: Unverified ❌ These state‑by‑state VAERS mortality percentages are not confirmed by official data, and attributing this to systemic hospital issues is speculative [VAERS lack of state-level corroborating reports]

3️⃣ Symptom Frequency and Severity | VAERS

The most commonly reported symptoms were headache, pyrexia, fatigue, pain, and chills, each linked to a mortality rate below 0.6%. Some rare symptoms had 100% mortality, but these were based on 1–2 reports, often marked as “death” itself, and were excluded from visualizations.
Cross-checking: Correct ✅ [VAERS symptom frequency]
Graphs focus on symptoms with more than 1,000 reports for meaningful insights:
- Most lethal symptoms: “unresponsive to stimuli,” “chest X-ray abnormal,” “intensive care,” “pneumonia,” and “hypoxia.”
- Most associated with disability: “cerebrovascular accident,” “MRI head abnormal,” “Guillain-Barré syndrome.”
- Most linked to hospitalization: “pulmonary embolism,” “cerebrovascular accident,” “chest X-ray abnormal,” “troponin increased.”
Cross-checking: Correct ✅ [VAERS reports]

4️⃣ Mortality and Severity by Vaccine | VAERS

Most VAERS reports relate to Pfizer and Moderna, as expected from their usage rates in the U.S. However, these were not the vaccines with the highest death or disability percentages.
Some vaccines with fewer total reports had worse outcome ratios. These insights are left for personal interpretation, so the visuals are provided without numeric conclusions.
Cross-checking: Correct ✅ [CDC vaccine safety findings]

Conclusions
This project has been both a technical and personal challenge. It pushed me to apply advanced data cleaning, modeling, and visualization skills using real-world health data. Despite its complexity and lack of a predefined question, I aimed to extract as much relevant insight as possible while maintaining clarity and analytical rigor. I’m proud of the result—not only as a data analyst, but as someone committed to turning raw data into knowledge.
Any feedback is welcome, and if you’d like to connect or collaborate, feel free to contact me!