COVID-19 Update by London Borough


x-axis: latest 7 day average for new daily lab-confirmed cases per 10000 person

y-axis: the 30 day increase of the new cases, measured by log ratio
The closer to the bottom left the better the borough is doing.

I then created a distance metric between the borough and Greenwich as a measure of severity:
Severity = Average(Percentile Rank along x-axis, Percentile Rank along y-axis)
which I then set up as a dependent variable.
  • Some suggests there maybe clusters related to Eid festival. However there is actually a negative and insignificant coefficient when I regress vs. ethnicity data. (1)
  • Maybe wealth can be a proxy for oversea mobility and non-white-collar-job. If I use house price (link in the code) as a proxy to wealth, and result again is very weak.
  • No relationship with population density of the borough either.
Can you spot any pattern? 

(1) https://data.london.gov.uk/dataset/detailed-ethnicity-by-age---sex-ward-tools---2011-census--

Full code below:

library(data.table)
library(ggplot2)
library(ggrepel)
data_url <- "https://c19downloads.azureedge.net/downloads/csv/coronavirus-cases_latest.csv"
raw_data <- fread(data_url, check.names = TRUE)
#Useful to filter out only London councils and proxy wealth
#https://data.london.gov.uk/dataset/average-house-prices
house_prices_data <- fread("land-registry-house-prices-borough.csv")
#https://data.london.gov.uk/dataset/land-area-and-population-density-ward-and-borough
pop_density_data <- fread("housing-density-borough.csv")
pop_density_data <- pop_density_data[Year == 2020,
                                     .(population = mean(Population),
                                       density = mean(Population_per_square_kilometre)),
                                     by = "Code"]
london_data <- raw_data[Area.code %in% house_prices_data[,unique(Code)] &
                          Area.type == "utla",,]
london_data[,Specimen.date := as.Date(Specimen.date)]
london_data <- merge(london_data,
                     data.table(Specimen.date = seq(
                       min(london_data[,Specimen.date]),
                       max(london_data[,Specimen.date]),
                       by = "1 day"
                     )), all = TRUE, by = "Specimen.date")
setkey(london_data, Specimen.date)
setnafill(london_data, type = "const", fill = 0,
          cols = c("Daily.lab.confirmed.cases"))
london_data[,roll_mean := frollmean(Daily.lab.confirmed.cases, n = 7, align = "right"),
            by = "Area.code"]
#Exclude last 3 days due to lag
london_data_summary <- london_data[Specimen.date <= Sys.Date() -3,
  .(council = Area.name[1],
    latest_avg = roll_mean[.N],
    rate_of_new_cases = log(roll_mean[.N]/roll_mean[.N-30])),
by = "Area.code"]
london_data_summary <- merge(london_data_summary, pop_density_data, by.x = "Area.code", by.y="Code")
ggplot(london_data_summary, aes(x = latest_avg/population * 10000, y = rate_of_new_cases, label = council)) +
  geom_point(color = "#ff0000") +
  geom_text_repel() +
  labs(x="Latest 7 day average per 10000", y = "Rate of increase (log ratio) over past 30 days") +
  theme_bw() +
  ggtitle("London: Latest 7 Day Average on New Cases and its Rate of Increase by Councils")

#Statistical test#
rate_of_increase_ecdf <- ecdf(london_data_summary[,rate_of_new_cases])
daily_increase_ecdf <- ecdf(london_data_summary[,latest_avg]/london_data_summary[,population])
london_data_summary[,severity := (daily_increase_ecdf(latest_avg/population) + 
                                    rate_of_increase_ecdf(rate_of_new_cases))/2]
summary(lm(severity ~ population + density,
              data = london_data_summary))

Comments

  1. Excellent work! As you are now part my my collection of 230+ blog posts that have used R to analyze covid-19 data, I would like to email you the analysis I am doing. Would you mind sharing your email address? Thank you

    ReplyDelete

Post a Comment