Getting Data from Online CSVs (Covid-19 Data)

First off, hope everyone is doing well in this very interesting times. As i was watching the news, I kept seeing this dashboard from Johns Hopkins

I figured it would be a good exercise to see if that could be replicated/improved. Step 1 was to find the data. Luckily they provided a link to github where you can find the data:

navigate down and you can download a csv for each day. Now, I dont really want to download a file manually every day, so set out to automate that.

First step was to get a list of csv files. Easy enough, connect to a web page and do some transformations and get the following:

Now that we have the file names, how do we get the data associated with those files? If you click into one specific file there is a button for “raw” which does:

Armed with that I can create a function that will change the last part of that url to that of our file list.

= (FileName as text)=>
let
    Source = Csv.Document(Web.Contents("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/"&FileName),[Delimiter=",", Columns=6, Encoding=65001, QuoteStyle=QuoteStyle.None])
    
in
    Source 

the file name will be called back in our 1st table. Invoke that function and you get a table for each csv file

Expand out and do some cleaning up and you are good.

Final table

Then from there just the normal dax and dasboarding type things, but the data will update accordingly.

Here’s the file:
COVID 19 Data.pbix (13.3 KB)

-Nick

5 Likes

For those of you just interested in the USA data Microsoft has this which I have embedded into my SharePoint site for all users. When embedding it doesn’t required a license to consume

Guy

1 Like

All - Thanks for sharing. This info is very informative. Does anyone have a good data source for shelter in place orders in the United States, including start date and expiration date? If someone has also normalized essential businesses for each shelter in place order that would be huge! let me know, thanks!

Nick

Dear @Nick_M

Thank your for the information, But I can’t find a solution to fix the date format of each CSV file(It’s not consistent and terminate at some point ).
Do you have any idea please.

Best,

@wesentadesse99
I see what you mean. It seems when the data became more US centered the csv files changed. The dates are an easy fix (one of the last steps below) what I found is that the column names were changing as well and we were missing pulling in the all the data. Here’s how I solved for that issue:

  1. Get a unique list of column names from all the files

  2. Using those columns ( just copied them) create a new table (used the “Enter Data” option) to what I want the new name to be:

  3. Create a list of the columns we want to keep. This is where we filter out if the list contains “Not Used”
    image

  4. We can create a function putting all of this together:

 = (FileName as text) =>
let
        Source = Csv.Document(Web.Contents("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/"&FileName),[Delimiter=",", Columns=11, Encoding=65001, QuoteStyle=QuoteStyle.None]),
        PromoteHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
        NewColNamesList = Table.SelectRows(list_NewColNames, each List.Contains(Table.ColumnNames(PromoteHeaders),[CurrentColNames])),
        ConvertToRecord = Table.TransformRows(NewColNamesList, Record.FieldValues),
        Rename= Table.RenameColumns( PromoteHeaders, ConvertToRecord),
        SelectColumns = Table.SelectColumns(Rename, NewColumnsToKeep, MissingField.UseNull)
        
      in
        SelectColumns

NewColNamesList is selecting only the columns that are avaialbe to rename. Since the earlier files will have less columns, need a list that is applicable to the current list of columns

Convert to record creates a record for the old and new column names. Then we can use that to rename

The select columns uses #3 as to what columns we want to keep. You need to use “MissingField.UseNull” or it will error out

You then call the function in the Files table:
image

replace the “.csv” with blank and then convert to date and that will become your report date, which is why we didnt use the “LastUpdate” column in the original data.

Then set your data types and other cleanup and should be good.

COVID 19 Data.pbix (483.5 KB)

3 Likes

I havent come across anything like that yet. I’m sure theres something out there, but I havent really looked to hard.

If anyone has followed my crazy thinking here, been able to update with the most recent CSV without any issues. :slight_smile:

1 Like

Amazing Nick, well done.

I honestly though about the same thing over the weekend!

Sam

I had a similar idea and I developed one. I had a sneaky feeling the John Hopkin’s Dashboard was developed using Power BI. I understood most of the functionalities of the model but some eluded me. I first thought of developing the dashboard using PYTHON but soon realized its much easier using Power BI and the visualization is much more compelling. I however got the data using a PYTHON SCRIPT. I found this much easier to, I used the following script.

import pandas as pd
import numpy as np
death_df = pd.read_csv(‘https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv’)
recovered_df = pd.read_csv(‘https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv’)
country_df = pd.read_csv(‘https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/cases_country.csv’)

1 Like