I figured it would be a good exercise to see if that could be replicated or improved. Step 1 was to find the data. Luckily they provided a link to the GitHub repository where the data is hosted:
Navigate down and you can download a CSV for each day. Now, I don't really want to download a file manually every day, so I set out to automate that.
The first step was to get a list of the CSV files. Easy enough: connect to the web page, apply some transformations, and get the following:
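The same file list can also be generated outside Power Query. As a hedged sketch (this is my own illustration, not the M steps above), the daily_reports folder names its files in the MM-DD-YYYY.csv pattern, so the list can be built from a date range:

```python
from datetime import date, timedelta

def daily_report_filenames(start: date, end: date):
    """Yield file names in the MM-DD-YYYY.csv pattern the daily_reports folder uses."""
    current = start
    while current <= end:
        yield current.strftime("%m-%d-%Y") + ".csv"
        current += timedelta(days=1)

# Example: the first few daily reports
files = list(daily_report_filenames(date(2020, 1, 22), date(2020, 1, 25)))
```

Building the names locally avoids scraping the GitHub page at all, at the cost of assuming the naming convention never changes.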
Now that we have the file names, how do we get the data associated with those files? If you click into one specific file, there is a "Raw" button, which opens the file at a direct URL:
Armed with that, I can create a function that swaps the last part of that URL for each file name in our list:
= (FileName as text) =>
let
    Source = Csv.Document(Web.Contents("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/" & FileName), [Delimiter=",", Columns=6, Encoding=65001, QuoteStyle=QuoteStyle.None])
in
    Source
The file name will be passed in from our first table. Invoke that function and you get a table for each CSV file.
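For comparison, the same fetch-and-parse idea can be sketched in Python (my own illustration, not part of the original solution): build the raw-content URL from the file name, then promote the first row to headers. The network call is left as a comment; a tiny inline sample stands in for the downloaded text.

```python
import csv
import io

BASE = ("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
        "csse_covid_19_data/csse_covid_19_daily_reports/")

def raw_url(filename: str) -> str:
    """Append the file name to the raw-content base URL, as the M function does."""
    return BASE + filename

def parse_csv(text: str) -> list:
    """Parse CSV text into a list of row dicts (headers promoted)."""
    return list(csv.DictReader(io.StringIO(text)))

# With network access, you would fetch raw_url("01-22-2020.csv") with
# urllib.request and pass the body to parse_csv; here is the idea on a sample:
sample = "Province/State,Country/Region,Confirmed\nAnhui,Mainland China,1\n"
rows = parse_csv(sample)
```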
For those of you just interested in the USA data, Microsoft has this, which I have embedded into my SharePoint site for all users. When embedded, it doesn't require a license to consume.
All - Thanks for sharing; this is very informative. Does anyone have a good data source for shelter-in-place orders in the United States, including start date and expiration date? If someone has also normalized essential businesses for each shelter-in-place order, that would be huge! Let me know, thanks!
Thank you for the information, but I can't find a solution to fix the date format of each CSV file (it's not consistent and terminates at some point).
Do you have any ideas, please?
@wesentadesse99
I see what you mean. It seems that when the data became more US-centered, the CSV files changed. The dates are an easy fix (one of the last steps below); what I found is that the column names were changing as well, so we were missing some of the data. Here's how I solved that issue:
Get a unique list of column names from all the files
Create a list of the columns we want to keep. This is where we filter out any entries containing "Not Used"
We can create a function putting all of this together:
= (FileName as text) =>
let
    Source = Csv.Document(Web.Contents("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/" & FileName), [Delimiter=",", Columns=11, Encoding=65001, QuoteStyle=QuoteStyle.None]),
    PromoteHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    NewColNamesList = Table.SelectRows(list_NewColNames, each List.Contains(Table.ColumnNames(PromoteHeaders), [CurrentColNames])),
    ConvertToRecord = Table.TransformRows(NewColNamesList, Record.FieldValues),
    Rename = Table.RenameColumns(PromoteHeaders, ConvertToRecord),
    SelectColumns = Table.SelectColumns(Rename, NewColumnsToKeep, MissingField.UseNull)
in
    SelectColumns
NewColNamesList selects only the columns that are available to rename. Since the earlier files have fewer columns, we need a list that applies to the current file's columns.
ConvertToRecord turns each row into an {old name, new name} pair, which we can then pass to the rename step.
SelectColumns uses the list from #3 to decide which columns we want to keep. You need to use "MissingField.UseNull" or the step will error out when a column is missing from an older file.
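In Python terms, the rename-then-select logic amounts to mapping old headers onto standardized ones and null-filling anything missing. A sketch with a hypothetical rename map (the actual list_NewColNames table isn't shown in the post, so these names are my assumptions):

```python
# Hypothetical rename map: old header -> standardized header.
RENAME = {
    "Province/State": "Province_State",
    "Country/Region": "Country_Region",
    "Confirmed": "Confirmed",
}
# Hypothetical "columns to keep" list (the analogue of NewColumnsToKeep).
KEEP = ["Province_State", "Country_Region", "Confirmed", "Deaths"]

def normalize_row(row: dict) -> dict:
    # Rename only the columns present in this file (early files have fewer).
    renamed = {RENAME.get(old, old): value for old, value in row.items()}
    # Keep the desired columns, null-filling any that are missing
    # (the Python analogue of MissingField.UseNull).
    return {col: renamed.get(col) for col in KEEP}

row = normalize_row({"Province/State": "Anhui",
                     "Country/Region": "Mainland China",
                     "Confirmed": "1"})
```

The point is the same as in the M function: every file, old or new schema, comes out with one fixed set of columns.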
You then call the function in the Files table:
Replace the ".csv" with blank and then convert to date; that becomes your report date, which is why we didn't use the "LastUpdate" column in the original data.
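That filename-to-date step, sketched in Python for clarity (an illustration, assuming the MM-DD-YYYY.csv naming):

```python
from datetime import date, datetime

def report_date(filename: str) -> date:
    """Strip the .csv suffix and parse the remaining MM-DD-YYYY as a date."""
    return datetime.strptime(filename.replace(".csv", ""), "%m-%d-%Y").date()

d = report_date("03-22-2020.csv")
```

Deriving the date from the file name sidesteps the inconsistent formats inside the "LastUpdate" column entirely.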
Then set your data types, do any other cleanup, and you should be good.
I had a similar idea and I developed one. I had a sneaking feeling the Johns Hopkins dashboard was developed using Power BI. I understood most of the functionalities of the model, but some eluded me. I first thought of developing the dashboard using Python but soon realized it's much easier using Power BI, and the visualization is much more compelling. I did, however, get the data using a Python script, which I found much easier. I used the following script.
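The original script isn't included in the post. As a hedged sketch of what such a downloader might look like (the date range and local file names are my assumptions, not the poster's actual code):

```python
from datetime import date, timedelta

BASE = ("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
        "csse_covid_19_data/csse_covid_19_daily_reports/")

def daily_urls(start: date, end: date):
    """Build the raw URL for each daily report between start and end, inclusive."""
    current = start
    while current <= end:
        yield BASE + current.strftime("%m-%d-%Y") + ".csv"
        current += timedelta(days=1)

urls = list(daily_urls(date(2020, 1, 22), date(2020, 1, 25)))
# Each URL could then be fetched and saved locally, e.g. with
# urllib.request.urlretrieve(url, url.rsplit("/", 1)[-1]).
```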