Latest Enterprise DNA Initiatives

Power BI Challenge 11 - Covid 19 Reporting from Greg, Brian, Jarrett, and Mudassir

Here’s their entry for Power BI Challenge 11. @Greg, @BrianJ, @JarrettM, @MudassirAli , feel free to add other details of your work.

image

image

image

image

image

Here’s the link to the report:

And here’s how they described it:

Hi All.

Here is a collaborative effort from a group of experts. We wanted to go through the process that many of may experience in your work environments, and see what benefits and limitations arose. As well, part of the reason for the collaborative effort was to do a dry run for a future course/webinar on collaborative Power BI development that we’ll be putting together for release later this year.

So, without further ado, here is the collaborative submission from JMAP (Julius, Moore, Ali, and Philps).

(GP: I was the lucky one who got to make our post, but others did an overwhelming portion of the development, so …)

We’re still trying to figure out the best way to share our experiences in this group development effort, and more will become available shortly.

Thanks again to Enterprise DNA for hosting and administering these challenges, and congratulations to the Enterprise DNA community for all your efforts. Bon chance!

To learn about the real-life scenario presented for the challenge, be sure to click on the image below.

power-bi-chal-1

4 Likes

This is such a great idea, I am looking forward to your write up(s?) on how you divided the work, what worked well, and what didn’t. The report itself is very well done! My only observation is that summer isn’t the US summer months in other places, like Australia.

1 Like

All,

Introduction
For a while, we’ve been planning to do a joint entry to explore the dynamics of collaborative Power BI development, in order to provide the Enterprise DNA community with a set of practical, tested recommended practices. Our ultimate plan is to use Analyst Hub as the primary mechanism to facilitate this collaboration, but the first time through we thought we would focus on the broader elements of learning to work together as a team, since each of us has only ever been the sole developer on either our real life reports or our Data Challenge entries.

Collaborative Effort
One of the goals of our approach to this challenge was to try to mimic what might be a not-uncommon situation many people may find themselves in when developing a Power BI report, namely that there are many people involved. That, and the inherent logistical difficulties brought about by the fact that only one person can work on a Power BI file at a time, led us to establish a common location for collaboration, in this case we used a OneDrive master folder. We created a shell collaboration Word document and PowerPoint presentation that we used to toss around ideas, then began with the data loading and transformation. (You can find more details about the specifics of the data prep below.)

One team member did most of the data loading and data prep work, then posted their solution to the OneDrive site with a version number (to allow for easy identification of the latest version) and their initials (to prevent other users from accidentally overwriting their contributions). Once the initial data prep and modelling was drafted, each team member took a copy and we each created possible visualizations, which we then iterated together until a path forward with consensus appeared. We then more specialized into the particular strengths of each team member, and from here on always worked on the common file, and one-at-a-time added our enhancements and suggested changes.

This iteration was both time consuming and communication-intensive, and clearly exposed one of the current main difficulties in Power BI development at this time: namely that changes from different authors can’t be merged.

So, in the end, we “ate” the difficulty and iterated and communicated extensively, ending-up with a Power BI candidate report with many pages, most of which were hidden and not included in the final submission (or were interim “work” pages).

Choice of Data Story
Story-telling is essential in dashboards as it helps to engage the users and communicate the requirements easily without having to put so much effort into digging out the essential points. Therefore, we came up with a story “How have countries dealt with COVID-19?” and based our workings in communicating the story with different visuals, narratives, regression models, dynamic report level tool-tips, navigations, maps and so on.

Data Prep and Data Modeling
When we first looked at the data for this challenge, it seemed fairly simple and that the data modeling step would go quickly. However as we began to dig into the data in detail, we realized that JHU had made at least three highly questionable choices in their management of this data that increased the need for aggressive data cleaning:

1) Use of Country Name, Rather Than ISO Code to Uniquely Identify Each Country – there is no recognized standard for country names, however ISO codes (both the three and two character versions) are a recognized international standard. The absence of ISO codes severely limits the ability to add additional important information to your country dimension table. Fortunately, we found someone on GitHub who had already gone to the trouble of matching the JHU country names to the three character ISO codes, saving a ton of time and allowing us to create through a simple merge a “universal adapter” field that allowed easy buildout of the country dimension table, detailed further below. From the three character ISO code, we added the two character ISO code now making it possible to incorporate almost any other country level data source into this analysis.

2) Reporting of Daily Cumulative Rather Than Marginal Totals - We’ve never seen data reported in this way before at the raw data level, and with good reason – it makes it very difficult to work with as many on the forum indicated in their first dig into this data. To correct that, we employed the “Feldmann Shift” (named for Imke Feldmann of the BIccountant blog, who was the first person we ever saw employ this technique), explained in this thread. The resulting marginal daily data allowed us to use all the standard measures and techniques to which we are accustomed, as well as some more advanced statistical modeling discussed later.

3) Use of Zeros Rather Than Blanks to Indicate Missing Data – this is a convictable crime against the laws of data management. Once the data was normalized to daily values, it became apparent that in a number of instances the data was coded with zeros to indicate missing values rather than blanks – which led to the presence of some enormous outliers. Once we recognized what had been done we treated the zeros as missing values and filtered them from the dataset.

4) Reporting of Data at the Province/State Level – Unlike the other three above, there is nothing inherently wrong with doing this, but it just makes adding additional country level data to the dimension table difficult. Thus we used the Power Query grouping functions to roll the data up to the country level only to make it easier to merge with country level demographic, geographic, economic and social data critical to our analysis.

Once the data were cleaned, we configured them into a standard star schema model, with a snowflake off the Dates table to allow slicing by period. The odd wrinkle in our data model is the presence of the three additional sub fact tables, referenced from the main fact table and built specifically for the R multiple regression analysis discussed below.

Building out the Countries Dimension Table
The work that Brian did recently on the “Where in the World Is Enterprise DNA?” mapping effort was incredibly timely for this challenge, since we were able to reuse almost all of the country level dimension table fields he developed for that report for this one. In addition, we added the following data used for our regression analysis:

Population = Population data by country, 2019 (source: World Population Review)

Income = Adjusted net national income per capita, 2018 (source: World Bank)

NumNeighbors = Number of countries with a shared physical border to the selected country.

+1 added to value for each country to prevent log(0) undefined error for island countries.

Democracy = Democracy Index value, 2019 (source: The Economist Intelligence Unit)

Corruption = Corruption Perception Index value, 2019 (source: Transparency International)

Design Strategy and Elements
It’s not always easy to choose a design especially if you plan on including Maps in Power BI reports. The first biggest challenge was to integrate it well with the background so it looks visually good and the second one is to leave enough space for other visuals/analysis in the report. For this, we took a screenshot of the Map and used it as a background and used the same dark color theme as was in the Map.

After this, the Maps blended well with the report but what about other visuals? The first visuals that can be seen are the card visuals at the top that show the number of cases by different status. To keep the report as clean as possible, these cards are also used as navigation to filter the report by different status. The different colors are used for Confirmed, Recovered & Death cases for better visibility in the report. We also added the stacked Bar Chart with different color for each status to invite the users to click on it but when they do, they will see the navigation option created with a blank button. To get this effect, we inserted a blank button on top of the card visuals, used the background option in format pane and under hover, added the same background as we used for the report. Then used the Button Text option, selected hover from drop down and inserted the text “Click to see the Confirmed/Recovered/Death Cases”.

The main part of the reports are the tooltips that are packed with information and trends.The first static tool-tip is added to map as a starting point for the user to see the outliers. After seeing these outliers the users can filter the data to see different types of information and trends. This tooltip shows the Top 10 cumulative cases by country after reaching the first 1000 cases. Why the first 1000 cases? The reason is many countries started acting after the cases reached a particular number. Moreover, the tool-tip also shows which season had the highest number of cases as it was necessary given the cases are analyzed by the top officials by seasons. There were talks of a second wave already in the fall so we thought it would be a good idea to show some different insights.

Another tool tip in the map is the dynamic tooltip that shows the trend of the cases in different months by country. The best option we found was to use the small multiple feature in line chart to show the trend.

We presented the country flag and name, as well as a dynamic textbox (smart narrative using static text and dynamic measures) to describe the country’s particulars.

Main Map
Like many other entrants in this challenge, we started our mapping effort using a circle map based on latitude and longitude. While this is a great approach if you have very detailed, city level data we felt that it did not depict country level data as well as the choropleth map (a filled shape map).

To implement the choropleth map, we used the 1.9 beta version of the custom mapping visual Icon Map. The default shape map in Power BI is very limited in its configuration options, and Map Box while it looks great does not support report page tooltips – a must have for our analysis. Thus, we relied on Icon Map at the recommendation of EDNA GIS guru @Paul. This worked great, using the Well-Known Text (WKT) option to define country boundaries. More detail about this approach can be found in Brian’s writeup of the “Where in the World Is Enterprise DNA?” Mapping effort.

1000 Case Tooltip (see screenshot above)
The best part about this tooltip is that it contains a gradient line chart designed in Charticulator. It shows the color getting darker as the number of cases increases over time. Moreover, at the end of the line, it contains the name of the country instead of showing it with Legend. We added a calculated column to get the number of days by country when the case was first reached to 1000 and used that as a starting point on the X-axis. With some sophisticated DAX, a measure can be created instead of a calculated column but went ahead with the calculated column due to time constraints.

Seasonal Tooltip (see screenshot above)
This tooltip was designed to keep in mind the seasonality of the cases therefore chose the months on x-axis so the users can see the trends by both months and seasons. Season names aren’t mentioned explicitly as it’s obvious which season is in which months. Moreover, the smart narrative visual also shows which Month and Season has the highest number of cases.

Detailed Trend and Ranking Tooltip (Scatter Chart)
In this tooltip we presented again the country flag and name, along with select statistics (confirmed cases, recovered cases, % recovered, active cases, deaths). We presented both daily and cumulative data in column charts, with a 2nd Y-axis on the cumulative chart as the deaths count is at a much different scale from the confirmed and recovered counts. Finally, we included a performance section at the right listing the country’s global rank for deaths, confirmed, and recovered cases, then again presented the rankings normalized by the country’s population.

Multiple Regression Analysis and Residual Plots
At the heart of our data story question “How Have Countries Dealt with the Global Pandemic?” is the need to be able to meaningfully compare the performance of countries against each other or against a common standard. Toward this end we constructed a multiple regression model that attempted to explain the variance in global deaths, confirmed and recovered. With more time we could’ve done additional research and specification testing to improve the model, but what we came up with seemed pretty good for the amount of time available. Here’s the model specification and key results for deaths, though confirmed and recovered follow nearly identical patterns:

Here are the major takeaways from these results:

  • the model accounts for 60%+ of the variance in global deaths due to the virus. This is actually substantially higher than we expected given the limited time available for research, model creation and specification testing – a process that in many cases can take weeks or months to do thoroughly. There is evidence from the resulting statistics of at least one key variable missing from the specification, which given the complexity of the dynamics related to virus spread and deaths is not surprising
  • all the estimated model coefficients are significant at any threshold p level chosen, and are of the expected sign
  • other than the aforementioned missing variable issue, the diagnostic plots do not show any other obvious modeling flaws

Once we constructed the model and estimated it, we plotted the results separately for deaths, confirmed and recovered, using predicted values for each country on the x-axis versus residual values (actual minus predicted) on the y-axis. This allowed us to see, controlling for all the variables in the model, which countries performed better than expected and which performed worse. In most scenarios, where you are modeling a positive outcome, being above the zero line (i.e., actual exceeding predicted) is considered a “good” outcome and below the line is considered a “bad” outcome. However since we are modeling negative outcomes, the interpretation is reversed and being below the line is considered “good” (note that recovered is still treated in the modeling as a negative outcome since it is so highly collinear with Confirmed)

Mechanically, we estimated the model using a magical feature of Power BI referred to as the “Dataset” call, which allows you to invoke R code in the middle of a query and then continue postprocessing the results in Power Query within that same original query. This has two tremendously beneficial outcomes: 1) it gives you access to 12,000 custom R packages designed for every type of advanced analysis imaginable; and 2) it allows you to skip learning 95% of the R data processing routine and lets you leverage your knowledge in Power Query instead. Brian is obsessed with this technique, and will be doing a separate post at length explaining it in detail for those interested

The bottom line of all this is that this section of the report allows the user to interactively explore which countries exceeded their predicted values and which fell short leading to some interesting and unexpected results and allowing for the meaningful comparisons between countries that we were originally seeking.

Conclusions/Thoughts on Collaborative Development
We all agreed that despite the challenges and limitations inherent in using Power BI for collaborative report development, that working closely together, seeing each other’s processes up close on a daily and hourly basis, leveraging the diverse expertise that each of us brought, experiencing the teamwork and camaraderie of accomplishing something difficult under intense time pressure, was a fantastic experience for each of us, and that the final product produced was better than anything any of us could have done individually. We hope you think so too.

Here are some of our specific observations and recommendations regarding the collaborative development process:

  • Most of us have a set, well-established process that we use for our own Power BI development. Be willing to let go of your normal process in order to maximize learning from the processes and perspectives of others.

  • Assess the strengths of each team member, and collectively choose clearly defined roles based on those strengths at the start of the project. This will help avoid getting in each other’s way.

  • Establish and all agree to follow a well-defined process for check-in/ check-out of the PBIX file. Work backward from the due date/time to determine who should have the master file when and for how long.

  • Tabular Editor is your friend. Once the data prep/data model are complete, it will allow multiple people to work on DAX measures while the file is checked out, and then easily transfer those measures back into the master file.

  • At an appropriate point, agree to the data story you collectively want to tell. A sizable team likely will have far too many interesting ideas to fit all of them into a manageable report. Use the story to guide which analyses are key and which analyses are tangential. Be willing to let go of the tangential ones, even if they are interesting.

  • Be aware that its likely collaborative development effort will take more time than an individual one (when you are the one fulfilling all roles, there’s no iteration required when communicating, for one)

As always, thanks to @haroonali1000 for leading a fantastic challenge and to Enterprise DNA for building and supporting the incredible community around these initiatives.

We look forward to a second iteration of this collaborative process in the future, using Analyst Hub as the primary planning and collaboration tool.

Thanks for reading, and congratulations if you survived all the way to the end of this writeup.

The JMAP Team (Julius, Moore, Ali, Philps)

15 Likes

All,

Here are my thoughts in regarding to this collaborative effort for this challenge.

I was a part of the team for this entry, but more as an observer than a contributor. For a multitude of reasons, I was not able to contribute much to this entry but learned far more from just observing. It was great seeing the interactions as the entry progressed. Each of the team members have their own way of doing things, and that was evident before we even started the project. I thought there would be more disagreements between the team members, but each offered their thoughts, and came to an agreement on most of the decisions. Mudassir was the take charge person of the team. I think it was probably harder from him, than the others when it came to delegating others to do things. Greg & Brian were great role players and helped keep everything in sync. Greg, being of the Engineer mindset, was the main person that kept things organized and on schedule. Brian is the ultimate team player, doing his best to make sure that things were in harmony between everyone as the entry progressed. It was very hard to observe as the others developed the entry, but grateful that I was able to be a part of the team from afar. My initial thoughts as we started the challenge was that I would learn more from the different skills that each of the team members brought to the table, but in the end, I really enjoyed getting to know each of their personalities, and how they interact with others. Moving forward, I would suggest not only doing a collaborative effort for one of the challenges but having someone on the team be more of an observer than a contributor. I think that it is going to be hard for most of us who develop Power BI solutions to be an observer, but it will be very rewarding if you are willing to give it a go!

Cheers
Jarrett

12 Likes

@datazoe Great observation. Will keep that into mind when designing the COVID dashboard again :stuck_out_tongue_winking_eye:

3 Likes

I volunteer for the observer role for that redesign. :smiley:

7 Likes

Seriously this is so good it’s quite astounding.

Everything here from the way you have selected specific visualisations. How you have arranged all of your visualisations. How you have built in a range of different information that a consumer can dive into in a dynamic way. The list goes on and on for me.

Easily one of the best Power BI reports I think I’ve ever seen and just highlights once again how incredible powerful Power BI is as a visualization and storytelling tool.

I think how you have used your tooltips is out of this world as well. There’s so much additional information that you’ve enabled the user to review and it looks so compelling it’s absolutely crazy.

Such a pleasure to consume and evaluate if you were making decisions. This is what they should be showing on CNN and all the news channels to highlights what is actually going on with covid around the world.

Maybe we can get it there somehow let’s let’s collaborate on how we can distribute this more widely!

Well done expert team!

Sam

8 Likes

This submission was really WOW from so many points of view! It is such a joy to explore this report made by some of the best experts out there! What really impressed me was the collaboration. I can only imagine how difficult it was to build such an outstanding report with many ideas from all of you! I think this experience is unique, and it has enormous potential value for businesses out there (especially in a COVID world). In terms of collaboration the conditions were not ideal! Different countries, age, experience, culture, personality & tight schedule for a submission. Wow! I can only be admirative of your work, your collaboration and organization that fueled the creativity and did not limit it! The limits of Power Bi were once again broken with this report! I hope to see this report broadly distributed. People from Microsoft need to know what great tool they created ! :slight_smile:
Congrats! :clap: :clap: :clap:

6 Likes

Woow! I’m speechless. I mean everything is there…The UI visualization layout representation of data is clean, concise and descriptive. It is also truely esthetic and pleasant to the eye. A true piece of art.:ok_hand: Now that’s what I call a perfect PBI example of professional collaboration.:clap::clap::clap:

5 Likes

Sam,
I think you should be the observer because you can easily understand what the other experts have done to come up with this and then are able to cascade it to the rest of the community. You really have the gift of teaching…and yes, it’s a gift. You explain everything so clearly and concisely that it becomes easy to understand. Well done to the experts (JMAP) for this showcase! This is definitely a Power BI “monalisa”.
Harvey

3 Likes

This is so amazing and I am in love with this stunning cool dashboard the way it has been built a truly beautiful masterpiece. I echo @DChamberland “a perfect PBI example of professional collaboration”.

May I request in future if possible there is a video made on this masterpiece work and added to the course structure along with the supporting files.

Cheers
Afzal

FYI, @afzalshariff

Each Power BI challenge is also on the Course learning center

The raw data and powerbi files are also there that can be downloaded.

Thanks
Keith

Hi @Keith

I am unable to find the final supporting source files in the Power BI Challenge Showcases for this Report which are listed in the image below -

Covid-19 Reporting Source files

Cheers
Afzal

know idea on that looks like Brian harddrive.

@Keith - Yes. Looks like these files are on Brian’s computer, I will try requesting from Brian.

@BrianJ - Thanks for sharing the “countries_simplified.csv” file in Where in the World is Enterprise DNA?

Requesting if you may able to help us with the rest of source files which are referenced to your local drive.

Covid-19 Reporting Source files

Cheers
Afzal

@afzalshariff ,

Thanks so much for the kind words. That collaboration with @MudassirAli , @Greg , and @JarrettM was probably my all-time favorite Power BI experience. They are all ridiculously talented (and great guys on top of that), and we all got to do what we each do best on that project.

My one big disappointment on that project was the “Mastermind” session we did with @sam.mckay , which was focused on how we built that report. We had connection problems during that session and we couldn’t hear Greg, who was absolutely integral to the design and execution of the report. Still, you may find value in viewing this video:

If there are any supporting files you’d like to see, just let me know and I’ll be glad to dig them out.

  • Brian
1 Like

Hi @BrianJ ,

Thanks for taking your time to reply my post and for sharing the information by considering my request.

I will go through this session video and this will definitely help me understand the dashboard and clear my queries.

Earlier this evening, I was trying to understand the dynamics by navigating into the power query where I was facing errors due to the missing source files/mapping.

It would be really helpful if all the source files related to this report are made available under the Learning center where the report is showcased Power BI Challenge 11 - COVID 19 Reporting as this may help to deep dive and take away some techniques to replicate if there is any need in other projects for the users who visit this dashboard.

Cheers
Afzal

@afzalshariff ,

Thanks very much for the suggestion – I will definitely package up all my supporting data files into a ZIP and ask the @EnterpriseDNA team to include them with our DC #11 Showcase.

This is one of the challenges of doing an entry with a team - since generally only one person can work on the file at a time, it’s a continual relay race to try to finish your next portion as quickly as possible to get it to the next person to allow them to work on their parts. Typically, I would try to clean up the supporting data files a bit better, but I remember on this one we were going right down to the final day to complete, frantically passing the file back and forth in order to finish.

If there was one thing I can change about Power BI, that would probably be it – somehow to allow multiple people to work simultaneously on a report and then do a code merge.

Thanks though for your interest in our report – I will get you the supporting files shortly.

– Brian