Power BI: import one large table or ten smaller tables?

Hello

I work for an international charity, and we work with poor older people around the world affected by disasters such as earthquakes, floods, civil war and epidemics such as COVID-19. We conduct a digital needs assessment (survey) to determine their most urgent needs and challenges so that we can provide appropriate support.

I am just starting with Power BI. Normally we do our assessment data analysis with Access and Excel, but we want to move over to Power BI.

We conduct roughly 14-18 assessments annually and the questions are almost always the same so it is possible to do analysis across multiple countries.

The assessment data table has 228 columns. The data is currently analysed in 10 themes, namely Health, Disability, Wellbeing, Food, Water and Sanitation, Shelter, Income and Debt, Safety, Access to Services, and Complaints/Consultation.

Most of the data is text in the form of yes/no questions or multiple text options, e.g. "never", "almost always", "always", etc., but there is some numeric data.

Analysis must always allow the data to be sliced by age, gender, location, disability, alone (living alone), etc.

In Excel we do many complex calculations. For example, we might flag older people who live alone, are on chronic medication, whose medicine is about to run out, and who lack access to services. For this we use an Excel logic formula that returns code words for only the critical cases.

In Power BI we would use key measures to do the calculations.
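For instance, that critical-cases logic might translate into a DAX measure along these lines (a rough sketch only; the table and column names below are placeholders for our actual survey fields):

```
// Counts only respondents meeting every critical condition.
// Table and column names are hypothetical placeholders.
Critical Cases =
CALCULATE(
    COUNTROWS( Respondents ),
    Respondents[Lives Alone] = "Yes",
    Respondents[Chronic Medication] = "Yes",
    Respondents[Medicine Running Out] = "Yes",
    Respondents[Access To Services] = "No"
)
```

Slicers on age, gender, location, etc. would then narrow this count automatically through filter context, without changing the measure.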

My questions are as follows

  1. Do I use just one table with 228 columns in Power BI, or do I use one workbook with 10 sector tables with fewer columns per table, linked by the survey ID number? Please bear in mind the various slicers that might need to be applied to the data, e.g. age, gender, disability, etc.
  2. Do I use a separate Power BI file for each country survey, and then, when I want to analyse multiple countries, create a new Power BI file and append the tables from multiple countries? Would this be correct?

Thanks for your suggestions

Hi @Ducati

Welcome to the forum.
You can organise the Excel data however you feel comfortable, but if it's not too huge to manage in Excel, I would suggest appending all the data into a single Excel file if possible, so that you have a single source. If your Excel file gets too big, you can use multiple Excel files and then append the data in Power BI itself.

Ideally, if you are using multiple Excel files as your question suggests, you just have to make sure that your data has the same structure in all of them (I mean the same columns) so that you can append it in the tool itself.

I would suggest you set aside three or four hours and go through the link below, and you'll be able to set up your model.

If you still face any issues I would be happy to help you out.

Thanks,
Ankit

Hi,

As a best practice, keep fewer columns and more rows. With good data modeling you can keep "age, gender, location, disability, alone (living alone), etc." as slicers, and you can also improve the performance of your file.

Invest some time up front in building a proper star schema with fact and dimension tables, and build smart relationships. I promise you won't regret it in the future.

@Ducati,

@Mohammed_ali hit the nail on the head in terms of keeping your structure “narrow and long” (relatively few columns, lots of rows) as opposed to the way Excel is typically structured as “wide and short” (lots of columns, relatively few rows). He’s also spot on in terms of building a proper star schema. I work with a lot of survey data myself, and here’s a basic sketch of a starting structure that may work really well for you:

This structure will let you slice on any attribute in your data related to respondents, as well as on date.

If you look down the road at the calculations you’ll likely want to do, this sort of “narrow and long” structure makes them really simple. For example,

CALCULATE(
    AVERAGE( 'Responses'[Response] ),
    'Responses'[Question Number] = 10
)

will give you the average response to question 10. You can add more complex filter conditions to generate more precise results by gender, age, etc., and also leverage the evaluation context of your visuals to calculate across many questions at once. The key is developing a strong data model from the outset that will support the calculations and analyses you want to do.
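As an illustration of layering those filters, a measure like the following would restrict the average to women aged 70 and over (a sketch only; the Respondents table and its Gender and Age columns are assumed names based on the structure described above, and it assumes a one-to-many relationship from Respondents to Responses so the filters propagate to the fact table):

```
// Average response to question 10, restricted to women aged 70+.
// Respondents[Gender] and Respondents[Age] are hypothetical column names.
Avg Response Q10 Women 70+ =
CALCULATE(
    AVERAGE( 'Responses'[Response] ),
    'Responses'[Question Number] = 10,
    Respondents[Gender] = "Female",
    Respondents[Age] >= 70
)
```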

In addition to the course that @kkrj.ankit recommended, I think the following course will be extremely important for you to work through in building that proper data model and structuring your data correctly within it:

Finally, you may want to take a look at this thread which provides links to a number of other discussions on the forum about the advantages of a “narrow and long” structure.

I hope this is helpful. Good luck with the very valuable work that you’re doing.

  • Brian

Thanks Ankit, I am busy with the course you recommended. As I work through the course, I am thinking about how to apply it in my context and modelling, which is quite different from business modelling; hence my question.

In Excel, one assessment is analysed in 54 different Excel tables, which is very cumbersome. I am streamlining the process to reduce the number of tables to about half that. However, I think I need to reflect more on the structure of the data model, as Brian has suggested.

I think what I have understood is that the Excel file is the raw base data, and I then need to build a model, as Brian suggested, that will support the analysis. I will watch the data modelling video, then design my model and post it for review and comments.

I also need to watch the data modelling videos that Brian recommended.


Hi Brian
Thank you for your feedback and recommendations. The basic model ideas were really useful and you already have me thinking about how to structure my model. I will watch and read the provided links and then design my model and post it here for review and suggestions on improving the model further

One advantage I have is that once I have designed the model I can reuse it for each subsequent assessment
Will post my model early next week
Cheers
Andrew

Hi Mohammed
Thank you for your input and you are right I need to invest time in designing the model but I think I need to learn a bit more as a next critical step
Appreciate your interest and hopefully I can post a design later next week
Cheers
Andrew


Hello @Ducati,

I would keep all the Excel files in a folder and connect with Power Query from Power BI to that folder and “read” all the data.
Then I would combine all the files with the same structure and build the data model. The main advantage of Power Query over normal Excel is that you can really transform the data (I'm thinking here of UNPIVOT) and build a long, narrow data (fact) table and several lookup (dimension) tables. Then try to stick to a star schema model so that the DAX measures stay simple.

Should you have two sample files, I'll be glad to try something to help.


Hello Cristian
Thank you for your reply. I am not quite sure I follow you completely. In Excel I have around 50 tables that are used for the assessment analysis, but there is quite a lot of duplication across the tables, e.g. age, gender, etc.

I am not sure how to construct one long table, or quite sure what a star schema is, but I can do the example above from Brian and am working on that now.

I can create the dimension tables. For example, age, gender, disability and living alone would be in (a) the respondent's profile table. I could also create (b) a location table with location and type of displacement (e.g. refugee, IDP, host, returnee, etc.) and (c) an assistive aid (product) table.

We also check for access to services, e.g. health, water, shelter, food, etc. Would this be another lookup table?

We also ask older people about their priorities for assistance. We have 11 items, e.g. water, shelter, food, health, cash, safety, etc. For each item they are asked to give a rating on a scale of 0 to 5, and then we can calculate the priority ranking for older men, older women, older men with disability, older women with disability, older men living alone and older women living alone. So I can see how age, gender, disability and "alone" would be slicers, and this latter information would be in the respondents table, but can I, and how do I, put "priorities" into a lookup table?
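One way I can imagine doing this, following the narrow-and-long idea above (just a sketch; the table and column names here are made up), is to unpivot the 11 priority columns into a long "Priority Responses" table with one row per respondent per priority, plus a small "Priority Lookup" dimension listing the 11 items. The measures would then stay simple:

```
// Average 0-5 rating for whatever priorities and respondents are in context.
// 'Priority Responses' and 'Priority Lookup' are hypothetical table names.
Avg Priority Rating =
AVERAGE( 'Priority Responses'[Rating] )

// Ranks each of the 11 priorities by average rating, highest first.
Priority Rank =
RANKX(
    ALL( 'Priority Lookup'[Priority] ),
    [Avg Priority Rating]
)
```

Slicers on gender, disability and alone would then come for free from the respondents table, provided it is related to the priority fact table.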

I have a sample Excel table that I can post (I've just got to figure out how to post a file; there is help, so I will read it and then post a sample table).

Thanks for all the help

ALONE FOOD with DISABILITY access meals hungry nights NEW V2 Colombia 2020.xlsx (26.7 KB)

I hope I have done this correctly, but this is an example table where we calculate the following for older people affected by a disaster:

  • How many meals per day they are eating
  • How often they go hungry at night
  • The barriers that they face in accessing food

This particular table looks at older people who live alone compared with older people who live alone with a disability (so alone and disability are filters, and we also filter by age and gender). I can see how we can use Power BI to apply age, gender, alone and disability as slicers in a single chart.

There is another, almost identical, table that looks at the whole sample of older people (without the alone filter applied) and compares older people with older people with disability.

So ideally I am looking to compile one chart in Power BI that looks at this topic (food), but where I can filter by age, gender, living alone and disability. This helps me understand how having a disability or living alone affects access to food.

I can also apply age and gender filters to see if there is any discrimination or disparity with respect to age or gender. This helps us target our assistance better, so we get the right relief to the right people at the right time.

Hope this makes sense?

Andrew


Hi Andrew,

Checking the file you uploaded, it seems this one already contains some groupings/counts/calculations.
When I talked about having the Excel files in the same folder and getting their data with Power Query, I was thinking of the raw data files from the surveys (XLSX or CSV).
I wanted to showcase a small report, and I found a sample raw data survey file on data.world (https://data.world/browning/capmetro-smart-trips-questionaire).
Even though I got only one file, the process is more or less the same: from 76 initial columns I ended up with only 6 in the data model through unpivot. Creating the dimensions from the raw data got me to a star schema model.

In the file, I made no explicit measures, but that is the way to go should you want a truly dynamic report. Check out the GIF below and let me know if this is what you were looking for.

Survey Results Sample.7z (3.8 MB) .

Regards,

Cristian

Dear Cristian

Thank you for your reply, and apologies, I misunderstood: the file you required was the raw data. I will upload the raw data file in this reply.

I could not open the file that you posted; none of my applications can open it. Is it possible to have the file as a PDF?

I have been working on a data model and have about eight lookup tables; I would then have twelve data tables, one for each sector, such as Health, Water, Food, etc.

I have been continuing to study the course material, and I have a better understanding of key measures, measure branching, etc., but I am still struggling a bit with the data model.

Below is the raw data for our assessment in Colombia of older Venezuelan men and women who have migrated to Colombia

Guajira RNA - cleaned data .xlsx (428.7 KB)

Thank you for your assistance I really appreciate your time and effort

Andrew

Hello Andrew,

The file I uploaded is actually a compressed GIF file that needs to be uncompressed before opening it.
Unfortunately there's a 4 MB limit for uploads, and even when I changed it to ZIP it's still 4.1 MB ;).

I also uploaded the file here for you to take a look; let me know if this kind of report is what you need.

I'll have a look at the raw data file today, and I'll show a POC as soon as I have one.

Cristian

Great, thanks Cristian.

I have worked on the data model, but I am not sure that it is correct. I will have a look at the file you have uploaded.
RNA Star data Model 2020.docx (134.7 KB)

Andrew

Hi Cristian

Unfortunately, I cannot get into Power BI at the moment, as my laptop has developed a RAM problem and I am using my Mac, which is not compatible with Power BI, so I can't read the files. We are still in lockdown, so work is unsure how soon they can get my Windows laptop working again.

I am trying to keep learning and working out the data model, so if you could help me work out the model for now, I would be most grateful.
Andrew

Hi Andrew,

There's no need for Power BI to view the GIF file.
Meanwhile, check out the first sample report with your data here.
Let me know your thoughts.

Cristian

The model you've started looks good, so keep it up.
Using the file you attached, in addition to the sample I uploaded, you could do analysis with a Decomposition Tree like in the picture below:

I’ll try to update the PBI Report saved on the service already so you could “play” live.

Cristian


Hi Cristian

Thank you for your support. The Decomposition Tree looks awesome, and it is certainly an excellent solution to some of the analysis.

Is there a video on how to construct a decomposition tree? If so, please send me the link.

I have a question about the model

The fact table in the model is still very wide, at least 200 columns. I get Sam's point about removing unnecessary columns, and I can see why a narrow table is very advantageous. Here is my challenge: all the data in the fact table will be used, so I can't delete any columns.

I can use the Query Editor to develop all the other tables in the model; that is easy enough, but I am a bit stuck on what to do with the fact table in the middle.

Here is my question
As I see it I have two options:
(a) use the Query Editor to bring in the fact columns from the Excel file and commit them to the model as one very wide table (roughly 200 columns),
Or
(b) use the Query Editor to create eight fact tables (one for each sector, e.g. a Food table, a Health table, etc.); in effect this would break up the wide table, but I am not convinced this is ideal.

If I use option (a), organise my queries well (I basically already have a good idea of the queries I need, because I am currently using Access and Excel) and file all the queries by sector, then I will basically have broken up the wide table and achieved the same as option (b).
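A third possibility I keep coming back to, following the narrow-and-long advice earlier in the thread (the names here are illustrative only, not my actual fields): unpivot everything into a single narrow Responses fact table and put the sector on a Questions dimension, so that a sector becomes just a filter rather than a separate table:

```
// If sector lives on a Questions dimension related to the Responses fact
// table, each sector-level figure is just a filtered measure.
// 'Responses' and Questions[Sector] are hypothetical names.
Food Responses =
CALCULATE(
    COUNTROWS( 'Responses' ),
    Questions[Sector] = "Food"
)
```

That would keep one fact table without it being 200 columns wide.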

Please let me know your thoughts on this

I am working my way through the "Advanced Data Modelling and Transformations" course, so I am interested to learn about staging queries, and I am certain I can use key measure branching.

Thanks for all your help and support
Andrew