Datasets, Data Warehousing, etc

Do you all have any good resources (other than Googling) for getting better at creating the actual datasets, or data warehousing? I haven’t gone through all of your training videos Sam just yet :smile:.

We have access to Linda.com at work, but I can’t really find some good walkthroughs with examples of the proper setup of the datasets. I’m YouTubing my heart out as well and cramming as much info as I can.

As we are diving into Power BI at my work, I have noticed that lot’s of data tables I am given to run reports off of, seem to not be set up properly, which makes the Power BI portion harder.

I am a visual person, so being able to see someone create a dataset from scratch explaining some reasoning behind the setup would go a long way to give me the Aha! moment. I am aware that there are many aspects of the dataset design depending on the needs, so I am looking for some general overview.

I have been trying to download some freebie datasets to see the structures, but most are a single table with no Dim tables/relationships shown.

Any tips and links would be much appreciated.

Hey Paul,

Its a good question you have. The main issue is there are so many data sources available and the way the data is structured and the purpose of what you need impacts what data sets you need to build.

My advice would be to go a training course where you can sit in front of computer with people in a similar scenario. You can only learn so much from Videos and Data modelling theory is important enough to invest the money to learn. I am going to attend Marco Russo & Alberto Ferrari 3 day workshop in August this year.

In saying that, Power BI is an analysis tool ONLY and holding vasts amounts of data which is not required is not recommended. Power BI likes tall data tables …and not wide ones if possible…its a columnar data base so it can really calculate super fast…

The main tip is to workout your FACT and DIMENSION tables and try and set them with only required columns for visuals, every other column can be excluded from your query. Try and make a star schema in most cases is best approach with your Fact and Dimension tables.

Sam has some good content on this part of the approach.

Design also takes into account what kind of data you are accessing and how you need to ensure it reconciles to the master data set. (Source)

For example, reporting on accounting data like in a profit and loss or balance sheet requires the General Journal Fact Tables from the source data. You need to ensure that your figures balance to the source and have some kind of system to agree Power BI to the source accounting system holding the fact data.

You may also want to get Dimension Tables for things like Accounts & Jobs and those Dimension tables have supporting tables too which you might need to break up and group items.

It takes a long time to develop a technique that works well and fits a purpose. I have built this experience up over time and doing workshops and readings. I am an accountant and worked a lot with software so Power BI is a natural extension of what I already knew.

A lot of people can show off nice reports and dashboards but the theory of how the data is obtained and reconciled is paramount for a proper solution you can manage or walk away from so your client can control.

Its a worthwhile topic , “How to setup your data set so you can reconcile what I call control records”.

I am happy to develop some future content, if Sam see’s the need.

Cheers and Good Luck!

1 Like

Hi Paul firstly, you know there are over 30 downloable datasets in this course module right? - http://portal.enterprisedna.co/p/power-bi-resources

There’s a whole variety in here

1 Like

Hopefully you have had time to go through this course as well. This is where I go in depth into data modeling in combination with the query editor, as working with the two in combination is absolutely key in Power BI

1 Like

Data Modeling is a bit of an art, as every data set, data scenario & required output is different. What I have done to assist here is provide many things to give users as broad an experience as possible across many things.

What the demo data sets there are over 35 different scenarios you can work with. Most of these also have a corresponding completed model. The idea with these it to really dive into how I built them out, structure the model etc.

There are now close to about 40 completed models to download and review in this one module - with more to come this year and in the future.

It’s interesting though 80-90% of the time (and from experience working across well over 100 models) the same simple principles apply and that’s what I’ve looked to capture in the Advanced Data Transformation and Modeling course.

Hopefully this give you some further direction. Let me know if any further thoughts.

1 Like

Thank you, both for the replies and suggestions. I will be checking out all of the sources you supplied.

Thanks Sam. Great resource to use for reference.

Companies use an EDW database to store disparate data from different sources in one place. This is indeed a difficult task. Therefore, if you need help, you should contact a reliable technical specialist in the field of enterprise data warehouse architecture https://www.cleveroad.com/blog/enterprise-data-warehouse/

The original post is 5 years old, I reckon they will be ok now :wink:

1 Like