The Customers table is the whole database. I would like to reduce it so that it only contains the Customer IDs that are relevant to the Sales table - that is, if the Customer ID is not in the Sales table then we don’t need that Customer in the Customer table.
Not sure how to do this as it would have to be dynamic. That is, every time the data is refreshed there could be new sales from customers not in our current truncated Customer table that would need to be added.
Is this something that could be sorted in the Query Editor or would it need to be done in Power BI?
Before you start to design a model it’s important to keep a few things in mind:
The business process
The grain of the model
The Dimension tables
The Fact tables
I like to “only bring what I need” but mostly that’s reducing the number of columns not so much the number of rows. Generally this reduces the file size and improves performance, so Power Query is the way to go in my opinion.
To illustrate I made a small test data set with just 2 tables (customers and sales).
Finally add a filter, so you’re left with only the TRUE’s and you’re done. You can remove the KeepRows columns after the filter or just hide it in your model. Here are my samples.
If you using power query to connect to a customer SQL database you are querying the entire Customer list for the first time. You can filter down the rows but you need to fetch the list of customer ids from the sales table.
You can make the choice of customers dynamic by right clicking the CustomerId “coming from the Sales table” and selecting “Add new Query”. This will give you a query list where only the customerid sales exist and each time you refresh the data the list will be updated. Again right click and remove duplicates so you have a unique list.
You can now use this list to filter the CustomerId on the Customer table but it needs to be turned into a parameter first. Create a new parameter as shown below:
Can you see if adding Table.Buffer improves the query speed? Example below
let
Source = Excel.Workbook(File.Contents(FileLocation), null, true),
tCustomers_Table = Source{[Item="tCustomers",Kind="Table"]}[Data],
#"Changed Type" = Table.TransformColumnTypes(tCustomers_Table,{{"Customers", type text}, {"Name", type text}, {"Adres", type text}}),
BufferSales = Table.Buffer(Table.Distinct(tSales[[Customers]])),
#"Added Custom" = Table.AddColumn(#"Changed Type", "KeepRow", each List.Contains(BufferSales[Customers], [Customers]), type logical ),
#"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([KeepRow] = true))
in
#"Filtered Rows"
The Table.Buffer(Table.Distinct(tSales[[Customers]])) will load the distinct Customers in memory - you need the double square brackets to turn it into a Table because a single Column is of type List.
As you can see, you do not reference the previous step in the BufferSales line but in the next line you reference the step before BufferSales.
@KieftyKids this should work with CSV files. Importing CSV files should be fairly fast. How many customer records do you have?
Once the data is power query you can create a dynamic customer sales list and filter the customer table.
You don’t need extra steps to merge the two data tables and then to create a third table with the results. It would obviously increase the size of your model. How big is your pbix file at the moment?