Building a model using small dataset

I want to build a Power BI report and need to connect to two databases. The connections are very slow and it’s difficult to play with the data. I would like to get a sample of the data, build the report and then connect to live databases. What’s the best way to accomplish this? Is there a way to import a sanpshot of data from SQL Server database? I am not in IT, so need help how do I build the report first using small set of data and then scale. Thanks in advance.

Hello @kjani,

Welcome to the Forum and thank you for posting your query. :slightly_smiling_face:

As you said that connections are very slow when you connect to the live database.

It is important to note that for DirectQuery and live connection datasets, the cache is updated on a periodic basis by querying the data source. By default, it happens every hour, though you can configure a different frequency in the dataset settings. Each cache update will send queries to the underlying data source to update the cache and this will result in slow connection in order to retrieve the queries each time you refresh and therefore ultimately affecting your Power BI visual performance.

It is also important to understand that how the models are designed after you connect with your database because the model design itself, including its data preparation queries, relationships, and calculations, all add to the mix of considerations when assessing the performance of the reports.

Therefore, to successfully deploy and manage Power BI, one need to understand where models are hosted, their storage mode, any dependencies on gateways, size of imported data, and refresh type and frequency. These configurations can all have a significant impact on Power BI capacity resources.

I’m also providing a link below which will help you understand how you can connect to a live database, what are advantages and limitations associated with it and what is the best way to connect. You can reduce the size of your queries in order to the improve the report performance. I’m also providing a link of a video it.

Hoping you find this useful and helps you to improve the report performance.

Thanks & Warm Regards,
Harsh

1 Like

Hi @kjani,

So as per my understanding, you want some sample data from the SQL server and connect that to Power BI to create some sample reports. So there are many ways to do that and I am suggesting two out of them that you can use.

  1. You must be having the access to the SQL DB and there you can write a new query in the query editor like
    Select * From [Table] where [Date] Between ‘2020-06-01’ AND ‘2020-06-30’ to get the data preview of last month.
    Then you can copy the data from there and paste it an excel or CSV file. Finally, you can load that into Power BI to get your report.

  2. You can directly open Power BI and choose the SQL Server option from the Get Data. In the Advanced section, you can copy-paste the above query(replace your table name and date field) and connect directly to the SQL server to load the sample data in Power BI. After you are done testing you just need to remove that where clause and you will get the full data in Power BI.

I would prefer #2 as this will be more useful when you will be loading the full data in further proceedings.

Thanks
Shubham

1 Like

Just found out that the database in Amazon Redshift, not SQL Server. Is there a way to limit input data from Redshift? Thanks

Hello @kjani,

I’m providing a link below of an article related to Amazon Redshift which will guide you about how you can extract the sample size of the dataset.

Hoping you find this useful.

Thanks & Warm Regards,
Harsh

INTEGRATE POWER BI WITH AMAZON REDSHIFT FOR INSIGHTS AND ANALYTICS

Hello @kjani,

I think that should follow the link that @Harsh provided. It has a section for the generation of some sample data. Please refer below which is there in the provided link as well with all the set up.

Generating an sample dataset

Now that you have provisioned the initial infrastructure, let us generate the sample data to be used in the Power BI visualizations. For more information about how to generate the sample dataset, see GenerateSampleDataset.txt in your working folder.

Replace the placeholders in this file with the following corresponding values from the AWS CloudFormation Stack Outputs section:

  • [KeyPairFileName] – Replace with the value of the EC2KeyPair template input parameter
  • [EC2IPAddress] – Replace with the public IP address of this EC2 instance with the key DataGeneratorPublicIP
  • [S3BucketName] – Replace with the actual S3 bucket name

After you generate the dataset and copy it to your S3 bucket, terminate the EC2 instance with the name Data Generator . You don’t need it anymore.

Hi @kjani, a response on this post has been tagged as “Solution”. If you have a follow question or concern related to this topic, please remove the Solution tag first by clicking the three dots beside Reply and then untick the check box. Also, we’ve recently launched the Enterprise DNA Forum User Experience Survey, please feel free to answer it and give your insights on how we can further improve the Support forum. Thanks!