Irregular Extract Files & Time Intelligence

Hi Forum,

Newbie user here (be gentle please). I work in revenue strategy and am currently putting together BI booking pace reports (room nights and revenue) using a data dump folder containing large .txt files.

Ideally each daily extract file is automatically exported into the folder, but until recently this process has been done manually. And manual entry means inevitable mistakes. I currently have 413 files (approx. 2.5GB) spanning 5 years (2015 - ).

In excel I would normally load each .txt file and compare changes (TY vs. LY vs. Prior) but since I am determined to get the most out of BI’s time intelligence features and also aim to automate/reduce daily admin tasks, I want to know how best to approach the challenge of irregular extracts. In some cases I only have 3 extract files covering and entire 30-day period. Not great.

Can this be solved with DAX functions or using M in the Query Editor perhaps?

Unfortunately I cannot share my files but have attached a screen dump showing the first few files before applying the step of combining them into one table. All other consolidation and scrubbing of data has been a breeze so far, but I cannot seem to find a comprehensive solution to this.

Really appreciate your help.

Best,
Oscar

Hi @oscar_wandel,

Okay I’m not at all sure what you’re after at the moment so please correct me if I’m wrong OR if am heading in a totally different direction… and provide more specific details :wink:

  1. You’re importing files from a folder and don’t know if new files have been made available.
  • Add a RefreshDate Query to your model, here’s the M code

      let
        Source = DateTime.FixedLocalNow(),
        #"Converted to Table" = #table(1, {{Source}}),
        #"Renamed Columns" = Table.RenameColumns(#"Converted to Table", {{"Column1", "Last Refresh"}}),
        #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns", {{"Last Refresh", type datetime}}),
        #"Inserted Date" = Table.AddColumn(#"Changed Type", "Date", each DateTime.Date([Last Refresh]), type date),
        #"Insert Time" = Table.AddColumn(#"Inserted Date", "Time", each DateTime.Time([Last Refresh]), type time)
      in
        #"Insert Time"
    
  • and this is what it will look like
    image

  • Now you can test if new files have been added since let’s say yesterday (that’s what the #duration is for) AND count the number of days since new files were added to the folder:

       let
      Source = Date.From(List.Max(Files[Extract Date])) > (List.Max(RefreshDate[Date]) - #duration(1,0,0,0)),
      #"Converted to Table" = #table(1, {{Source}}),
      #"Renamed Columns" = Table.RenameColumns(#"Converted to Table",{{"Column1", "New Files"}}),
      #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"New Files", type logical}}),
      #"Added Custom" = Table.AddColumn(#"Changed Type", "Days since new files were added", each Duration.Days( Date.From(List.Max(Files[Extract Date])) - Date.From(List.Max(RefreshDate[Date])))*-1, Int64.Type)
      in
      #"Added Custom"
    
  • So I called this Query NewFiles and because I had no recent files it returned FALSE which I can now show in my Report and also how many days have passed…
    !image

  1. Time Intelligence calcs are best done with DAX, make sure you add a Date table to your model first. There’s a lot of content within Enterprise DNA on Date tables and Time Intelligence functions.
    So if you run into a specific problem just search the Forum and/or create a new thread.

I hope this is helpful.

2 Likes

Hi Melissa,

Wow! Thanks so much for this comprehensive response! Very helpful indeed.

Apologies, I should have made it more clear. Knowing if there is a new file in the folder is not the problem, but I appreciate the m code breakdown. Also very helpful.

Since I’m still trying to make room for BI logic in my brain, here’s the process done (manually) in Excel:

  • Current extract file is imported and compared with file from the day before (or closest available).

  • I then import an extract file from the same date (or closest available) from last year and compare.

  • Lastly I have a third option setup where I import an arbitrary file if I need it.

Each extract file contains a “start date” (i.e. arrival date) column with the earliest start date always being January 1st the year prior and going as far into the future as there is business on the books. So every extract file done in 2020 will have a start date column beginning January 1st 2019 with no exceptions.

I guess what I’m asking is:

  • Am I able to setup my BI report so I can pull up any extract file I choose and compare it to the most recent? Lets just call it “Today vs. Any Prior”.

  • With respect to DAX Time Intelligence, let’s say there’s no 19th of February 2019 extract to compare with 2020. How does BI evaluate the DATEADD -1 YEAR for example?

Sorry for the long-winded response. It is a challenge when I am not permitted to share file contents.

Okay so interacting with the data model itself is probably not that intuitive…but let’s see how far we can take it. First the source FolderContents query

  • Add a Date column Date.From([Extract Date]), type date
  • Create a ListFileDates query from a blank query with this M code
    List.Sort( List.Distinct(FolderContents[Date]), Order.Descending )
  • Create a new Parameter SelectDate via “Manage Parameters” on the ribon
    Type = Date, Suggested Values = Query, Query = ListFileDates
  • Create a LastDate query from a blank query with this M code
    List.FirstN(ListFileDates, 1)
  • Create a new Parameter SelectLastDate via “Manage Parameters” on the ribon
    Type = Date, Suggested Values = Query, Query = LastDate

.
The Today query equals the most recent file, that’s no problem.

  1. Duplicate the FolderContents query
  2. Sort the Extract Date column in Decending order
  3. Only Keep the top row
  4. Extract the Binairy content
  5. Perform all other transformations

.
The second query aka Last Year equals the most recent file from one year ago, that’s no problem.

  1. Duplicate the FolderContents query
  2. Sort the Extract Date column in Decending order
  3. Place a Filter on [Date] <= Date.AddYears(SelectLastDate, -1)
  4. Only Keep the top row
  5. Extract the Binairy content
  6. Perform all other transformations

.
The third query aka User Selection would basically follow the same pattern as the Last Year query with a few exeptions:
The Parameter value has to be selected. And here’s the catch Parameters are hard to interact with for ‘users’. If they have acces to the PBIX file and view it from within Power BI Desktop they can change the Parameter by going to the Home tab, Edit Queries and select Edit Parameters.

However if they only have acces to the report from within the Power BI Service, the only way to change a Parameter would be by going to that reports Workspace, Dataset (select the elipsis …), choose Schedule Refresh and there you’ll find Parameters…

I believe your Parameter must be either type Text or Decimal to be able to be changed so a duplicate SelectDate Parameter has to be created, let’s call it SelectDateTypeText and when you call that Parameter just wrap it in Date.From(SelectDateTypeText) to turn it back into a Type date …

Is this what you had in mind?

3 Likes

Amazing stuff! I need to go over your solution a few times but you’ve already been a massive help.

Thank you so much, Melissa.

Glad I could help.
If you need any further assistance just give a shout…

1 Like

Hi Melissa,

Afraid I’m not grasping this, but hoping I just need to have it dumb’d down a bit more. Blank queries and parameters is completely new to me, so thanks for bearing with with me. Here’s what I don’t understand:

  • Why am I adding a “Date” column from to the “Extract Date” column? My initial intention was to define an “Extract Date” column prior to combining the files into a single table and using that.

  • I’m confused when you write “duplicate the FolderContents query”. There is no query I am creating called “FolderContents” is there?

  • If “LastDate” query simply lists the most recent date in the “Date” column I have created, isn’t that essentially the “Today” query?

For the following steps, I think I get it. At least it’s straight forward creating the various queries and parameters:

Thanks again for your help. I really appreciate it.

Hi Oscar,

Okay so in creating a mock-up, it actually took fewer steps… let me walk you through it.

As I understood it the goal was to get the most recent file from a folder. So I created a query using the Get data from Folder option and renamed this FolderContents.
image

Now it contains a datetime colum called Date modified, I’m going to leave that as it is because time can be important in the sort order. I’ll add a separate Date column with the M code depicted in the formula bar. In M type datetime and type date are not equal.

The FolderContents is our base query and I’m not going to Duplicate but Reference it by right clicking on the query name and selecting Reference. Rename it Today, sort the Date Modified column descending.

… and only keep the top1 row.

For the next query we need to do some ground work first. Create a new, from Blank Query
image

Enter this M code and rename it ListFileDates

Repeat the new from Blank Query step, enter this M code and rename it LastDate
the {0} at the end is important because it allows you to access the first Value in the list.
image

As an alternative to the LastDate M code above you could also reference today’s date, like so:
Date.From(DateTime.FixedLocalNow())

So the LastYear query… start by making another Reference to the FolderContents, sort the Date Modified descending and place a filter on the Date column (select equals and enter a date than adjust it according to the M code depicted below - again keep the top1 row.

Now I only created one Parameter for the last query.
image

And here’s how the field values are defined for this Parameter called SelectDateTypeText

So we can now create our last Reference to the FolderContents query
again sort the Date modified descending and place a filter on the Date column, adjust the M code like so

Think this is it - here’s my mock-up file. I hope this is helpful.
eDNA - Irregular Extract Files & Time Intelligence.pbix (146.3 KB)

3 Likes

Hi Melissa,

Sorry for the delayed response. Thank you so much for all your help, this worked great but as you wrote in a prior post it is not so intuitive and will likely look into a different setup might be done, but when my BI skills are up to the task.

Best,
Oscar

Hi @Melissa,

It has been some time since starting this thread and unfortunately I have not been able to get back to developing my BI skills as desired due to the ongoing COVID-19 crisis and how it is affecting the industry I work in. But your assistance was invaluable to get me started so thank you again for that.

I am now at the stage where a report is (slowly) coming together and “Current” vs. “Last Year” comparisons are functioning as desired. If you recall, I am working with three main tables for two types of time comparison:

  • “Current” data extract (e.g. today’s extract) vs. “Last Year” data extract (e.g. same date last year)
  • “Current” data extract vs. “Prior” extract (e.g. user defined date within the prior 12-month period)

Where I am stuck is integrating the User Selection parameter outlined above as part of the dashboard. If possible, I would like the user to be able to select (in a slicer visual?) the desired “Prior” data extract from a drop down menu. Can this be done and if so, do you know how?

Best,
Oscar

Hi @oscar_wandel,

Welcome back!

As I mentioned in post #4 interaction with Parameters is not intuitive or user friendly…

.

So I totally understand the need to interact with the user from the report page itself and I think there is a way to accomplish that. However the downside will be a significant increase of data in your model.
Won’t keep up the suspense any longer :wink: the general idea would be to leverage of this technique described in the dynamic date range slicer (see topic below).

Meaning that instead of only loading the data into the datamodel for one scenario (like you are when using a PQ Parameter), you will have to bring in the data for both scenarios and create a key between the Slicer table and the Fact table and in the Slicer settings (in the report) you’ll have to enable “single select” for this to work.

I hope this is helpful.

1 Like