Web Scrape US Federal Holidays

Hi all,

Using Power BI Desktop
I am following along on this tutorial from BrianJ to scrape US Federal Holidays from Calendarpedia.

However, when I try to expand the Tables, there is an error with 2023, 2021 and 2022 work just fine
The error:
An error occurred in the "query.Expression.Error: The column ‘Date’ of the table wasn’t found.

I looked at the table for 2023, it appears to be the same as the tables for 2021 and 2022.

Any ideas
webscrape_dates.pbix (16.2 KB)

Hi @wynn,

Here’s what I did, I created 2 separate queries one for 2022 and one for 2023 using the From Web connector. Looking at the Html.Table code you will notice a difference:

Html.Table(Source, {{“Column1”, "DIV:nth-child(8)
Html.Table(Source, {{“Column1”, "DIV:nth-child(10)

Just include an if-statement to deal with that:

( Year as number ) as table => 
    let
        Source = Web.BrowserContents("https://www.calendarpedia.com/holidays/federal-holidays-"&Number.ToText(Year)&".html"),
        FromHtml = [
            a = Html.Table(Source, {{"Column1", "DIV:nth-child(8) > TABLE:nth-child(7) > * > TR > :nth-child(1)"}, {"Column2", "DIV:nth-child(8) > TABLE:nth-child(7) > * > TR > :nth-child(2)"}, {"Column3", "DIV:nth-child(8) > TABLE:nth-child(7) > * > TR > :nth-child(3)"}}, [RowSelector="DIV:nth-child(8) > TABLE:nth-child(7) > * > TR"]),
            b = Html.Table(Source, {{"Column1", "DIV:nth-child(10) > TABLE:nth-child(7) > * > TR > :nth-child(1)"}, {"Column2", "DIV:nth-child(10) > TABLE:nth-child(7) > * > TR > :nth-child(2)"}, {"Column3", "DIV:nth-child(10) > TABLE:nth-child(7) > * > TR > :nth-child(3)"}}, [RowSelector="DIV:nth-child(10) > TABLE:nth-child(7) > * > TR"]),
            c = if Table.RowCount(a) =0 then b else a
            ][c],
        ChType = Table.TransformColumnTypes(FromHtml,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}}),
        DelTopRow = Table.Skip(ChType,1),
        DelBottomRow = Table.RemoveLastN(DelTopRow,1),
        PromoteHeaders = Table.PromoteHeaders(DelBottomRow, [PromoteAllScalars=true]),
        ChType2 = Table.TransformColumnTypes(PromoteHeaders,{{"Date", type date}, {"Federal holiday", type text}, {"Day of the week", type text}}),
        SelRows = Table.SelectRows(ChType2, each ([Day of the week] <> "Saturday" and [Day of the week] <> "Sunday")),
        ReplaceValue = Table.ReplaceValue(SelRows,"(observed)","",Replacer.ReplaceText,{"Federal holiday"}),
        ReplaceValue2 = Table.ReplaceValue(ReplaceValue,"*","",Replacer.ReplaceText,{"Federal holiday"}),
        SelCols = Table.SelectColumns(ReplaceValue2,{"Date", "Federal holiday"})
    in
        SelCols

I hope this is helpful

2 Likes

Thank you @Melissa, for updating the powerquery-m to accommodate a change in the webpage layout. The Time Intelligence Series is awesome and one that I wish more folks would take the time to watch. You and @BrianJ put in a great deal of effort that shows!

I thought it helpful to also explain in a small way that prior to Web.BrowserContents becoming available, the Power Query Editor would default to Web.Page() function.

I still maintain code with either function, as they are both supported. If one has a need, the default setting in Power BI Desktop for the Power Query Editor and web table inference can be toggled off.

More information and how to manage PowerBI Desktop - web table inference

PowerBI Desktop - web table inference:

By default, PowerBI Desktop enables a new web table inference experience when importing data from web pages. The new experience uses Web.BrowserContents to retrieve the page content, and can detect repeating patterns on a page just beyond literal HTML tables. Changing this setting will take effect the next time you open the Power Query Editor window.

On the other hand Excel Power Query Editor (as of this date) defaults to the older Web.Page function. The powerquery-m query pattern developed in Excel Power Query can be reused in Power Query Desktop.

How to change the Power BI Desktop web table inference setting:

In Power BI Desktop, you can use the older Web.Page function by clearing the Enable web table inference option:

  1. Under the File tab, select Options and settings > Options.
  2. In the Global section, select Power Query Editor.
  3. Clear the Enable web table inference option, and then select OK.
  4. Restart Power BI Desktop.

pbi web table inference

Alternatively:
In cases where you need to use Web.Page instead of Web.BrowserContents , you can still manually use Web.Page .

Basic query pattern:

let
  Source = Web.Page(Web.Contents("<your address here>")),
  Navigation = Source{0}[Data]
in
  Navigation

Supporting URLS:

• Microsoft Learn: Troubleshooting the Power Query Web connector
• Microsoft Learn: Using Web.Page instead of Web.BrowserContents
• Microsoft Learn: Web.Contents
• Microsoft Learn: Web.BrowserContents
• Miguel Escobar Jul 24, 2018: New Web Scraping experience in Power BI / Power Query (Using CSS Selectors)
• Matt Allington January 8, 2020: Extracting Data from Complex Web Pages with Power BI

2 Likes