Pulling multiple tables from a single web page

I am polling a Canadian web site that has public demographic data for all of the Schools in the province of Ontario. You reach the page I want, by including the unique school identifier as a parameter value in the URL.
https://www.app.edu.gov.on.ca/eng/sift/schoolProfileSec.asp?SCH_NUMBER=916293. I have a list of the schools with this identifying number and I want to create a function that will scroll through this list and pull the HTML table from the page for each school.
This is the code for the function

(Mident as text) =>
let
Source = Web.BrowserContents(“https://www.app.edu.gov.on.ca/eng/sift/schoolProfileSec.asp?SCH_NUMBER=” & Mident),
#“Extracted Table From Html” = Html.Table(Source, {{“Column1”, “DIV.row:nth-child(13) > TABLE > * > TR > TD[rowspan=”“2"”]:not([colspan]):nth-child(1):nth-last-child(3), DIV.row:nth-child(13) > TABLE > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(4)"}, {“Column2”, “DIV.row:nth-child(13) > TABLE > * > TR > TD[rowspan=”“2"”]:not([colspan]):nth-child(1):nth-last-child(3) + TH[rowspan="“2"”]:not([colspan]):nth-child(2):nth-last-child(2), DIV.row:nth-child(13) > TABLE > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(4) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(3)"}, {“Column3”, “DIV.row:nth-child(13) > TABLE > * > TR > TD[rowspan=”“2"”]:not([colspan]):nth-child(1):nth-last-child(3) + TH[rowspan="“2"”]:not([colspan]):nth-child(2):nth-last-child(2) + TH[colspan="“2"”]:not([rowspan]):nth-child(3):nth-last-child(1), DIV.row:nth-child(13) > TABLE > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(2), DIV.row:nth-child(13) > TABLE > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(4) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(3) + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(2)"}, {“Column4”, “DIV.row:nth-child(13) > TABLE > * > TR > TD[rowspan=”“2"”]:not([colspan]):nth-child(1):nth-last-child(3) + TH[rowspan="“2"”]:not([colspan]):nth-child(2):nth-last-child(2) + TH[colspan="“2"”]:not([rowspan]):nth-child(3):nth-last-child(1), DIV.row:nth-child(13) > TABLE > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(2) + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(1), DIV.row:nth-child(13) > TABLE > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(4) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(3) + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(2) + TD:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(1)"}}, [RowSelector=“DIV.row:nth-child(13) > TABLE > * > TR”]),
#“Changed Type” = Table.TransformColumnTypes(#“Extracted Table From Html”,{{“Column1”, type text}, {“Column2”, type text}, {“Column3”, type text}, {“Column4”, type text}}),
#“Renamed Columns” = Table.RenameColumns(#“Changed Type”,{{“Column2”, “Column2 1”}})
in
#“Renamed Columns”

I can connect to the web page and pull the data for individual schools but as soon as I create the custom function and add it to a table that has 1 column (Mident) with all of the school identifiers, and save my changes the module the system hangs with a message “Creating connection in model”. This lasts long enough that the system times out.
The process seems pretty straight forward and I can not figure out how to get rid of the error around creating the connection in the model.
Has anyone seen this issue and found a resolution for it ??

List of Schools.xlsx (11.2 KB)

Hi @Brad.V,

Can you give this a go. Just paste it in a blank query…

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("lVbbcts2EP0VjJ/amXhKgvdHW5ajJHKiWp54Omke1iQsogIBBQTtqn/Q3+hbvsM/1gUBULKbZKZ+8Gi4C2Av55zdT59OznqjhDJk3gw1GK4kCDJj0mh28uokx784Ovn8Cv3uNIAm67pVSqCpLMqqLEbTGrqdYN5ETslK8wcw9nw82m9aht9Yb/jm4HWjtJJGuUeSKh09ZyAeQO/JrNUc3UGSsxoa1u3dgzT3sfTt3WDdlBBsYx8qy5xWLpq56B6Vao4iLfMkcgcXTHMDGIW/lqh7smSgJZcbMv+zZnifrMcLizzzh96BBAPPIily/9gMbQ0Q5rP65YMx8AhkBlowoySZn05hYJo0dgXxXm96AR2vjyNN4zQ/dsEqQUveSCyGGYzLlNKscmXAiJ6+dpyRhhFhX91hdsKFXxZJ/D+8rlnDWMf0UekXfNMex1ZFeeH7OcgGXc/huFPH9aZRMnq+V9q0TEuyxLtMT9ZrtFdxmVWlK5+qWw2SkcVoiCitqGui3AjWgjbBUlY09S0E3YfPMXbdvfRGqy+D4j25BCEOD9Gy9D3cDf0WetvnC38W83Fn33G9FSAbsoQtIxeYj+a1QXR1O9Vzww6pVZiYP7Q6m83JT2fNIKynNFwOI4gCjX627klCPUeu1fjCFa9bJrF2LsA0ySLXyxvecQywGwMMEXinvEqj4IQOPVmQ34coojn56BzKrHBpnomN6uAQhI1Msnr8uWa1ko1l1yGdiCapZ56lPFLf3zC7sYhNyyorvmUeAyurOEm9uQVUABhCt6IyrzwZBUdxGQu79p2MvKIslJZsB/up/XGRFJ5yI+xnz9OkUZa42l+B3pDXXHRq0Af+HlQrrfLMwdrWm+94rQyT/hUUkTI/oJOsW6VREBolWF/jFUfVO0C6iuPMhbYedigiSpMFc5ieueiyMvIetxj4HBM2oUwZ9WndthZNKy5Z/zK5LInT8j9OV1wq/Me3ADbFMbesKlz456D1vlGPWL+DEFZpnKXUN4WJrr9XuiEfkRRs/2NoR3FMY89KeY9yoH0RQhKJ6/UFsv5GMzCdLdZKq42GjqyHZpTkCWVjrFGWu6Tm/Q4kUiAApEpiB5Al9DY2Dw4a5dQFv+QPTGDE/nuSZA6nS3U6tyJNVqC34VSS+o4uVb1l+m4/Jki8MY8DaiQ3ahDc55SnRVLG3zSh2ORF5RHUt1uF2tmSW4aZde5kXmUOQ2MXi/UjIHnVlrzHoj0gnIRB2XO/j+lWJe7YB3fM0hOhZHMZr6XYPScXwOWdenx+zxpnazz6pSkNaPRld0IQ0yoIQQcbFKQWMWqFp58EB2ehRwff7dgjBFrGSemeviRvyVVtibtRJggtzWJfj0nSgzjFXsFWFo5krXA2+ENJSgt35y0Of3LJNM4Kfy7DQefacsubht9zJppgSkIov6nBtA3OqiPAOZb3ZGkaB7HKo/acwYMdShpYwGxKQ7Laym/NgayGO3E8crEnhSfeBbJpUokoo6WfLBsJ9SRRZezR9OvA0Hku+F9wx1BG/FRJ8rKYJqofKN5S4MbkmmP4Vm3BP4WTrUqc4RJrSy5xINaYoH8wKgs6QWJPrrlNMtyYe3leQNfhWNz+oUQovpUBTxkMomXQPAfT6JOm8QtVc0rCcX0jdl59VNNSeLyEVFlUBrnC1m6s+0HQ3oweNPKj7zUqiWjMOI0CK2Oa5dkUHTmIqgNVhMvSMTEfodmEFiA1MxrGAJhWTcKeFqXL+D1K/ga/2xZcoySEFmChE3fv098IANIMBOHc+G0gy4IJRwHpnZBxnA07i5kvA25PjrUfmn3fP31lXsmScG5Nwj7jZ2uUT5ZAG8+ZuJgs0+1Ld/sZMg/cEmoXmOpwxyp4XjMJvO+DF8UlIzqOHVc80uNo9PGePTDJ9diULOjrS08g+ulrzwX3+y/O0CKPvlOQBUi7Kp6SeiSjXSxxO8BxYhk6HkYpS75zOBQR37Mne3KOa9vIg6icyvLy0BXUFxaFzSmu1oYz/epHb6fRoSPt0z/GbiZv2YATtWcefQWG9/lf", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [SCHOOL_NAME_EN = _t, MIDENT = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"SCHOOL_NAME_EN", type text}, {"MIDENT", type text}}),

    fxSchoolData = (myURL as text) as table =>
    let
        WebSource = Web.BrowserContents( myURL ),
        TableFromHtml = Html.Table(WebSource, {{"Column1", "DIV.row:nth-child(13) > TABLE > * > TR > TD[rowspan=""2""]:not([colspan]):nth-child(1):nth-last-child(3), DIV.row:nth-child(13) > TABLE > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(4)"}, {"Column2", "DIV.row:nth-child(13) > TABLE > * > TR > TD[rowspan=""2""]:not([colspan]):nth-child(1):nth-last-child(3) + TH[rowspan=""2""]:not([colspan]):nth-child(2):nth-last-child(2), DIV.row:nth-child(13) > TABLE > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(4) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(3)"}, {"Column3", "DIV.row:nth-child(13) > TABLE > * > TR > TD[rowspan=""2""]:not([colspan]):nth-child(1):nth-last-child(3) + TH[rowspan=""2""]:not([colspan]):nth-child(2):nth-last-child(2) + TH[colspan=""2""]:not([rowspan]):nth-child(3):nth-last-child(1), DIV.row:nth-child(13) > TABLE > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(2), DIV.row:nth-child(13) > TABLE > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(4) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(3) + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(2)"}, {"Column4", "DIV.row:nth-child(13) > TABLE > * > TR > TD[rowspan=""2""]:not([colspan]):nth-child(1):nth-last-child(3) + TH[rowspan=""2""]:not([colspan]):nth-child(2):nth-last-child(2) + TH[colspan=""2""]:not([rowspan]):nth-child(3):nth-last-child(1), DIV.row:nth-child(13) > TABLE > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(2) + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(1), DIV.row:nth-child(13) > TABLE > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(4) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(3) + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(2) + TD:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(1)"}}, [RowSelector="DIV.row:nth-child(13) > TABLE > * > TR"]),
        ChangeType = Table.TransformColumnTypes(TableFromHtml,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}}),
        RemoveTopRow = Table.Skip(ChangeType,1),
        PromoteHeaders = Table.PromoteHeaders(RemoveTopRow, [PromoteAllScalars=true]),
        ChangedType = Table.TransformColumnTypes(PromoteHeaders,{{"Column1", type text}, {"Column2", type text}, {"English students", type text}, {"French students", type text}})
    in
        ChangedType,
    
    InvokedFunction = Table.AddColumn(#"Changed Type", "Custom", each try fxSchoolData(SchoolURL & [MIDENT]) otherwise #table({ "Column1", "Column2", "English students", "French students" }, {} ))

in
    InvokedFunction 

.
Not all returned results so there are empty tables as well.

I hope this is helpful. Here’s my sample file.
eDNA - Pulling multiple tables from a web page.pbix (36.6 KB)

@Brad.V

Here is the solution for your requirement

Step 1 Extract the required table from single URL
Step 2 Go to advanced editor and convert Power Query into function as shown below.

Step 3 Extract list of schools from Excel

Step 4 Create Custom column for URL
Custome Column

Step 5 Go to Add Column and click on Invoke Custom Function
Select the custom function and URL

Many tables are blank because the information is not available on those links.

Now you can transform the table how you want.

Repeat the same steps for other tables.

Hope this meets your requirement.

You have outlined the steps I take, thank you for that. However when I go through those steps in the Query Editor The tables are populated. When I hit the button to save and apply the module hangs trying to make a connection in the model and eventually times out. I can not figure out why the process works in the Query editor, but times out when you save the query.

Melissa your solution has worked, THANK YOU. Can you please explain what you did to create this function

Glad that worked for you @Brad.V

Sure I’ll take you through the steps I took. I’ll cover all steps so others experiencing something similar can follow along as well.

  1. loaded SchoolList query
  2. created a Parameter (type text) to hold the URL, so if that ever changes it will be easy to update.
    https://www.app.edu.gov.on.ca/eng/sift/schoolProfileSec.asp?SCH_NUMBER=
    image
  3. created a SampleQuery to see the output your M code generated
  4. cleaned up the SampleQuery slightly, so it now generates this output.
  5. Duplicated your SchoolList Query, although that is not required and opened the Advanced Editor
    1. Is basically a copy of the M code inside the SampleQuery
    1. turned it into a function called fxSchoolData that takes one parameter myURL and outputs a table
    1. passed the myURL parameter to the Web.BrowserContents expression
    1. added a Custom Column that invokes the custom function (2)
    1. passed the concatenation of the SchoolURL parameter and the text value from the MIDENT column (so a full url)
    1. added a try ~ otherwise clause that returns an empty table with the same structure if there is no result in (5). If you omit this step (the invoked function at 5) will return an error when there is no result.

I hope this is helpful.

1 Like