Do/while in Power Query to obtain all records from an api

Hello

I am attempting to connect to an API that returns records of issues for one project, but ultimately the goal is for use in different projects.
I have been partially able to do so with Power Query. The issue is that by default the web service returns only 100 records.

I have been in touch with the vendors and they have suggested that to pull all the required records incremental queries will be required. E.g. to get the next set of records after 100 such a query would be required:

https://api.bimtrackapp.co/v3/hubs/{hubId}/projects/{projectId}/issues?sort=-Number&startRecord=**101**& returnCount=100

Both 101 & 100 in the string are integers.

I believe that creating a function to replace the 101 & 100 dynamically until all records have been obtained could be done in Power Query. The closest video I have come across that could have worked is https://www.youtube.com/watch?v=vhr4w5G8bRA but the code for the function referred to is no longer available at the blog referenced.

Can one of our Power Query power users suggest a function that can achieve this?

Ps: I am unable to share the pbix file on this occasion as it would contain the company’s API key, and I am not sure how I can replicate a similar scenario.

@Mo.jo.jo ,

I don’t even think you would need to change the returnCount parameter. If 100 is the max, just leave as is and work with the Start Record parameter as a function. The simplest thing to do would be to create a one column table like this:

image

The brute force approach would be to just make this table long enough that you were sure it covered all the records, and then just iterate your API call in PQ over this table, using a function that replaces ReturnCount with the values from the table. If you’re not sure how to do this, refer to this blog/video, which uses a similar approach for iterating over a Years table:

Gil Raviv is the Michael Jordan of Power BI web scraping. If you don’t already have his book on Power Query, I would suggest buying it immediately. Other than the Definitive Guide to DAX, it’s the most used book in my data science library. Even my cat refers to it regularly…

image

Starting on page 322, he talks about how to build API web requests in Power Query.

I’m sure there is a more elegant way to terminate the iteration when it reaches the end of the data to be pulled - perhaps a try each structure. I suspect @Melissa and perhaps others will have good suggestions regarding this.

Here are a couple of other articles on Power Query Web Scraping that I think you may find useful:

I hope this is helpful.

  • Brian
1 Like

Thanks @BrianJ

I agree with your suggestion about leaving the 100 max return and focusing on the first parameter. I can work with thebrute force approach. I doubt a project can have more than perhaps 1000 issues so it should work. Needs must :upside_down_face: :blush:

And I appreciate the links and the recommendations. Your cat seems to have come to peace with its presence so it must be very well used.

I’ll work away at this and close the issue out if no other solutions are offered and/or I am able to get a successful workflow together.

@Mo.jo.jo ,

Sounds like a plan. Please keep me apprised of how it goes and if need be I think I can set up my own trial API key on that site to help you work through the code issues if necessary.

– Brian

1 Like

Hi @Mo.jo.jo ,

Can you see if something like this works for you.
Note that you’ll have to replace the YourFunction with your own custom function.

let
    // set upper limit 
    NumberOfCalls = {0..10},
    // convert initial list
    CallList = List.Transform( NumberOfCalls, each if _ = 0 then 0 else _ * 100 + 1 ),
    // iterate list, invoke custom function
    Source = List.Transform( CallList, each try {_, YourFunction(_)} otherwise null ),
    // untill you hit a null
    DoWhile = List.FirstN( Source, each _ <> null ),
    // create table
    ToTable = Table.FromRows( DoWhile, {"Page", "Content"} )
in
    ToTable

Simple example:
eDNA - Do While example.pbix (16.8 KB)
.
BTW I came across this video, that might be worth watching as well.

.
I hope this is helpful.

1 Like

@BrianJ
Thanks for the generous offer of support :pray:
Your creating a Calendar table video was very helpful.
I was able to get the hard coded approach to work. There is one step where I get errors if the row that the function works on does not exist. E.g. if I use row with value 301 as the start record position and it does not exist then rows after that throw and error. Not an issue as I then filter those errors out and expand the rest to get the required results.

I get the same error using Melissa’s solution also.
If you think the errors can be avoided please feel free to suggest something.

Ps: And I have now bought hardcopies of the Definitive Guide to Dax and Michael Jordan’s Power Query book.lol. I really hope I can make time for them. They look … huge reads.

@Melissa
Thanks for taking time out of producing your eagerly awaited course to send a solution :blush:
I tweaked your code so that the “CallList” step reads … each if _ = 0 then 0+1 else _ * 100 + 1 ) so that the first row in the table the function works on has 1 as the value.
Similar to the approach suggested by Brian I get errors where rows exist but no data is available for the api to return. As shown in image. I thought the “Source” step would ensure no errors showed up.

Do while error step

Not a big deal, as I can filter the errors out, but interested to know if you can prevent the errors from showing up in the first place?
And thanks for the link to the Radacad video, I have that queued to play tomorrow.

Many thanks to you both. You make my work life much less painful than it could be :star_struck: :star_struck:

1 Like

Just replace the 0 with a 1, I would suggest.
each if _ = 0 then 1 else _ * 100 + 1
.

Not sure, when an error is raised it will be propagated and stop the evaluation of the current expression. Will need to see the M code for YourFunction. Can you share that after you’ve replaced all confidential parts with descriptive words? Like replace the actual API key with something like EnterAPIkey and so on.

Hi @Melissa

Yes. Indeed, it is now clear that 0 + 1 is 1 … :sweat_smile:

As requested.
This is the M code I use for fxScrapeIssues, which I then add in the query/Calls table using the code you provided.

(Page as number) =>

let

Source = Json.Document(Web.Contents("https://api.bimtrackapp.co" & "/v3/hubs/{hubId}/projects/{ProjectId}/issues?sort=+Number&startRecord="&Number.ToText(Page)&"&returnCount=100", [Headers=[Authorization="Bearer {access token}"]])),

#"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),

#"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"Id", "ProjectId", "Title", "Number", "Description", "Group", "Author", "AssignedTo", "LastModificationAuthor", "LastModificationDate", "DueDate", "CreationDate", "ClosingDate", "DefaultViewpointId", "Disciplines", "ProjectPhase", "ProjectZone", "Priority", "Status", "Type", "Confidentiality", "NotifyList", "CustomAttributes", "CreationSource", "Viewpoints", "IsRestricted"}, {"Id", "ProjectId", "Title", "Number", "Description", "Group", "Author", "AssignedTo", "LastModificationAuthor", "LastModificationDate", "DueDate", "CreationDate", "ClosingDate", "DefaultViewpointId", "Disciplines", "ProjectPhase", "ProjectZone", "Priority", "Status", "Type", "Confidentiality", "NotifyList", "CustomAttributes", "CreationSource", "Viewpoints", "IsRestricted"}),

#"Expanded Author" = Table.ExpandRecordColumn(#"Expanded Column1", "Author", {"Id", "UserName", "Email", "FirstName", "LastName", "AvatarUrl"}, {"Author.Id", "Author.UserName", "Author.Email", "Author.FirstName", "Author.LastName", "Author.AvatarUrl"}),

#"Expanded AssignedTo" = Table.ExpandRecordColumn(#"Expanded Author", "AssignedTo", {"Id", "UserName", "Email", "FirstName", "LastName", "AvatarUrl"}, {"AssignedTo.Id", "AssignedTo.UserName", "AssignedTo.Email", "AssignedTo.FirstName", "AssignedTo.LastName", "AssignedTo.AvatarUrl"}),

#"Expanded LastModificationAuthor" = Table.ExpandRecordColumn(#"Expanded AssignedTo", "LastModificationAuthor", {"Id", "UserName", "Email", "FirstName", "LastName", "AvatarUrl"}, {"LastModificationAuthor.Id", "LastModificationAuthor.UserName", "LastModificationAuthor.Email", "LastModificationAuthor.FirstName", "LastModificationAuthor.LastName", "LastModificationAuthor.AvatarUrl"}),

#"Expanded ProjectPhase" = Table.ExpandRecordColumn(#"Expanded LastModificationAuthor", "ProjectPhase", {"Color", "Name", "Id"}, {"ProjectPhase.Color", "ProjectPhase.Name", "ProjectPhase.Id"}),

#"Expanded ProjectZone" = Table.ExpandRecordColumn(#"Expanded ProjectPhase", "ProjectZone", {"Color", "Name", "Id"}, {"ProjectZone.Color", "ProjectZone.Name", "ProjectZone.Id"}),

#"Expanded Priority" = Table.ExpandRecordColumn(#"Expanded ProjectZone", "Priority", {"Order", "Color", "Name", "Id"}, {"Priority.Order", "Priority.Color", "Priority.Name", "Priority.Id"}),

#"Expanded Status" = Table.ExpandRecordColumn(#"Expanded Priority", "Status", {"TeamsAllowedForStatus", "Color", "Name", "Id"}, {"Status.TeamsAllowedForStatus", "Status.Color", "Status.Name", "Status.Id"}),

#"Expanded Type" = Table.ExpandRecordColumn(#"Expanded Status", "Type", {"Color", "Name", "Id"}, {"Type.Color", "Type.Name", "Type.Id"}),

#"Expanded Confidentiality" = Table.ExpandRecordColumn(#"Expanded Type", "Confidentiality", {"Teams"}, {"Confidentiality.Teams"}),

#"Expanded NotifyList" = Table.ExpandRecordColumn(#"Expanded Confidentiality", "NotifyList", {"TeamsToNotify", "UsersToNotify"}, {"NotifyList.TeamsToNotify", "NotifyList.UsersToNotify"}),

#“Sorted Rows” = Table.Sort(#“Expanded NotifyList”,{{“Number”, Order.Ascending}})

in

#"Sorted Rows"

Hi @Mo.jo.jo ,

Here’s some error handling for you, LOL.
Let’s see what happens…

(Page as number) =>
let
    Source = try Json.Document(Web.Contents("https://api.bimtrackapp.co" & "/v3/hubs/{hubId}/projects/{ProjectId}/issues?sort=+Number&startRecord="&Number.ToText(Page)&"&returnCount=100", [Headers=[Authorization="Bearer {access token}"]]))
        otherwise null,
    ToTable = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    ExpandTable = try Table.ExpandRecordColumn(ToTable, "Column1", {"Id", "ProjectId", "Title", "Number", "Description", "Group", "Author", "AssignedTo", "LastModificationAuthor", "LastModificationDate", "DueDate", "CreationDate", "ClosingDate", "DefaultViewpointId", "Disciplines", "ProjectPhase", "ProjectZone", "Priority", "Status", "Type", "Confidentiality", "NotifyList", "CustomAttributes", "CreationSource", "Viewpoints", "IsRestricted"}, {"Id", "ProjectId", "Title", "Number", "Description", "Group", "Author", "AssignedTo", "LastModificationAuthor", "LastModificationDate", "DueDate", "CreationDate", "ClosingDate", "DefaultViewpointId", "Disciplines", "ProjectPhase", "ProjectZone", "Priority", "Status", "Type", "Confidentiality", "NotifyList", "CustomAttributes", "CreationSource", "Viewpoints", "IsRestricted"})
        otherwise #table( {"Id", "ProjectId", "Title", "Number", "Description", "Group", "Author", "AssignedTo", "LastModificationAuthor", "LastModificationDate", "DueDate", "CreationDate", "ClosingDate", "DefaultViewpointId", "Disciplines", "ProjectPhase", "ProjectZone", "Priority", "Status", "Type", "Confidentiality", "NotifyList", "CustomAttributes", "CreationSource", "Viewpoints", "IsRestricted"}, {} ),
    ExpandAuthor = try Table.ExpandRecordColumn(ExpandTable, "Author", {"Id", "UserName", "Email", "FirstName", "LastName", "AvatarUrl"}, {"Author.Id", "Author.UserName", "Author.Email", "Author.FirstName", "Author.LastName", "Author.AvatarUrl"})
        otherwise #table( {"Author.Id", "Author.UserName", "Author.Email", "Author.FirstName", "Author.LastName", "Author.AvatarUrl"}, {} ),
    ExpandAssignedTo = try Table.ExpandRecordColumn(ExpandAuthor, "AssignedTo", {"Id", "UserName", "Email", "FirstName", "LastName", "AvatarUrl"}, {"AssignedTo.Id", "AssignedTo.UserName", "AssignedTo.Email", "AssignedTo.FirstName", "AssignedTo.LastName", "AssignedTo.AvatarUrl"})
        otherwise #table( {"AssignedTo.Id", "AssignedTo.UserName", "AssignedTo.Email", "AssignedTo.FirstName", "AssignedTo.LastName", "AssignedTo.AvatarUrl"}, {} ),
    ExpandLastModAuthor = try Table.ExpandRecordColumn(ExpandAssignedTo, "LastModificationAuthor", {"Id", "UserName", "Email", "FirstName", "LastName", "AvatarUrl"}, {"LastModificationAuthor.Id", "LastModificationAuthor.UserName", "LastModificationAuthor.Email", "LastModificationAuthor.FirstName", "LastModificationAuthor.LastName", "LastModificationAuthor.AvatarUrl"})
        otherwise #table( {"LastModificationAuthor.Id", "LastModificationAuthor.UserName", "LastModificationAuthor.Email", "LastModificationAuthor.FirstName", "LastModificationAuthor.LastName", "LastModificationAuthor.AvatarUrl"}, {} ),
    ExpandProjectPhase = try Table.ExpandRecordColumn(ExpandLastModAuthor, "ProjectPhase", {"Color", "Name", "Id"}, {"ProjectPhase.Color", "ProjectPhase.Name", "ProjectPhase.Id"})
        otherwise #table( {"ProjectPhase.Color", "ProjectPhase.Name", "ProjectPhase.Id"}, {} ),
    ExpandProjectZone = try Table.ExpandRecordColumn(ExpandProjectPhase, "ProjectZone", {"Color", "Name", "Id"}, {"ProjectZone.Color", "ProjectZone.Name", "ProjectZone.Id"})
        otherwise #table( {"ProjectZone.Color", "ProjectZone.Name", "ProjectZone.Id"}, {} ),
    ExpandPriority = try Table.ExpandRecordColumn(ExpandProjectZone, "Priority", {"Order", "Color", "Name", "Id"}, {"Priority.Order", "Priority.Color", "Priority.Name", "Priority.Id"})
        otherwise #table( {"Priority.Order", "Priority.Color", "Priority.Name", "Priority.Id"}, {} ),
    ExpandStatus = try Table.ExpandRecordColumn(ExpandPriority, "Status", {"TeamsAllowedForStatus", "Color", "Name", "Id"}, {"Status.TeamsAllowedForStatus", "Status.Color", "Status.Name", "Status.Id"})
        otherwise #table( {"Status.TeamsAllowedForStatus", "Status.Color", "Status.Name", "Status.Id"}, {} ),
    ExpandType = try Table.ExpandRecordColumn(ExpandStatus, "Type", {"Color", "Name", "Id"}, {"Type.Color", "Type.Name", "Type.Id"})
        otherwise #table( {"Type.Color", "Type.Name", "Type.Id"}, {} ),
    ExpandConfidentiality = try Table.ExpandRecordColumn(ExpandType, "Confidentiality", {"Teams"}, {"Confidentiality.Teams"})
        otherwise #table( {"Confidentiality.Teams"}, {} ),
    ExpandNotifyList = try Table.ExpandRecordColumn(ExpandConfidentiality, "NotifyList", {"TeamsToNotify", "UsersToNotify"}, {"NotifyList.TeamsToNotify", "NotifyList.UsersToNotify"})
        otherwise #table( {"NotifyList.TeamsToNotify", "NotifyList.UsersToNotify"}, {} ),
    SortRows = Table.Sort(ExpandNotifyList,{{"Number", Order.Ascending}})
in
    if Source = null then Source else SortRows

I hope this is helpful.

2 Likes

@Melissa :laughing:

Careful what one wishes for then.

The code results in no errors.
When expanded the empty responses from the API show up as null’s at the end, and can be filtered out.
Cleaner result.
A bit overkill, but I did ask for it. :blush:
Next time, I might just take the errors as is … :crazy_face:

Thanks as ever for your assistance.

1 Like