How to get data from a website's HTML, then iterate through all the location IDs

Hi all,

I am using freely available CQC data to try to map all brain injury services in the UK. I have a spreadsheet from the CQC that lists all locations in England with an inspection ID (03_July_2023_HSCA_Active_Locations.xlsx…). The spreadsheet does not give the specific service that each location offers, such as brain injury. On the CQC website, this is written in the inspection summary on the location's web page, under "about our service". See this example of a location I found, Devonshire House: https://www.cqc.org.uk/location/1-9943565245/inspection-summary#overall

I have attempted to create a function that iterates through the location IDs, brings in the HTML for each page, and then filters on whether it contains "brain", but I think the data is too big to be loaded into the model. Please see the pbix file:
BIR services.pbix (138.1 KB).

In essence, what I want is a list of location IDs, each with a flag showing whether its page on the website contains the word "brain".
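The requirement above can be sketched outside Power BI. The following is a minimal Python sketch, not a tested scraper for the CQC site: it assumes the page URL follows the pattern of the Devonshire House example, and `fetch_html`, `visible_text`, and `mentions_keyword` are hypothetical helper names. Only the true/false flag is kept for each location, so the full HTML never has to be held in a data model.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Assumption: every location page follows the pattern of the example URL in this thread.
PAGE_URL = "https://www.cqc.org.uk/location/{location_id}/inspection-summary"


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def visible_text(html):
    """Return the page text with tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def fetch_html(location_id, timeout=30):
    """Hypothetical helper: download the inspection summary page for one location."""
    request = Request(
        PAGE_URL.format(location_id=location_id),
        headers={"User-Agent": "Mozilla/5.0"},
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def mentions_keyword(location_ids, keyword="brain", fetch=fetch_html):
    """Return {location_id: True/False}, or None where the page could not be loaded."""
    results = {}
    for location_id in location_ids:
        try:
            html = fetch(location_id)
        except OSError:  # covers urllib's URLError/HTTPError and timeouts
            results[location_id] = None
            continue
        results[location_id] = keyword.lower() in visible_text(html).lower()
    return results
```

Calling `mentions_keyword(["1-9943565245"])` would check the single example location. For the full list of locations it would be sensible to pause between requests and to check the website's terms of use first.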

If you have any questions, please ask.

Dean

Hi @rocky.rath Dean,

Thank you for starting a conversation on the Community Forum.

Your effort is commendable. I am unfamiliar with the CQC Taxonomy and Location Details.

I looked through the powerquery-m code and thought it must be painful.

It got me thinking about what their Web API would offer. After looking, it appears promising; however, I am unsure how the inspection summary page you used as the source maps onto a matching API request. Have you thought about using their API to retrieve the information you need, formatted as JSON? I spent just a few minutes and pulled back some data, and thought that the facility codes, names, and so on would make more immediate sense to you when testing this out.

URLs:
Using CQC data

API Getting Started

https://www.cqc.org.uk/location/1-9943565245/inspection-summary

API reference for location/inspection-areas:
https://api.cqc.org.uk/public/v1/locations/{location_id}/inspection-areas

I tried to use the pattern from your URL:
https://api.cqc.org.uk/public/v1/locations/1-9943565245/inspection-summary

But it seemed to fail. Perhaps the /inspection-summary page combines the results of several requests.
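For anyone wanting to try the documented endpoint rather than the website pattern, here is a minimal Python sketch. It assumes only the URL template quoted above; `inspection_areas_url` and `get_json` are hypothetical helper names, and nothing is assumed about the fields in the response or about whether the API needs a key or extra parameters.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the API reference URL quoted in this thread.
API_BASE = "https://api.cqc.org.uk/public/v1"


def inspection_areas_url(location_id):
    """Build the inspection-areas URL for one location ID."""
    return f"{API_BASE}/locations/{quote(location_id, safe='')}/inspection-areas"


def get_json(url, timeout=30):
    """Hypothetical helper: fetch a URL and parse the body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Printing the result of `get_json(inspection_areas_url("1-9943565245"))` is a quick way to see which fields come back before deciding whether the API covers what is needed.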

Hi, thanks for taking the time to look at this. I had thought about using the API, but the taxonomy doesn't cover the specific elements of the service I need. Brain injury is not a regulated activity or a service level within the framework. All I could do is find the closest match; for instance, it appears brain injuries could be classified as sensory impairment. But the API doesn't break it down to that level either.

That is why I started to think of a more creative way of getting this information. On the website, it appears that if a location has had an inspection, the page tells you what service the location provides, and that is the detail I think would be useful.
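Since the useful detail is the wording of the service description rather than a bare yes/no, a small sketch of pulling out the text around each match may help when reviewing results. It is written in Python and assumes the page text has already been extracted into a plain string; `keyword_snippets` is a hypothetical helper name.

```python
import re


def keyword_snippets(text, keyword="brain", width=60):
    """Return each occurrence of keyword with up to `width` characters of context on each side."""
    snippets = []
    for match in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(text))
        snippets.append(text[start:end].strip())
    return snippets
```

Reading the snippets for a handful of locations should show whether a match really describes a brain injury service or is just a passing mention, which a simple contains check cannot tell apart.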

Dean