Extract text string from word wrap data

Hi folks,

I have a column that contains many rows. Not unusual; doh! One row contains many rows of data. The following is an example of some of the rows contained within one cell in a (.csv) text file … i’m just presuming that the text is word wrapped.

Note, in QE, when you look at the table cell, you may only see the top four lines but when you select the cell of the table directly many lines contained in the cell appear in a seperate pane below the table data.

What I’m trying to do, is extract the following string … ‘Scan Start Date : 2021/1/19’ from all the lines below.

Safe checks : yes
Optimize the test : yes
Credentialed checks : no
Patch management checks : None
Display superseded patches : yes (supersedence plugin did not launch)
CGI scanning : disabled
Web application tests : disabled
Max hosts : 100
Max checks : 5
Recv timeout : 5
Backports : None
Allow post-scan editing: Yes
Scan Start Date : 2021/1/19 17:06 AUS Eastern Standard Time
Scan duration : 186 sec

Refer attached.QE_data_extract.docx (61.0 KB)

Hope this makes sense.
TIA john

Hi @jmaikido,

Copy this into a new Blank Query.
First method fully nested, the second broken out in sections, you can turn that into a custom function.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("VZBPT8MwDMW/irUTHBDtJAbsNhhCHPgjCkJo3cFLzBotc6LEBcanx13HJhTl8p6f/bNns0GFHwSmIbPKMIYN5Zofo7i1+yGQRj9l+TOuE1licejJHjIcan5CMQ2skXFJay05uA+Bqeapy9HjBnIbKWXtYiF2EdoNhaO9w4Yg+nbpGKyz2l3AY8umOVaA2zvIBpkdLzVoXcaFstT8RgvAGL0zKC7wljr/q7jHb2hCr5ZF0Qt7yrOan8l8gi5OoZVeuUKziiHJYY2J9+ELorY56TCArBNFGcN7d56qkyrBJDBFIU0Ni2F5qu8SyvNxMYLJawU3mIXStpAtJgsvOnMXtm3q+ZXxYgSZzGA+/wU=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Plugin Output" = _t]),
    ChType = Table.TransformColumnTypes(Source,{{"Plugin Output", type text}}),
    ExtractText = Table.AddColumn(ChType, "Text Range", each Text.Middle([Plugin Output], Text.Length(Text.BeforeDelimiter([Plugin Output], "Scan Start Date :")), Text.Length(Text.BeforeDelimiter( Text.AfterDelimiter([Plugin Output], "Scan Start Date :"), ":"))-2 + Text.Length("Scan Start Date :")), type text),
    ExtractText2 = Table.AddColumn(ExtractText, "Text Range2", each 
        let
            SearchString = "Scan Start Date :",
            TextToSearch = [Plugin Output],
            TextStart = Text.Length( Text.BeforeDelimiter( TextToSearch, SearchString )),
            NumChars = Text.Length( SearchString ) + 11
        in
            Text.Trim( Text.Middle( TextToSearch, TextStart, NumChars )), type text
    )
in
    ExtractText2

With this result
image

.

To apply this logic using a Custom Function

(SearchFor as text, SearchIn as text) as text =>
let
    SearchString = SearchFor, //"Scan Start Date :",
    TextToSearch = SearchIn, //[Plugin Output],
    TextStart = Text.Length( Text.BeforeDelimiter( TextToSearch, SearchString )),
    NumChars = Text.Length( SearchString ) + 11
in
    Text.Trim( Text.Middle( TextToSearch, TextStart, NumChars ))

With the code above create this function Query
image

And invoke it via the ribbon go to: Add Column/Invoke Custom Function

I hope this is helpful

2 Likes

@jmaikido

Assuming all of your data is like this, and that the phrase you are trying to capture ALWAYS starts with “Scan Start Date”, you can do this easily with M Code (Power Query)

Basically, I’m extracting everything between "Scan Start Date : " and the next space, and then adding the phrase "Scan Start Date : " back to the start of the text (note that I have included a space after the colon in each instance)

eDNA - Extract from Long Text.pbix (15.0 KB)

If your case is not quite this straight-forward, please provide a pbix mockup of data that demonstrates the problem.

2 Likes

Hi @jmaikido , did the response provided by @Melissa help you solve your query? If not, how far did you get and what kind of help you need further? If yes, kindly mark the thread as solved. Thanks!

Thnx Heather, worked just fine.
Rgds, john

Hi Melissa,

That’s an excellent response altho I went with the much simpler option oferred by Heather.
Rgds,
John