Text analysis: count of hashtags and number of words in a column

@MKulaksy ,

This text is corrupted even worse than the prior. So, rather than worry about fixing that, I just downloaded the 100 most recent tweets related to Power BI through trackmyhashtag.com.

Here’s the outcome:

and here’s the M code to create the wordcount and hashtag columns:

let
        Source = Excel.Workbook(File.Contents("C:\Users\brjul\Documents\Power BI Coding\Enterprise DNA Files\Power BI Tweets.xlsx"), null, true),
        Worksheet_Sheet = Source{[Item="Worksheet",Kind="Sheet"]}[Data],
        #"Promoted Headers" = Table.PromoteHeaders(Worksheet_Sheet, [PromoteAllScalars=true]),
        #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Tweet Id", type text}, {"Tweet URL", type text}, {"Tweet Posted Time", type datetime}, {"Tweet Content", type text}, {"Tweet Type", type text}, {"Client", type text}, {"Retweets Received", Int64.Type}, {"Likes Received", Int64.Type}, {"Tweet Location", type text}, {"Tweet Language", type text}, {"User Id", type text}, {"Name", type text}, {"Username", type text}, {"User Bio", type text}, {"Verified or Non-Verified", type text}, {"Profile URL", type text}, {"User Followers", Int64.Type}, {"User Following", Int64.Type}, {"User Account Creation Date", type datetime}}),
        #"Added Custom" = Table.AddColumn(#"Changed Type", "Wordcount", each List.Count(     
        Text.SplitAny( 
            Text.Remove( [Tweet Content], {"#(tab)", "#(lf)"} ), "") 
            )),
        #"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Tweet Content", "Wordcount"}),
        #"Added Custom2" = Table.AddColumn(#"Removed Other Columns", "Cleaned Tweet Content", each Text.Replace( [Tweet Content], "#(lf)", " ")),
        #"Added Custom3" = Table.AddColumn(#"Added Custom2", "SplitBig", each Text.SplitAny( [Cleaned Tweet Content], " ")),
        #"Added Custom1" = Table.AddColumn(#"Added Custom3", "RemoveNonHashtage", each List.Select( [SplitBig], each Text.StartsWith(_, "#"))),
        #"Extracted Values" = Table.TransformColumns(#"Added Custom1", {"RemoveNonHashtage", each Text.Combine(List.Transform(_, Text.From), """, """), type text}),
        #"Removed Other Columns1" = Table.SelectColumns(#"Extracted Values",{"Tweet Content", "Wordcount", "RemoveNonHashtage"}),
        #"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns1",{{"RemoveNonHashtage", "Hashtags"}}),
        #"Replaced Value" = Table.ReplaceValue(#"Renamed Columns","""","",Replacer.ReplaceText,{"Hashtags"})
    in
        #"Replaced Value"

I hope this is helpful. I’ve included my solution file, as well as the data file I used so that you can step through each of the individual steps I took in the analysis.

Power BI Tweets.xlsx (32.8 KB)

2 Likes