This text is corrupted even worse than the prior. So, rather than worry about fixing that, I just downloaded the 100 most recent tweets related to Power BI through trackmyhashtag.com.
Here’s the outcome:
and here’s the M code to create the wordcount and hashtag columns:
let
Source = Excel.Workbook(File.Contents("C:\Users\brjul\Documents\Power BI Coding\Enterprise DNA Files\Power BI Tweets.xlsx"), null, true),
Worksheet_Sheet = Source{[Item="Worksheet",Kind="Sheet"]}[Data],
#"Promoted Headers" = Table.PromoteHeaders(Worksheet_Sheet, [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Tweet Id", type text}, {"Tweet URL", type text}, {"Tweet Posted Time", type datetime}, {"Tweet Content", type text}, {"Tweet Type", type text}, {"Client", type text}, {"Retweets Received", Int64.Type}, {"Likes Received", Int64.Type}, {"Tweet Location", type text}, {"Tweet Language", type text}, {"User Id", type text}, {"Name", type text}, {"Username", type text}, {"User Bio", type text}, {"Verified or Non-Verified", type text}, {"Profile URL", type text}, {"User Followers", Int64.Type}, {"User Following", Int64.Type}, {"User Account Creation Date", type datetime}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Wordcount", each List.Count(
Text.SplitAny(
Text.Remove( [Tweet Content], {"#(tab)", "#(lf)"} ), "")
)),
#"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Tweet Content", "Wordcount"}),
#"Added Custom2" = Table.AddColumn(#"Removed Other Columns", "Cleaned Tweet Content", each Text.Replace( [Tweet Content], "#(lf)", " ")),
#"Added Custom3" = Table.AddColumn(#"Added Custom2", "SplitBig", each Text.SplitAny( [Cleaned Tweet Content], " ")),
#"Added Custom1" = Table.AddColumn(#"Added Custom3", "RemoveNonHashtage", each List.Select( [SplitBig], each Text.StartsWith(_, "#"))),
#"Extracted Values" = Table.TransformColumns(#"Added Custom1", {"RemoveNonHashtage", each Text.Combine(List.Transform(_, Text.From), """, """), type text}),
#"Removed Other Columns1" = Table.SelectColumns(#"Extracted Values",{"Tweet Content", "Wordcount", "RemoveNonHashtage"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns1",{{"RemoveNonHashtage", "Hashtags"}}),
#"Replaced Value" = Table.ReplaceValue(#"Renamed Columns","""","",Replacer.ReplaceText,{"Hashtags"})
in
#"Replaced Value"
I hope this is helpful. I’ve included my solution file, as well as the data file I used so that you can step through each of the individual steps I took in the analysis.
- Brian
eDNA Forum - Wordcount and Hashtags from Tweets Solution.pbix (39.5 KB)
Power BI Tweets.xlsx (32.8 KB)