Text analysis: count of hashtags and number of words in a column

Hi,
I have a column with text comments of different length and I want to evaluate the following things:

  • number of words in each comment
  • which hashtags are used with which frequency
    Can this be done with DAX formulas?

Best regards,
Mariia

@MKulaksy ,

It’s certainly doable in DAX, but much better suited to Power Query. If you can please provide a sample I’d be glad to work through the code.

  • Brian

Hi @MKulaksy, I noticed you didn’t provide a PBIX file. Providing one will help users and experts find a solution to your inquiry faster and better.

A perfect initial question includes all of the following:

  • A clear explanation of the problem you are experiencing
  • A mockup of the results you want to achieve
  • Your current work-in-progress PBIX file
  • Your underlying data file (to allow us to go into Power Query if necessary to transform your data and/or data model – often DAX questions really end up being data modeling solutions)

Check out this thread on Tools and Techniques for Providing PBIX Files with Your Forum Questions

Not completing your data may sometimes cause delay in getting an answer.

Hi @MKulaksy, we’ve noticed that no response has been received from you since July 3rd.

We are waiting for the masked demo pbix file and any other supporting links and details.

In case there won’t be any activity on it in the next few days, we’ll be tagging this post as Solved.

Hi,
Here is the sample file with some tweets data. What I would like to know is:

  • the ranking of top hashtags and the frequency they are used
  • the average length of the post (number of words)
    Thanks!
    Best regards,
    Mariia
    Sample text analysis.pbix (7.0 MB)

@MKulaksy ,

Thanks for the file. Unfortunately, most of the tweets appear to have been corrupted, and hashtags are not retrievable.

  • Brian

Hi Brian,
And does it work like this?
Best regards,
Mariia
Sample text analysis.pbix (81.9 KB)

@MKulaksy ,

This text is corrupted even worse than the prior. So, rather than worry about fixing that, I just downloaded the 100 most recent tweets related to Power BI through trackmyhashtag.com.

Here’s the outcome:

and here’s the M code to create the wordcount and hashtag columns:

let
        Source = Excel.Workbook(File.Contents("C:\Users\brjul\Documents\Power BI Coding\Enterprise DNA Files\Power BI Tweets.xlsx"), null, true),
        Worksheet_Sheet = Source{[Item="Worksheet",Kind="Sheet"]}[Data],
        #"Promoted Headers" = Table.PromoteHeaders(Worksheet_Sheet, [PromoteAllScalars=true]),
        #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Tweet Id", type text}, {"Tweet URL", type text}, {"Tweet Posted Time", type datetime}, {"Tweet Content", type text}, {"Tweet Type", type text}, {"Client", type text}, {"Retweets Received", Int64.Type}, {"Likes Received", Int64.Type}, {"Tweet Location", type text}, {"Tweet Language", type text}, {"User Id", type text}, {"Name", type text}, {"Username", type text}, {"User Bio", type text}, {"Verified or Non-Verified", type text}, {"Profile URL", type text}, {"User Followers", Int64.Type}, {"User Following", Int64.Type}, {"User Account Creation Date", type datetime}}),
        #"Added Custom" = Table.AddColumn(#"Changed Type", "Wordcount", each List.Count(     
        Text.SplitAny( 
            Text.Remove( [Tweet Content], {"#(tab)", "#(lf)"} ), "") 
            )),
        #"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Tweet Content", "Wordcount"}),
        #"Added Custom2" = Table.AddColumn(#"Removed Other Columns", "Cleaned Tweet Content", each Text.Replace( [Tweet Content], "#(lf)", " ")),
        #"Added Custom3" = Table.AddColumn(#"Added Custom2", "SplitBig", each Text.SplitAny( [Cleaned Tweet Content], " ")),
        #"Added Custom1" = Table.AddColumn(#"Added Custom3", "RemoveNonHashtage", each List.Select( [SplitBig], each Text.StartsWith(_, "#"))),
        #"Extracted Values" = Table.TransformColumns(#"Added Custom1", {"RemoveNonHashtage", each Text.Combine(List.Transform(_, Text.From), """, """), type text}),
        #"Removed Other Columns1" = Table.SelectColumns(#"Extracted Values",{"Tweet Content", "Wordcount", "RemoveNonHashtage"}),
        #"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns1",{{"RemoveNonHashtage", "Hashtags"}}),
        #"Replaced Value" = Table.ReplaceValue(#"Renamed Columns","""","",Replacer.ReplaceText,{"Hashtags"})
    in
        #"Replaced Value"

I hope this is helpful. I’ve included my solution file, as well as the data file I used so that you can step through each of the individual steps I took in the analysis.

Power BI Tweets.xlsx (32.8 KB)

2 Likes

Hi @MKulaksy, we’ve noticed that no response has been received from you since July 3rd. We just want to check if you still need further help with this post? In case there won’t be any activity on it in the next few days, we’ll be tagging this post as Solved.

Hi @MKulaksy, due to inactivity, a response on this post has been tagged as “Solution”.

If you have a follow question or concern related to this topic, please remove the Solution tag first by clicking the three dots beside Reply and then untick the check box.

Thanks @BrianJ, I have a similar challenge, how would go about counting each individual hashtags in the hashtag column (as there are sometimes multiple in a cell) , such that you then use it in a bar chart which shows the frequency of each hashtag.

PS: I opened a new topic : New topic

@Ikay ,

Here’s one way to do it by adding a custom column:

I hope this is helpful.

Thanks for opening a new thread – makes these questions easier to spot…

  • Brian

Thanks Brian for the prompt reply, its more about counting individual hashtags in the created Hashtag column, so in your example, one of the cell in that column had “#PowerBI, #KPIs”.

So I’m wanting to count how many #PowerBI in that column and the count of all the other hashtags as well.
I’ve had it solved now anyway. Thanks alot