Find Top N terms in a text string

@Gaelan ,

OK, I finally cracked this one, but it did not surrender easily. As I mentioned above, this was difficult because separating text into columns is two clicks in Power Query, but difficult in DAX. The best way I’ve found to do it is using a trick from DAX wizard @AntrikshSharma , where you replace the spaces in the text with the vertical bar symbol ("|") and then use the PATHLENGTH and PATHITEM series of functions to manipulate the string into a tabular column.

Once you do this, you can create a one column virtual table of the Top N most used words, pulling the parameter from the top N slider, and another one column virtual table of the “split by delimited” movie title words using the approach above.

You can then take the intersection of these two tables and count the rows in them - every movie for which this count > 0 will have a word in it from the dynamic top N list.

Here’s the measure that does the heavy lifting, which you place in the filter pane on the visual and set equal to 1.

Top Word in Film Path =

VAR vTopNWords =

FILTER(

    ALL(TopWords[Top Words] ),

    [RankTopWords] <= [Top N Selection Value]

)

VAR vFilmPathWords =

ADDCOLUMNS(

    CROSSJOIN(

        VALUES(Movies[Film Path] ),

        VALUES( 'Max Path Index'[Value] )

    ),

    "@FilmPathWords", PATHITEM( Movies[Film Path], 'Max Path Index'[Value])

       

)

VAR Intersection =

COUNTROWS(

    INTERSECT(

        SELECTCOLUMNS( vTopNWords, "@Col1", [Top Words] ),

        SELECTCOLUMNS( vFilmPathWords, "@Col2", [@FilmPathWords]    )

    )

)

VAR Result =

IF( Intersection > 0, 1, 0 )

RETURN Result

And here’s what it looks like all put together:

Full solution file attached. I hope this is helpful.

Fun problem!

All the best,

4 Likes