Problem of the Week #9 (PQ) - Oscar Winners

Hello @BrianJ,

Thank You for hosting this challenge. Really loved working on it.

Here’s the solution.

Summary

To be honest, initially I was stuck onto the result as 13 and realised that “Jane Fonda” had received an Oscar under two categories i.e. “Actress” and “Actress In A Leading Role”. Than again started with the “Grouping” process and this time got the results as 17. And started to scratch my head about how to get the results as 14 now. But ultimately was able to achieve the result as 14 (finally). :joy:

And also learned something new about this visualization and really loved to explore every feature of it.

Thanks and Warm Regards,
Harsh

1 Like

@BrianJ,

Given my lack of experience in M code, this one was a definitely a great learning experience. I know that no DAX is allowed for the solution. However, I did add a slicer to where one can toggle between seeing the visual for the 1st, 2nd, 3rd, or even 4th award (but limited to just the 14 actresses). So I did cheat and threw in a measure to change the title based on the filter.

Here was my approach:

Summary

Problem of the Week #9 - Oscar Winners (tweinzapfel).pbix (4.7 MB)

First – I created a function in Power Query to add an index column to rank rows by grouping in a column. This code was courtesy of Chris Webb.
(//Source: https://blog.crossjoin.co.uk/2015/05/11/nested-calculations-in-power-query/)

Next steps:

  1. Filtered to just Actress or Actress in a Leading Role and Winner = “True”

  2. Added a custom column to count the number of times the same actress appeared in the dataset. (Note – M code obtained via searching on the web; )

  3. I then filtered to only those actresses who had won 2 or more awards

  4. I then used the Group By function to group the data by actress.
    Pic1

  5. I then added a Custom Column that removed all the columns except the name of the actress and the year that the film was made (Note – I could have left this information for other visuals, but removed them for purposes of this exercise)

  6. I added another Custom Column that applied the custom ranking function (referenced above) to each table done in Step 5. This added the individual ranking of awards for each actress (i.e. 1st award was the earliest date, etc.)

  7. With the new column added in step 6, I then expanded every table in that new custom column. All the other columns were then removed so that I was left with a table of just actress names, the year of the film, and the number of the award.

  8. Lastly – I merged this table with the Actresses table in order to add in the URL for the image.

Additional credit is provided to Brian Grant for an excellent video on transforming nested tables as this was completely new to me and was ultimately the key for my approach:
https://www.csgpro.com/blog/transforming-nested-tables-in-power-query/

3 Likes

@tweinzapfel,

Your post absolutely made my day. If someone were to have asked me what I hoped to achieve in developing this particular problem, your post would be a perfect summary. Basically, putting people who may not have experience or comfort with M code in a position where they needed to do what @Melissa refers to as “learn by Googling”. Searching on individual functions is a terrible way to learn DAX, but a pretty awesome way to learn M (with approximately 800 separate M functions, it’s actually the only way to some extent), and with some creative googling and experimentation you can use this approach to solve a wide array of Power Query problems.

Thanks very much for your detailed writeup – it’s a great specific example of the “learn by googling” model, and one I’m sure I’ll be referring future posters to frequently. Well done, sir!

– Brian

2 Likes

Hi @Brian, thanks for the introduction of this Visual Queryon Timeline. This is my solution for this POW#9

Problem of the Week #9 - Oscar Winners - JoseBressanV2.pbix (4.8 MB)

4 Likes

@jbressan,

Wow – gorgeous job on the visualization. What program did you use to add the enhancements to the Queryon visual?

– Brian

1 Like

Hi @BrianJ, I usually use PowerPoint to work on the visual identity of the report. For the colors of this report I selected three golden contrasts with the PowerPoint eyedropper tool on the figure of the Oscar.

The PowerPoint template, I found it with the help of google search.

1 Like

All,

As always, really been enjoying going through your various solutions on this one.

Attached is my solution, including some additional material I’ll be going through in the solution video later this week.

  • Brian
Summary

Problem of the Week #9 - Oscar WInners Solution @BrianJ.pbix (5.2 MB)

Hello,

As this is my “first problem” I hope I did respect the procedures… :slight_smile:
And here is my solution:

let
    Source = Csv.Document(File.Contents("D:\EnterpriseDNA\Problem of the Week #9\the_oscar_award.csv"),[Delimiter=",", Columns=7, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Filtered - Actresses & Winner" = Table.SelectRows(#"Promoted Headers", each ([winner] = "True") and ([category] = "ACTRESS" or [category] = "ACTRESS IN A LEADING ROLE")),
    #"Grouped Rows" = Table.Group(#"Filtered - Actresses & Winner", {"name"}, {{"Count", each Table.RowCount(_), Int64.Type}, {"AllRecords", each _, type table}}),
    #"Filtered - min 2 awards" = Table.SelectRows(#"Grouped Rows", each ([Count] <> 1)),
    #"Invoked - fnAddRank" = Table.TransformColumns(#"Filtered - min 2 awards", {"AllRecords", each fnAddRank(_)}),
    #"Expanded AllRecords" = Table.ExpandTableColumn(#"Invoked - fnAddRank", "AllRecords", {"year_ceremony", "Index"}, {"AllRecords.year_ceremony", "AllRecords.Index"}),
    #"Filtered - second award" = Table.SelectRows(#"Expanded AllRecords", each ([AllRecords.Index] = 2))
in
    #"Filtered - second award"

And the function fnAddRank

(srcTable as table) as table =>
let
    Source = srcTable,
    #"Sorted Rows" = Table.Sort(Source,{{"year_ceremony", Order.Ascending}}),
    #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 1, 1, Int64.Type)
in
    #"Added Index"

Problem of the Week #9 - Oscar Winners - LucianC.pbix (4.6 MB)

2 Likes

@Lucian,

Nicely done! Great to have you aboard - thanks for participating!

  • Brian
1 Like

Here is my solution. I must admit this a new visual to me, but I think I will be using it again.

Steps taken:

  • Imported CSV
  • Promoted headers
  • Filtered only ‘Actress’ and ‘Actress in Leading Role’
  • Grouped by row count and max year
  • Filtered where count >1
  • Added date with a calculated column of 01/01/[year]
  • Change data type to date

let
Source = Csv.Document(File.Contents(“C:\Users\chris\OneDrive - Logical BI\IT\Power BI\Enterprise DNA\Problem of the Week\the_oscar_award.csv”),[Delimiter=",", Columns=7, Encoding=1252, QuoteStyle=QuoteStyle.None]),
#“Promoted Headers” = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#“Filtered Actresses” = Table.SelectRows(#“Promoted Headers”, each ([winner] = “True”) and ([category] = “ACTRESS” or [category] = “ACTRESS IN A LEADING ROLE”)),
#“Grouped Rows” = Table.Group(#“Filtered Actresses”, {“name”}, {{“Count”, each Table.RowCount(_), Int64.Type}, {“LastYear”, each List.Max([year_ceremony]), type nullable text}}),
#“Filtered Multiple Winners” = Table.SelectRows(#“Grouped Rows”, each [Count] > 1),
#“Added Date” = Table.AddColumn(#“Filtered Multiple Winners”, “Date”, each “01/01/” & [LastYear]),
#“Changed Date Data Type” = Table.TransformColumnTypes(#“Added Date”,{{“Date”, type date}})
in
#“Changed Date Data Type”

Problem of the Week #9 - Oscar Winners.pbix (4.7 MB)

@logicalbi,

Thanks for participating! Very close, but not 100% right. The hitch is that Kathryn Hepburn has 4 wins, and so your MAX expression grabs the date of her last win, not her 2nd.

@jbressan - going back and reviewing, your entry also has everything correct except the Hepburn date as well.

  • Brian

I should of read it properly. I’ve made a change using an expanded table and creating a calculated index column within it. Using the index value of 1 seems to return the 2nd value. I guess this could be affected by how the data is sorted but looks ok in this case

Problem of the Week #9 - Oscar Winners.pbix (4.7 MB)

1 Like

@logicalbi,

:+1:. That’s got it.

FYI – if you want the second award to correspond to 2, just a minor tweak in your custom column:

Table.AddIndexColumn([Table], “Index”, 1, 1)

  • Brian

Thanks @BrianJ.

I fixed it and updated the PBIX file.

1 Like

Hi , Sorry i’m late :slightly_smiling_face: , but give it try to get some ideas

1 Like

All,

As always, thanks to all who participated formally by posting solutions, but I also hope there are a lot of you taking a swing at these problems, even if you are not posting. The solution video for this problem is now up on YouTube, and I will also walk through the key points below.

In putting this problem together, I had three primary objectives:

1. Learn/practice with the Group By/All Rows construct - as I’ve talked about at length in this recent video and in a number of forum solutions, this combination is incredibly powerful and allows you to simply solve a wide range of grouping/aggregation-related problems with little or no DAX.

2. Develop some comfort with writing basic M code - I’ve seen a couple of recent videos (definitely not Enterprise DNA videos) arguing that you don’t really need to learn M code at all. To me this is completely wrongheaded . Gil Raviv, one of the foremost experts on Power Query, estimates that approximately 40% of data prep problems can be solved using the PQ UI only, whereas over 80% (!) can be solved with some basic knowledge of creating custom columns using M. This represents a HUGE return on investment – being able to increase your capabilities in a key pillar by over 100% by learning techniques that are probably far less difficult than you might think. Thus, I wanted to use this problem to frame a generalizable approach you can use to start incorporating basic M code into your solutions with pretty minimal training or knowledge in this area.

3. Introduce the Queryon Timeline custom visual – I think this is a great visual, that can be used to dramatically present data in a way not possible using Power BI’s native capabilities or even other custom visuals

In tackling a multifaceted problem like this, I often find the best way to start is to take it out of the context of Power BI or even programming entirely. Thus, imagine if instead of providing a CSV file, I had just provided you with 93 separate slips of paper, each with a Best Actress winner and year on it. That then becomes conceptually quite a simple problem to solve. I imagine most people’s solutions would look something like this:

  1. Sort slips into groups by actress

  2. Throw out any group with just one slip in it

  3. For the remaining groups, sort slips in each group by ascending date

  4. Choose the second occurring date in each group

Once we have a conceptual solution like this, the rest is just a matter of figuring out how to make Power Query execute each of the steps.

First, use Group By with All Rows to count the number of Best Actress awards per actress:

image

This grouping function returns a table that looks like this:

image

Now, it’s a simple matter to filter on count and remove all awardees with a count of one:

image

and then sort the results by name and then by year:

this puts us 75% of the way home in implementing our conceptual model. The last step is to regroup by name and develop an index for each group, so that we can pull out the second record in each group as defined by the year. To do so, if we don’t know the M function to do that, we can do what Enterprise DNA Resident Power Query Master @Melissa refers to as “Learn by Googling”. This approach works well for Power Query/M because: a) unlike DAX, the names of most M functions quite literally describe what the function does; and b) also unlike DAX, you can often write a single line of M code in isolation and have it perform a very useful function.

The most challenging task in this is often figuring out what term(s) to search on, but after a while you will begin to learn certain phrases (e.g. “nested table”) that help you better describe what you’re looking for and return the results you want. Three very fruitful sources for this “Learn by Googling ” approach are the Microsoft M Reference Guide, for which I have developed a handy External Tool to bring you there directly from within Power BI, and the Microsoft Community, as well as our own forum here.

A couple of searches likely will bring you to the Table.AddIndexColumn() function, which you can implement quite easily as a custom column:

image

once you have that grouped index created, you can filter it down to 2 only to choose the second win year for each actress, which completes the implementation of our conceptual model and solves the problem (but for a few minor remaining tasks related to the visual):

image

(Note: If you want to see a great example of someone without any real M experience effectively working through this approach, check out eDNA member @tweinzapfel’s outstanding solution writeup to this problem above.)

In terms of those minor tasks, it’s just a matter of merging the image URL field from the Actresses table, and turning the year into a date using Column by Examples. Here’s what the final prepped table looks like:

image

Now we just have to make sure to set the URL field to type Image URL, and then just drop the primary fields into the proper well in the Queryon Timeline visual:

image

There are a number of remaining formatting tweaks you will need to implement to get the visual looking exactly right. I have posted my solution file earlier in this thread so if you have questions about those formatting options you can look them up in the PBIX file.

I really hope you enjoyed this problem and added a new tool or two to your toolbox. Particularly if you’re not familiar and/or comfortable with M code, hopefully this demystified it a bit and encouraged you to give it a try in your own problems, using the general framework laid out above.

Thanks for participating! We will be back next week with an interesting new DAX problem.

• Brian

7 Likes

Great initiative everyone.

Love going thruogh the youtube video Brian.

So many great techniques you covered on it.

2 Likes

@sam.mckay,

Thanks very much for the feedback, I especially enjoyed putting this video together, because the techniques are so powerful and broadly applicable to so many different scenarios, but at the same time are pretty easy to learn.

  • Brian
2 Likes

I have only now seen this context.
I would like to add even if out of time my idea for the solution via GUI.
Problem of the Week #9 - Oscar Winners.pbix (4.7 MB)

2 Likes

Here goes my solution for the Week-09 and I just couldn’t express the amount of learning I gained till now from the POTW challenge. This is an absolute gem.

Couldn’t thank you more @BrianJ for bringing these challenges every other week.

Really loved the visualization of @jbressan and thanks for the PPT background.

Would love to know for any other such sources where we can download amazing templates for visualization… of course, we got Analyst Hub but, we always need more :smile: :stuck_out_tongue:

Thank you guys :slight_smile:

2 Likes