Many 2 many and last value of a group

I have a couple of tables, one with a list of repositories which key is the compound of scope, org and repos name and some total, and a user table listing users of each repo. Users have a number indicating the order of the users for that repo (a repo can have then multiple users and a user can be in multiple repos).
I need to get for each repo the latest added user (the user with the higher user order).
I tried using a virtual relation between the two tables using the key as a link, but I cannot get the latest user for that tuple.
May you help?

Thanks in advance

Roberto

db.xlsx (17.0 KB) many2many.pbix (29.1 KB)

Hi @Roberto

I created a calculated column. PFB the logic.

LastUser =
VAR _key = Repositories[Key]
VAR _Tab =
FILTER ( Users, Users[Key] = _key )
RETURN
CALCULATE ( MAX ( Users[User] ), TOPN ( 1, _Tab, Users[User order], DESC ) )

many2many.pbix (30.6 KB)

1 Like

@Roberto,

Here is my solution using a measure, not a calculated column. Both solutions are viable, but I prefer a measure over a calculated column as often as possible.

Thanks
Jarrett

Hi @Roberto,

Option 1. Power Query solution

Option 2. Measure
Because you’ve used dimensions from both tables this measure accounts for that.

Last User v2 = 
VAR MaxValue = CALCULATE( MAX( Users[User order] ), ALLEXCEPT( Users, Users[Key] ))
VAR vTable =
ADDCOLUMNS(
    SUMMARIZE( Repositories, Repositories[Key] ),
    "@LastUser", CALCULATE( MAX( Users[User] ), 
        FILTER( ALL( Users ),
            Users[Key] = EARLIER( Repositories[Key] ) &&
            Users[User order] = CALCULATE( MAX( Users[User order] ), ALLEXCEPT( Users, Users[Key] )))
        )
    )
RETURN

IF( ISINSCOPE( Users[Key] ),
    CALCULATE(
        MAX( Users[User] ),
        FILTER( ALLEXCEPT( Users, Users[Key] ), Users[User order] = MaxValue )
    ),
    MAXX( vTable, [@LastUser] )
) 

.

Here’s your sample file. If you change the FileLocation parameter the queries will be restored.
eDNA - many2many.pbix (38.2 KB)

I hope this is helpful

1 Like

This is amazing!
1 question, 4 solutions (and counting…)

:smiley:

3 Likes

Yes @Melissa

This way we can learn more.

Let me know which one is best for this scenario.

I prefer calculated column for dimensions( They can use as a slicer in the report).
If it is measure they can’t use in slicer.

What do you think ?

If it’s a Dimension/Attribute then go for the Power Query solution.
From a performance standpoint, always try to push that as far back to the source as you can.

1 Like

Pushing back is good idea. But here we are merging two tables.

In real time(My personal experience) when we are merging tables (large datasets) will face some performance issues.

Hi @Rajesh,

I understand and if there is a performance issue with the Merge the process can be optimized.
There’s a multitude of ways to solve this in Power Query - I guess “cross that bridge when we get there…”

@Rajesh thanks for your elegant solution. My TOPN grasp still needs attention.
So I went through a video where @sam.mckay provides a clear explanation.
Thanks again!

@JarrettM thanks for your kind replay. I do not get how it works (it actually does :slightly_smiling_face:).
Does the function lastnonblank relies on some order? User order can be random

This is a very lively community!
In the meanwhile I found this post where @sam.mckay shows these 2 techniques
Calculate the most recent value with DAX in Power BI - #2 by sam.mckay

In this scenario, slicing is required and it will make building the report easier (focus is on users).
I’ll try both in production where data is massive. Unfortunately, I do simple things, but being the size of data always overwhelming then even if approaches provide the same result, some are not applicable when the going gets tough.

Thanks @Melissa! Your DAX solution will take me some time to be understood. I’ll try also the PQ as well, but as mentioned with large data set I found Power Query is not always the most efficient solution

Hi @Roberto,

If you provide a small sample, column headers and query names are more important than the actual data, I’m confident we can improve performance.