Find a list of values in a column when the two tables are not related. A Search for values?

chad.sharpe · November 27, 2023, 6:51pm

I have a large table with a “Short Description” for an application. I then have another list of 50 plus key words. I want to add a new column to the large table calling out the key word when it appears. If no key word appears then it can say “other” or “not found”. I’m not sure how to do this when i can’t create a relationship between both tables?

The output should look like

eDNA Lookup Value.pbix (18.6 KB)

VilmarSchi · November 27, 2023, 11:57pm

This alternative helps you?

Medida = 
SWITCH (
    TRUE (),
    SEARCH ( "Dev", MAX ( 'Short Description'[Short Description] ), 1, BLANK () ) > 0, "Dev",
    SEARCH ( "Prod", MAX ( 'Short Description'[Short Description] ), 1, BLANK () ) > 0, "Prod",
    "Other"
)

chad.sharpe · November 28, 2023, 2:13pm

Switch True() is my backup option. Not the most ideal solution.

Switch True() is a static measure. I would need to update it manually everytime the value changed. As well there are over 50 values to find so the measure it’s self would now be 50 plus rows.

I was hoping there was a more dynamic PowerQuery option to take into account the column in the look up table?

jamie.bryan · November 28, 2023, 3:19pm

Hi @chad.sharpe

See if this works for you, most likely not the best but maybe gives you a jumping off point.

I’ve got a table of the lookups and a table of the data.

Finds the index of the first lookup word that matches, if any
Generate a new list, getting just the word at the index found
Convert this list back to text, replacing with “other” if there are no matches

Few caveats:

Isn’t accounting for different cases, will need to convert all to lower or upper first before doing comparisons if you’re expecting differences.
It will return the first word that matches, e.g. using your example “dev prod” will return “dev”

chad sharpe lookups.pbix (19.5 KB)

HufferD · November 29, 2023, 10:17am

ANSWER:

I provide two approaches below, a calculated column—like you requested—and a measure.

Note, too, you don’t need a relationship between the lookup table column and the target table column to perform cross-table computations.

In the calculated column approach, the formula operates within the row context of the table where the column is added. It processes each row individually, checking against all the keywords in the lookup table. This row-by-row operation doesn’t require a formal relationship because the logic is applied at the individual row level, not across the tables.

The use of SUMX in both the calculated column and the measure allows the formula to “scan” through the lookup table for each value in the main table. SUMX iterates over a table (in this case, the lookup table) and performs the specified calculation (checking if the keyword is contained in the description). This iteration is independent of any formal relationship between the tables.

In scenarios where the relationship between tables is not straightforward or where creating a relationship could overly complicate the data model, these kinds of formulas provide flexibility. They allow for cross-table computations without altering the underlying data model with relationships that might not be needed for other analyses or might introduce unnecessary complexity.

Finally, measures in Power BI are dynamic and calculate results based on the current context, which is defined by filters, slicers, or the specific data points selected in a report. This context-driven calculation allows measures to compare or reference different tables without needing a direct relationship, as they’re not limited to predefined table relationships.

Approach 1: Calculated Column

containsKeyword COL = 
VAR CurrentValue = LOWER('Short Description'[Short Description])
RETURN
IF(
    SUMX(
        'Look up Table'
        , IF(
            CONTAINSSTRING(
                CurrentValue
                , LOWER('Look up Table'[Look up Column])
            )
            , 1
            , 0
        )
    ) > 0
    , "Contains Keyword"
    , "Not Found"
)

This approach involves adding a new column to your main table (Short Description table). The DAX formula iterates through each row in this table and checks if the Short Description contains any of the keywords listed in the Look up Table.

This method is best when you need a static column that directly shows whether each row contains any of the keywords. It’s particularly useful for creating reports or visuals where this binary indication is needed on a row-by-row basis.

Identifying the keyword is only slightly more complicated:

Identify Keyword COL = 
VAR CurrentValue = LOWER('Short Description'[Short Description])
RETURN 
    IF(
        SUMX(
            'Look up Table', 
            IF(
                CONTAINSSTRING(CurrentValue, LOWER('Look up Table'[Look up Column]))
                , 1
                , 0
            )
        ) > 0
        , CONCATENATEX(
            FILTER(
                'Look up Table',
                CONTAINSSTRING(CurrentValue, LOWER('Look up Table'[Look up Column]))
            )
            , 'Look up Table'[Look up Column]
            , ", "
        )
        , "Not Found"
    )

Approach 2: Measure (`containsKeyword Measure`)

containsKeyword Measure = 
VAR SelectedValue = LOWER(SELECTEDVALUE('Short Description'[Short Description]))
RETURN
IF(
    SUMX(
        'Look up Table'
        , IF(
            CONTAINSSTRING(
                SelectedValue
                , LOWER('Look up Table'[Look up Column])
            ) 
            , 1
            , 0
        )
    ) > 0
    , "Contains Keyword"
    , "Not Found"
)

This approach creates a measure that can be used in visuals where the context is defined by the user’s interaction with the report, such as selecting a specific value or filtering.

This approach is ideal for interactive reports where the user might be selecting or filtering specific entries. It allows for a dynamic analysis of whether the selected descriptions contain any of the keywords. The measure works best when a single value from the Short Description is in context. This is particularly important in visuals like tables with multiple rows, where the measure’s behavior might be less intuitive.

And, again, identifying the keyword is only slightly more complicated:

Identify Keyword Measure = 
VAR __SelectedValue = LOWER(SELECTEDVALUE('Short Description'[Short Description], "None Selected"))
VAR __result = 
    IF(
        __SelectedValue = "None Selected"
        , "Not Found"
        , IF(
            SUMX(
                'Look up Table', 
                IF(
                    CONTAINSSTRING(__SelectedValue, LOWER('Look up Table'[Look up Column]))
                    , 1
                    , 0
                )
            ) > 0
            , CONCATENATEX(
                FILTER(
                    'Look up Table',
                    CONTAINSSTRING(__SelectedValue, LOWER('Look up Table'[Look up Column]))
                )
                , 'Look up Table'[Look up Column]
                , ", "
            )
            , "Not Found"
        )
    )
RETURN
IF(HASONEVALUE('Short Description'[Short Description]),__result)

For the measures I added a HASONEVALUE check. that this is used to ensure the measure works correctly in contexts where a single Short Description is selected, as measures can behave unexpectedly in visuals with multiple values.

Is this scalable? I don’t know. This could be computationally intensive especially on large datasets. More attractive alternatives exist if the datasets are large including preprocessing in power query or performing these steps in R or Python.

Find a list of values in a column when the two tables are not related. A Search for values?

Approach 1: Calculated Column

Approach 2: Measure (containsKeyword Measure)

Approach 2: Measure (`containsKeyword Measure`)