Power Query Workout 06 - Filter Relevant Lines

Difficulty Rating:


or

Introduction:

You got Plan and FC data and you need to eliminate unnecessary lines:


In addition your boss asks you to make clear which line was Plan data and which data is from FC. Furthermore you should try to rearrange volumes and sales in two separate columns.
Power Query Workout6 Realign the Data

Your task:
Your main task is to set up the filtering without manually selecting items. Use 2024 as your anchor year and make sure that your solution works next year, by only changing to 2025 as anchor year.
If possible bring also some more clarity to the data by rearranging the data. You need to consider that the naming for Plan and FC data is for whatever reasons different.

(NOTE: For more advanced users, attempt to minimize the number of manual changes using relevant M code functions)
6 Filter Relevant Lines.xlsx (55.7 KB)

This workout was posted on Monday May 1, 2023, and the author’s solution will be posted on Sunday May 7, 2023.

2 Likes

Hello! These workouts are addictive :joy:
Here is my attempt:

Code below.
Thanks!

let
Source = ExcelW6,
#“Added Year” = Table.SplitColumn(Source, “Attribute”, Splitter.SplitTextByCharacterTransition((c) => not List.Contains({“0”…“9”}, c), {“0”…“9”}), {“Attribute”, “Year”}),
NextYear=Text.From(Date.Year(DateTime.LocalNow())+1),
NextYear2=Text.From(Date.Year(DateTime.LocalNow())+2),
NextYear3=Text.From(Date.Year(DateTime.LocalNow())+3),
NextYear4=Text.From(Date.Year(DateTime.LocalNow())+4),
NextYear5=Text.From(Date.Year(DateTime.LocalNow())+5),
#“Filtered Rows” = Table.SelectRows(#“Added Year”, each
(Text.Contains(Text.From([Year]), NextYear) and Text.Contains([Attribute] , “FC Revenue”)) or
(Text.Contains(Text.From([Year]), NextYear) and Text.Contains([Attribute] , “FC Volume”)) or
(Text.Contains(Text.From([Year]), NextYear) and Text.Contains([Attribute] , “Sales”)) or
(Text.Contains(Text.From([Year]), NextYear) and Text.Contains([Attribute] , “Volume p.A.”)) or
(Text.Contains(Text.From([Year]), NextYear2) and Text.Contains([Attribute] , “Sales”)) or
(Text.Contains(Text.From([Year]), NextYear2) and Text.Contains([Attribute] , “Volume p.A.”)) or
(Text.Contains(Text.From([Year]), NextYear3) and Text.Contains([Attribute] , “Sales”)) or
(Text.Contains(Text.From([Year]), NextYear3) and Text.Contains([Attribute] , “Volume p.A.”)) or
(Text.Contains(Text.From([Year]), NextYear4) and Text.Contains([Attribute] , “Sales”)) or
(Text.Contains(Text.From([Year]), NextYear4) and Text.Contains([Attribute] , “Volume p.A.”)) or
(Text.Contains(Text.From([Year]), NextYear5) and Text.Contains([Attribute] , “Sales”)) or
(Text.Contains(Text.From([Year]), NextYear5) and Text.Contains([Attribute] , “Volume p.A.”))
),
#“Added Source” = Table.AddColumn(#“Filtered Rows”, “Source”, each if Text.Contains([Attribute],“FC”,Comparer.OrdinalIgnoreCase) then “FC” else if Text.Contains([Attribute],“Plan”,Comparer.OrdinalIgnoreCase) then “Plan” else null, type text),
#“Added Attribute for Pivoting” = Table.AddColumn(#“Added Source”, “Pivot”, each if Text.Contains([Attribute],“Volume”,Comparer.OrdinalIgnoreCase) then “Volume” else if (Text.Contains([Attribute],“Sales”,Comparer.OrdinalIgnoreCase) or Text.Contains([Attribute],“Revenue”,Comparer.OrdinalIgnoreCase)) then “Sales” else null, type text),
#“Removed Columns” = Table.RemoveColumns(#“Added Attribute for Pivoting”,{“Attribute”}),
#“Pivoted Column” = Table.Pivot(#“Removed Columns”, List.Distinct(#“Removed Columns”[Pivot]), “Pivot”, “Value”, List.Sum)
in
#“Pivoted Column”

2 Likes

Adjunto mi query… No me quedó muy claro el problema, espero que se apegue a lo solcitado. :crossed_fingers:

let
Source = Excel.CurrentWorkbook(){[Name=“Data”]}[Content],
Año = 2024,
Split = Table.SplitColumn(Source, “Attribute”, Splitter.SplitTextByCharacterTransition((c) => not List.Contains({“0”…“9”}, c), {“0”…“9”}), {“Attribute.1”, “Year”}),
Split2 = Table.TransformColumns(Split, {“Attribute.1”, each
let
a = Text.Split(, " "),
b = List.Select(a, each not Text.Contains(
, “p.A”)),
c = if Text.Contains(b{1}, “Plan”) then List.Reverse(b) else b,
d = Table.FromColumns({{c{0}}, {c{1}}}, {“Source”, “Data”})
in d}),
FiltroAño = Table.SelectRows(Split2, each Number.From([Year]) >= Año and Number.From([Year]) <= Año + 4),
Expand = Table.ExpandTableColumn(FiltroAño, “Attribute.1”, {“Source”, “Data”}, {“Source”, “Data”}),
Trim = Table.TransformColumns(Expand,{{“Data”, Text.Trim, type text}, {“Source”, Text.Trim, type text}}),
Filtro = Table.SelectRows(Trim, each [Data] = “Sales” or ([Source] = “FC” and [Year] = Text.From(Año)) or ([Source] = “Plan” and [Data] = “Volume”)),
Replace = Table.ReplaceValue(Filtro,“Revenue”,“Sales”,Replacer.ReplaceText,{“Data”}),
Pivot = Table.Pivot(Replace, List.Distinct(Replace[Data]), “Data”, “Value”),
Sol = Table.Sort(Pivot,{{“Plant”, Order.Ascending}, {“Customer #”, Order.Ascending}, {“Product”, Order.Ascending}, {“Source”, Order.Ascending}, {“Year”, Order.Ascending}})
in
Sol

1 Like

Is it cheating to change the YEAR parameter to a number type?

let
    Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
    #"Split Column by Delimiter" = Table.SplitColumn(Source, "Attribute", Splitter.SplitTextByDelimiter("#(cr)", QuoteStyle.Csv), {"Attribute", "Year"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Plant", type text}, {"Customer #", Int64.Type}, {"Product", type text}, {"Attribute", type text}, {"Year", Int64.Type}, {"Value", type number}}),
    #"Added Source" = Table.AddColumn(#"Changed Type", "Source", each if Text.Contains([Attribute], "FC") then "FC" else "Plan", type text),
    #"Added Metric" = Table.AddColumn(#"Added Source", "Metric", each if Text.Contains([Attribute], "Volume") then "Volume" else "Sales", type text),
    #"Removed Columns" = Table.RemoveColumns(#"Added Metric",{"Attribute"}),
    #"Pivoted Column" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[Metric]), "Metric", "Value"),
    #"Filtered Year" = Table.SelectRows(#"Pivoted Column", each ([Source] = "FC" and [Year] = YEAR) or ([Source] = "Plan" and List.Contains({YEAR..YEAR+4}, [Year]))),
    #"Sorted Rows" = Table.Sort(#"Filtered Year",{{"Plant", Order.Ascending}, {"Customer #", Order.Ascending}, {"Product", Order.Ascending}, {"Year", Order.Ascending}, {"Source", Order.Ascending}})
in
    #"Sorted Rows"

PQ Workout 006 - 6 Filter Relevant Lines - Alexis Olson.xlsx (65.0 KB)

3 Likes

Awesome work! @AlexisOlson , Your script has the right feel, is tight, improves readability and utility, and meets the business requirements of the effort (YEAR parameter to be used as anchor & for filtering (better than current automatic date)). I was building the same way, though I was going down the path of either hard coding a Year or filtering out specific years. Your work is as close to a most desirable M Code Pattern for this ask as possible. I suspect that @Matthias must have had this in mind when he was creating the Workout.

1 Like
let
    Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
    Split = Table.SplitColumn(Source, "Attribute", each if Text.Contains(_, "p.A") then 
        {"Plan",Text.BeforeDelimiter(_," "),Text.AfterDelimiter(_,"#(lf)")} else 
        List.Select(Text.SplitAny(_, " "&"#(cr)"&"#(lf)"),each _>"") , {"Source", "Type","Year"}),
    Replace = Table.ReplaceValue(Split,"Revenue","Sales",Replacer.ReplaceText,{"Type"}),
    Pivote = Table.Pivot(Replace, List.Distinct(Replace[Type]), "Type", "Value", List.Sum),
    Filter = Table.SelectRows(Pivote, each ([Source]="FC" and [Year] = YEAR )  or ([Source]<>"FC"  and  [Year]<= Text.From(Number.From(YEAR)+4) ))
in
    Filter
2 Likes
let
    Source = Data,
    add = Table.AddColumn(Source, "Personalizar", each Number.From(Text.Select([Attribute],{"0".."9"}))),
    fil2 = Table.SelectRows(add, each Text.Contains( [Attribute],"p.A" ) and ([Personalizar] >= Number.From(YEAR) and [Personalizar] <= Number.From(YEAR) +4)),
    col = Table.RemoveColumns(fil2,{"Personalizar"}),
    tab = Table.AddColumn(col, "Source", each [
    Source = if Text.Contains([Attribute],"FC") then "FC" else "Plan",
    Volume = if Source = "FC" and Text.Contains([Attribute],"FC Revenue") then "Sales" else if Source = "Plan" and Text.Contains([Attribute],"Sales") then "Sales" else "Volume",
    Year = Text.Select([Attribute],{"0".."9"})
]),
    exp = Table.ExpandRecordColumn(tab, "Source", Record.FieldNames(tab[Source]{0}) ),
    rem = Table.RemoveColumns(exp,{"Attribute"}),
    pv = Table.Pivot(rem, List.Distinct(rem[Volume]), "Volume", "Value"),
    fil1 = Table.SelectRows(Source, each Text.StartsWith( [Attribute],"FC" ) and Text.EndsWith ( [Attribute],YEAR )  ),
    tab1 = Table.AddColumn(fil1, "Source", each [
    Source = if Text.Contains([Attribute],"FC") then "FC" else "Plan",
    Volume = if Source = "FC" and Text.Contains([Attribute],"FC Revenue") then "Sales" else if Source = "Plan" and Text.Contains([Attribute],"Sales") then "Sales" else "Volume",
    Year = YEAR
]),
    exp1 = Table.ExpandRecordColumn(tab1, "Source", Record.FieldNames(tab[Source]{0}) ),
    rem2 = Table.RemoveColumns(exp1,{"Attribute"}),
    pv2 = pv & Table.Pivot(rem2, List.Distinct(rem2[Volume]), "Volume", "Value"),
    tipo = Table.TransformColumnTypes(pv2,{{"Sales", type number}, {"Volume", type number}}),
    class = Table.Sort(tipo,{{"Plant", Order.Ascending}, {"Customer #", Order.Ascending}, {"Product", Order.Ascending}, {"Source", Order.Ascending}, {"Year", Order.Ascending}}),
    res = Table.ReorderColumns(class,{"Plant", "Customer #", "Product", "Source", "Year", "Volume", "Sales"})
in
    res
2 Likes

@Matthias thanks for the workout. I used UI for my solution and used chat GPT to document and tidy up my mcode

Solution:

let
// Load data from Excel workbook
Source = Excel.CurrentWorkbook(){[Name=“Data”]}[Content],

// Add a column to identify the source of each row
AddedConditionalColumn = Table.AddColumn(Source, "Source", each if Text.Contains([Attribute], "FC") then "FC" else "Plan"),

// Extract the year from the "Attribute" column and create a new "Year" column
InsertedTextAfterDelimiter = Table.AddColumn(AddedConditionalColumn, "Year", each Text.AfterDelimiter([Attribute], "#(lf)"), type text),

// Add a column to identify whether each row represents volume or sales
AddedCustomColumn = Table.AddColumn(InsertedTextAfterDelimiter, "Custom", each if Text.Contains([Attribute], "Volume") then "Volume" else "Sales"),

// Remove the original "Attribute" column
RemovedAttributeColumn = Table.RemoveColumns(AddedCustomColumn,{"Attribute"}),

// Pivot the table to create separate columns for volume and sales
PivotedColumn = Table.Pivot(RemovedAttributeColumn, List.Distinct(RemovedAttributeColumn[Custom]), "Custom", "Value", List.Sum),

// Change the data type of the volume and sales columns to integers
ChangedColumnType = Table.TransformColumnTypes(PivotedColumn,{{"Volume", Int64.Type}, {"Sales", Int64.Type}}),

// Replace null values in the volume and sales columns with 0
ReplacedNullValues = Table.ReplaceValue(ChangedColumnType,null,0,Replacer.ReplaceValue,{"Volume", "Sales"}),

// Add a column to identify whether the row meets the condition of having a year greater than or equal to the global YEAR variable
AddedYesNoColumn = Table.AddColumn(ReplacedNullValues, "YesNo", each if [Year] >= YEAR then "Yes" else "No"),

// Filter the table to keep only the rows with a "Yes" value in the "YesNo" column
FilteredYesRows = Table.SelectRows(AddedYesNoColumn, each ([YesNo] = "Yes")),

// Remove the "YesNo" column
RemovedYesNoColumn = Table.RemoveColumns(FilteredYesRows,{"YesNo"})

in
// Output the final cleaned table
RemovedYesNoColumn

Summary

3 Likes

Hello,
thanks to everyone who participated! I loved the UI orientated approaches and of course also the super cool Split technique from @borydobon. I’ll show you two relatively easy solutions: One relying on added columns plus one adapting the Split technique.

This workout combined two common real world problems. Filtering beyond standard filtering and splitting a column with changing order, inconsistent naming and special characters.

I’ll start with the added column orientated solution, you should be able to follow along even if you are not an advanced user. Changing the YEAR parameter to a number type as @AlexisOlson asked for, is definitely ok, but I’ll do without:

On the Add Column tab select Custom Column and then type Source for the column name and write a simple formula:
image
if Text.StartsWith([Attribute], “FC”) then “FC” else “Plan”

That’s really easy so we do this twice more.
Once with the column name Type and this formula:
if Text.Contains([Attribute], “Volume”) then “Volume” else “Sales”

And once more with the column name Year and a formula to extract the last 4 characters:
Text.End([Attribute], 4)

Now simply mark the Attribute column and press Delete to get rid of it.

Mark the Source column and filter for Text Filter
image
Choose Advanced and for Source equals FC and for Year equals Parameter YEAR

You will see this in the formula bar
= Table.SelectRows(#“Removed Columns”, each [Source] = “FC” and [Year] = YEAR)
Put a bracket around the condition and then copy it
= Table.SelectRows(#“Removed Columns”, each ([Source] = “FC” and [Year] = YEAR) )
Type or and paste and adapt:
= Table.SelectRows(#“Removed Columns”, each ([Source] = “FC”) and ([Year] = YEAR) or ([Source] <> “FC”) and ([Year] < Text.From(Number.From(YEAR)+5)) )
The Text.From(Number.From( allows you to work with YEAR without changing any type.

[If this filter condition building in the formula bar is difficult for you, you can also build it in an added custom column.]

Now let’s pivot the Type column. Mark it and on the Transform column select Pivot Column.
image
As Values Column choose Value

Finally it makes sense before loading the data to define the data type for the new columns Volume and Sales. Mark them and on the Transform column select Detect Data Type.
image
You’ll get whole number and type number.

That’s it. Completely doable with UI support and some tweaking.

let
    Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
    #"Added Source" = Table.AddColumn(Source, "Source", each if Text.StartsWith([Attribute], "FC") then "FC" else "Plan"),
    #"Added Type" = Table.AddColumn(#"Added Source", "Type", each if Text.Contains([Attribute], "Volume") then "Volume" else "Sales"),
    #"Added Year" = Table.AddColumn(#"Added Type", "Year", each Text.End([Attribute], 4)),
    #"Removed Columns" = Table.RemoveColumns(#"Added Year",{"Attribute"}),
    #"Filtered Rows" = Table.SelectRows(#"Removed Columns", each ([Source] = "FC") and ([Year] = YEAR) or ([Source] <> "FC") and ([Year] < Text.From(Number.From(YEAR)+5))),
    #"Pivoted Column" = Table.Pivot(#"Filtered Rows", List.Distinct(#"Filtered Rows"[Type]), "Type", "Value", List.Sum),
    #"Changed Type" = Table.TransformColumnTypes(#"Pivoted Column",{{"Volume", Int64.Type}, {"Sales", type number}})
in
    #"Changed Type"

That was easy and step-by-step, but you want to have it shorter? Then try the splitting approach from @borydobon! Too difficult? Here is an alternative logic which might be easier to understand (saves one step):

let
    Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
    Split = Table.SplitColumn(Source, "Attribute", each 
        if Text.Contains(_, "Plan") then 
        {"Plan",Text.BeforeDelimiter(_," "),Text.End(_, 4)} else 
        if Text.Contains(_, "Volume") then
        {"FC","Volume",Text.End(_, 4)} else
        {"FC","Sales",Text.End(_, 4)}, {"Source", "Type","Year"}),
    Filter = Table.SelectRows(Split, each ([Source] = "FC") and ([Year] = YEAR) or ([Source] <> "FC") and ([Year] < Text.From(Number.From(YEAR)+5)) ),
    Pivot = Table.Pivot(Filter, List.Distinct(Filter[Type]), "Type", "Value", List.Sum)
in
    Pivot

Try them out and make sure to check also some of the other approaches above! => Hope you learn something!

Big thanks to everyone who has blurred or hidden the details! I have taken this now out so that it’s easier to compare solutions.

3 Likes

Too late to contest with you, nevertheless posting my

approach (in Power BI)
let
  Data_Table = Excel.Workbook(
    File.Contents("C:\Users\pavel\Downloads\6 Filter Relevant Lines.xlsx"), 
    null, 
    true
  ){[Item = "Data", Kind = "Table"]}[Data], 
  // Split Attribute column to Operation Type and Year.
  #"SplitTo<Operation Type, Year>" = Table.SplitColumn(
    Data_Table, 
    "Attribute", 
    Splitter.SplitTextByDelimiter("#(cr)", QuoteStyle.Csv), 
    {"Operation Type", "Year"}
  ), 
  // Define data types (for larger data files, define Year column data type only, rest of the columns are to be defined after filtering to speed up PQ)
  ChangedDataTypes = Table.TransformColumnTypes(
    #"SplitTo<Operation Type, Year>", 
    {
      {"Operation Type", type text}, 
      {"Year", Int64.Type}, 
      {"Value", type number}, 
      {"Product", type text}, 
      {"Customer #", Int64.Type}, 
      {"Plant", type text}
    }
  ), 
  // Filter as close to data source as possible to the year user selected by PQ parameter.
  #"Year<SelectedYear, SelectedYear+4>" = Table.SelectRows(
    ChangedDataTypes, 
    each [Year] >= SelectedYEAR and [Year] <= SelectedYEAR + 4
  ), 
  // Disinguis FC and Plan data
  AddSourceColumn = Table.AddColumn(
    #"Year<SelectedYear, SelectedYear+4>", 
    "Source", 
    each if Text.StartsWith([Operation Type], "FC") then "FC" else "Plan", 
    type text
  ), 
  // Distinguish Volume and Sales
  #"Add<Volume, Sales>" = Table.TransformColumns(
    AddSourceColumn, 
    {"Operation Type", each if Text.Contains(_, "Volume") then "Volume" else "Sales", type text}
  ), 
  // Better human understanding of data (unnecessary for Power BI)
  ReorderedColumns = Table.ReorderColumns(
    #"Add<Volume, Sales>", 
    {"Plant", "Customer #", "Product", "Source", "Year", "Operation Type", "Value"}
  ), 
  // Transfer values to Volume and Sales column, respectively.
  #"PivotTo<Volume, Sales>" = Table.Pivot(
    ReorderedColumns, 
    List.Distinct(ReorderedColumns[#"Operation Type"]), 
    "Operation Type", 
    "Value"
  )
in
  #"PivotTo<Volume, Sales>"

Here is my submission. This was so tempting for me but I think I got it right sha

let
    Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Plant", type text}, {"Customer #", Int64.Type}, {"Product", type text}, {"Attribute", type text}, {"Value", type number}}),
    #"Split Column by Position" = Table.SplitColumn(#"Changed Type", "Attribute", Splitter.SplitTextByPositions({0, 4}, true), {"Attribute.1", "Attribute.2"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Position",{{"Attribute.1", type text}, {"Attribute.2", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type1", "Source", each if Text.Contains([Attribute.1],"FC") then "FC" else "Plan", type text),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom", each if Text.Contains([Attribute.1], "Volume") then "Volume" else "Sales", type text),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"Attribute.1"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Attribute.2", "Year"}}),
    #"Pivoted Column" = Table.Pivot(#"Renamed Columns", List.Distinct(#"Renamed Columns"[Custom]), "Custom", "Value", List.Sum),
    #"Filtered Rows" = Table.SelectRows(#"Pivoted Column", each ([Year] = "2024"))
in
    #"Filtered Rows"```