Native Query Question - SQL

Whitewater100 · August 16, 2021, 8:31pm

EDNA 5 Bel - snapshot.pptx (187.7 KB)
SalesOrders 8-16 pre-history fix eg five below fob.docx (14.0 KB)
*** READ FIRST ***
Dear Forum:
I have an issue with a native query I used to pull together three SQL tables in Power Query. What is occurring is I am getting duplicate values. I am seeking help to rewrite the SQL query which I have attached. My example is for Order 2005364 / Part ID = DP 7451.

I have tried deleting duplicate rows using different scenarios but image my Native SQL statement is where it should be corrected.

I have tried to upload the pbix but am having trouble due to file size but am attaching the data file. It also requires special credentials I am not able to share. I have attached both a word document with the code underlined and highlighted where I think the problem is as well a screenshot showing the results for this example. I’m not sure how to apply composite keys in my native query and do not have access to the dbase for changing it.

Any help on this query is greatly appreciated.

Thank you very much!
P.S. I can send data file if needed but it won’t upload due to size…

Your current work-in-progress PBIX file - VERY IMPORTANT
A clear explanation of the problem you are experiencing
A mockup of the results you want to achieve
Your underlying data file

Check out this thread on Tools and Techniques for Providing PBIX Files with Your Forum Questions

Also make sure that your data file contains no confidential information. If it does, click the link above.

*** DELETE THIS MESSAGE IF YOU ARE SURE ALL YOUR DETAILS ARE COMPLETE OR IF THE ABOVE INFORMATION IS NOT APPLICABLE TO YOUR QUESTION.***

BrianJ · August 16, 2021, 8:40pm

@Whitewater100 ,

Always great to see you back on the forum. Thanks for taking the time to so clearly explain the problem and try to provide all the necessary info – much appreciated. FYI – when you run into size limitations for uploading files, it’s perfectly okay to put those files on Google Drive, OneDrive or Dropbox and then just provide the public link.

I don’t know the answer to your question, but I am looping in the three SQL wizards on the EDNA Expert team - @Greg , @AntrikshSharma and @hafizsultan. I’m confident that one of them will be able to help you work this out.

– Brian

hafizsultan · August 16, 2021, 10:41pm

Hi @BrianJ ,

Thank you for tagging me on this.

@Whitewater100 ,

I understand that grain is not matching in your JOINs and that is why you are getting duplicates. You can only remove duplicates in SQL in this case to get accurate data. You are using below SQL:

SELECT col.CUST_ORDER_ID ,col.LINE_NO,col.LINE_STATUS ,col.PART_ID,col.DESIRED_SHIP_DATE ,col.ORDER_QTY,col.TOTAL_AMT_ORDERED,col.UNIT_PRICE,col.PRODUCT_CODE,col.LAST_SHIPPED_DATE,rl.AMOUNT,rl.INVOICE_ID,sl.SHIPPED_QTY,sl.USER_SHIPPED_QTY,sl.ACT_FREIGHT,sl.SHIPPING_UM,sl.TRADE_DISC_PERCENT FROM CUST_ORDER_LINE AS col

JOIN RECEIVABLE_LINE AS rl ON rl.CUST_ORDER_ID = col.CUST_ORDER_ID

JOIN SHIPPER_LINE AS sl ONCol.CUST_ORDER_ID = sl.CUST_ORDER_ID

WHERE col.LAST_SHIPPED_DATE BETWEEN ‘2018-01-01’ AND ‘2023-12-31’ **AND rl.CUST_ORDER_LINE_NO = col.LINE_NOAND sl.CUST_ORDER_LINE_NO = col.LINE_NO

Try to replace it with below and see the results:

SELECT col.CUST_ORDER_ID ,col.LINE_NO,col.LINE_STATUS ,col.PART_ID,col.DESIRED_SHIP_DATE ,col.ORDER_QTY,col.TOTAL_AMT_ORDERED,col.UNIT_PRICE,col.PRODUCT_CODE,col.LAST_SHIPPED_DATE,rl.AMOUNT,rl.INVOICE_ID,sl.SHIPPED_QTY,sl.USER_SHIPPED_QTY,sl.ACT_FREIGHT,sl.SHIPPING_UM,sl.TRADE_DISC_PERCENT FROM CUST_ORDER_LINE AS col

JOIN RECEIVABLE_LINE AS rl ON rl.CUST_ORDER_ID = col.CUST_ORDER_ID 31’ AND rl.CUST_ORDER_LINE_NO = col.LINE_NO

JOIN SHIPPER_LINE AS sl ONCol.CUST_ORDER_ID = sl.CUST_ORDER_ID AND sl.CUST_ORDER_LINE_NO = col.LINE_NO

WHERE col.LAST_SHIPPED_DATE BETWEEN ‘2018-01-01’ AND '2023-12-31’

Please let me know how it goes? If it does not solve the problem, we can dig further to understand issue and see where our grain is not matching.

Kind Regards,
Hafiz

Greg · August 16, 2021, 10:59pm

Hi @Whitewater100. @hafizsultan beat me to it. One of the first things I’d look into is the source of the duplicate rows: if the query @hafizsultan rewrote for you fixes the issue, great, but if not, unwanted duplicates are often a result of joining to a table with more than one matching row. After extracting and formatting your original SQL query as

SELECT 
	col.CUST_ORDER_ID,
	col.LINE_NO,
	col.LINE_STATUS,
	col.PART_ID,
	col.DESIRED_SHIP_DATE,
	col.ORDER_QTY,
	col.TOTAL_AMT_ORDERED,
	col.UNIT_PRICE,
	col.PRODUCT_CODE,
	col.LAST_SHIPPED_DATE,
	
	rl.AMOUNT,
	rl.INVOICE_ID,
	
	sl.SHIPPED_QTY,
	sl.USER_SHIPPED_QTY,
	sl.ACT_FREIGHT,
	sl.SHIPPING_UM,
	sl.TRADE_DISC_PERCENT
FROM 
	CUST_ORDER_LINE AS col
	JOIN RECEIVABLE_LINE AS rl
		ON rl.CUST_ORDER_ID = col.CUST_ORDER_ID
	JOIN SHIPPER_LINE AS sl
		ON Col.CUST_ORDER_ID = sl.CUST_ORDER_ID
WHERE 
	col.LAST_SHIPPED_DATE BETWEEN '2018-01-01' AND '2023-12-31'
	AND rl.CUST_ORDER_LINE_NO = col.LINE_NO
	AND sl.CUST_ORDER_LINE_NO = col.LINE_NO

you can more easily see the joins to the RECEIVABLE_LINE and SHIPPER_LINE tables; if the main CUST_ORDER_LINE table doesn’t contain duplicates, I expect one or both of the JOIN tables contain more than 1 matching row.

Perhaps try to bring in each of the 3 tables in its own separate query into a PBIX file (filtering for only a few of those CUST_ORDER_ID of interest) and see if that perhaps shows where the data falls down. For me, if I was to go further, I’d try to run this query inside SQL Server Management Studio and see what it returns before even going to Power BI.

Hope this helps.
Greg

Whitewater100 · August 17, 2021, 2:46pm

Hello Hafiz Greg and Brian:

I wanted to follow up and show you what happened. I will attach the results. Essentially I still have the same issue and as you gentlemen pointed out there looks to be duplicate rows in one of the tables. Everything appears to be on the same grain. I used Hafiz SQL to produce this. Any further ideas on this is appreciated.

On my side I will double-check with the data’s owner on why they want all three tables. Currently
COL has order quantity
RL has Amount(Sales Amount)
SL has Shipped Quantity

Ill stay tuned and really appreciate
the attention you have placed here.

Best regards,

Bill

SELECT
col.CUST_ORDER_ID
,col.LINE_NO
,col.LINE_STATUS
,col.PART_ID
,col.ORDER_QTY
,col.TOTAL_AMT_ORDERED
,col.UNIT_PRICE
,col.PRODUCT_CODE
,col.LAST_SHIPPED_DATE
,col.DESIRED_SHIP_DATE
,rl.AMOUNT
,rl.INVOICE_ID
,sl.SHIPPED_QTY
,sl.USER_SHIPPED_QTY
,sl.ACT_FREIGHT
,sl.SHIPPING_UM
,sl.TRADE_DISC_PERCENT
FROM CUST_ORDER_LINE AS col
JOIN RECEIVABLE_LINE AS rl ON rl.CUST_ORDER_ID = col.CUST_ORDER_ID AND rl.CUST_ORDER_LINE_NO = col.LINE_NO
JOIN SHIPPER_LINE AS sl ON Col.CUST_ORDER_ID = sl.CUST_ORDER_ID AND sl.CUST_ORDER_LINE_NO = col.LINE_NO
WHERE col.LAST_SHIPPED_DATE BETWEEN ‘2018-01-01’ AND ‘2023-12-31’

Greg · August 17, 2021, 3:07pm

Hi @Whitewater100. Perhaps the zero lines in the RECEIVABLE_LINE table can be filtered-out with another condition on the join, something like:

– eDNA Forum - SQL Native Query 2.sql

SELECT 
	col.CUST_ORDER_ID,
	col.LINE_NO,
	col.LINE_STATUS,
	col.PART_ID,
	col.DESIRED_SHIP_DATE,
	col.ORDER_QTY,
	col.TOTAL_AMT_ORDERED,
	col.UNIT_PRICE,
	col.PRODUCT_CODE,
	col.LAST_SHIPPED_DATE,
	rl.AMOUNT,
	rl.INVOICE_ID,
	sl.SHIPPED_QTY,
	sl.USER_SHIPPED_QTY,
	sl.ACT_FREIGHT,
	sl.SHIPPING_UM,
	sl.TRADE_DISC_PERCENT
FROM 
	CUST_ORDER_LINE AS col
	JOIN RECEIVABLE_LINE AS rl
		ON rl.CUST_ORDER_ID = col.CUST_ORDER_ID
		AND rl.CUST_ORDER_LINE_NO = col.LINE_NO
		AND rl.AMOUNT > 0
	JOIN SHIPPER_LINE AS sl
		ON col.CUST_ORDER_ID = sl.CUST_ORDER_ID
		AND sl.CUST_ORDER_LINE_NO = col.LINE_NO
WHERE 
	col.LAST_SHIPPED_DATE BETWEEN '2018-01-01' AND '2023-12-31'

Hope it helps.
Greg

Whitewater100 · August 17, 2021, 4:38pm

Hi Greg:

Thanks again for your help. I ran it as sent and the same issue of duplicates is occurring. I’m trying it right now, using your approach with having AMOUNT <> 0 (you have AMOUNT > 0. I’ll let you know what happens.

Thanks for sticking with me thru this.

Whitewater100 · August 17, 2021, 4:57pm

Hi Greg:

The darn thin still won’t cooperate. If we could eliminate those 0 value lines it would be great!

Thanks for all of your effort.

Greg · August 17, 2021, 6:20pm

Hi @Whitewater100. So it may be time to raise the “SQL help” flag and get your DBA to run queries on the database and determine the correct syntax for the additional condition (rl.AMOUNT > 0 and rl.AMOUNT <> 0 aren’t correct, so …) (perhaps data type?).

Anyway, good luck.
Greg

jpratt · August 18, 2021, 12:56pm

Hi @Whitewater100! This is a very common issue with relational data, and there are a number of potential solutions that all involve some kind of subquery. For this case, I will recommend an “inline view”, which also has a benefit of speeding up subqueries. CTEs (common table expressions) are also a popular option, but can be a little harder in some cases to understand what is going on. TVFs (table valued functions) can also be used.

What you are doing in the SELECT statement is getting the total amount and invoice ID for the customer order and line number. The solution below incorporates @hafizsultan 's additional JOIN condition and effectively addresses @Greg 's idea of filtering out the 0s by including them in a SUM().

Keep in mind that this only works if you only have one invoice ID per order/line number.

Instead of this in the FROM clause:

FROM
CUST_ORDER_LINE AS col
JOIN RECEIVABLE_LINE AS rl
ON rl.CUST_ORDER_ID = col.CUST_ORDER_ID

Use this (fyi, the display version is removing my indenting):

FROM CUST_ORDER_LINE AS col
JOIN ( SELECT rl_ilv.CUST_ORDER_ID,
rl_ilv.CUST_ORDER_LINE_NO,
rl_ilv.INVOICE_ID,
SUM(rl_ilv.AMOUNT) AS AMOUNT
FROM RECEIVABLE_LINE AS rl_ilv
GROUP BY rl_ilv.CUST_ORDER_ID, rl_ilv.CUST_ORDER_LINE_NO, rl_ilv.INVOICE_ID
) rl ON rl.CUST_ORDER_ID = col.CUST_ORDER_ID AND rl.CUST_ORDER_LINE_NO = col.LINE_NO

I left the alias for the inline view (rl) the same as your original RECEIVABLE_LINE alias so you don’t need to change the references in the SELECT statement.

Note that you can further speed things up (if needed) by applying a WHERE clause to the inline view.

Give that a shot and let us know how it goes. You can apply the same technique to the SHIPPER_LINE if needed.

EDIT: One more thing I should explicitly add: If there is more than one Invoice ID per customer order line, which you have in your situation, you CANNOT join on the RECEIVABLE_LINE at the Invoice ID level and remove the duplicate rows in the outer query. The granularity, or grain, of the lowest level of granularity dictates the granularity of all the rows returned. This is where inline views and other subqueries become so powerful because they allow you to change the granularity of the subordinate data being pulled in.

So, you have to make a choice: remove the Invoice ID from the query altogether or accept that the granularity of the whole dataset is at the Order/Line/Invoice level. Either way, the inline view allows you to control the subquery granularity and the data returned to the outer query.

John C. Pratt

EnterpriseDNA · August 20, 2021, 12:17am

Hi @Whitewater100, did the response provided by the users and experts help you solve your query? If not, how far did you get and what kind of help you need further? If yes, kindly mark as solution the answer that solved your query. Thanks!

EnterpriseDNA · August 23, 2021, 1:28am

Hi @Whitewater100, we’ve noticed that no response has been received from you since August 18.

We just want to check if you still need further help with this post? In case there won’t be any activity on it in the next few days, we’ll be tagging this post as Solved.

EnterpriseDNA · August 24, 2021, 12:27am

Hi @Whitewater100, due to inactivity, a response on this post has been tagged as “Solution”. If you have a follow question or concern related to this topic, please remove the Solution tag first by clicking the three dots beside Reply and then untick the check box.