SQL query with DISTINCT and GROUP BY in Snowflake

Asked: 2020-04-27 23:17:12

Tags: sql group-by distinct snowflake-cloud-data-platform

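What I'm trying to do is get all records for a given MPN, but I only want the latest shpm row by DeliveryDate. I tried MAX, but DeliveryDate still has to go in the GROUP BY clause, so I don't get just the latest record: because the DeliveryDate values are distinct, the query returns all of them (two records instead of one). How can I achieve this? This is in Snowflake.

Here is my SQL: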

SELECT
    MD.MPN,
    MD.LOTCODE,
    MD.DATECODE,
    SHIP.ITEMCODE AS SYSTEMPARTNUMBER, 
    SHIP.SERIALNUMBER AS SYSTEMSERIALNUMBER, 
    SHIP.CUSTOMERNAME, 
    SHIP.SHIPTOADDRESS AS ADDRESS,
    SUM(IFNULL(SHIP.QUANTITY,0)) AS QUANTITY,
    SHIP.DELIVERYDATE
FROM cunits UNITS
   JOIN unc UC ON UC.CHILDUNITID = UNITS.ID
   JOIN shpm SHIP ON SHIP.SERIALNUMBER = UC.SYSSN
   JOIN tsern SN ON SN.UNITID = UNITS.ID
   JOIN machined MD ON MD.SERIALNUMBER = SN.SERIALNUMBER     
WHERE --SYSTEMSERIALNUMBER = '001801055469' and 
MPN = 'XC0402A105KP5CNN-S'
GROUP BY MD.MPN,MD.LOTCODE,MD.DATECODE,SHIP.ITEMCODE,SHIP.SERIALNUMBER,SHIP.CUSTOMERNAME,SHIP.SHIPTOADDRESS

2 Answers:

Answer 0 (score: 2)

Use ROW_NUMBER() with QUALIFY:

SELECT MD.MPN, MD.LOTCODE, MD.DATECODE,
       SHIP.ITEMCODE AS SYSTEMPARTNUMBER, SHIP.SERIALNUMBER AS SYSTEMSERIALNUMBER, 
       SHIP.CUSTOMERNAME, SHIP.SHIPTOADDRESS AS ADDRESS,
       SUM(COALESCE(SHIP.QUANTITY, 0)) AS QUANTITY,
       SHIP.DELIVERYDATE
FROM cunits UNITS JOIN
     unc UC
     ON UC.CHILDUNITID = UNITS.ID JOIN
     shpm SHIP
     ON SHIP.SERIALNUMBER = UC.SYSSN JOIN
     tsern SN
     ON SN.UNITID = UNITS.ID JOIN
     machined MD
     ON MD.SERIALNUMBER = SN.SERIALNUMBER     
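WHERE MD.MPN = 'XC0402A105KP5CNN-S'
-- DELIVERYDATE is selected without an aggregate, so it has to be grouped as well
GROUP BY MD.MPN, MD.LOTCODE, MD.DATECODE, SHIP.ITEMCODE, SHIP.SERIALNUMBER, SHIP.CUSTOMERNAME, SHIP.SHIPTOADDRESS, SHIP.DELIVERYDATE
-- keep only the most recent delivery per MPN and serial number
QUALIFY ROW_NUMBER() OVER (PARTITION BY MD.MPN, SHIP.SERIALNUMBER ORDER BY SHIP.DELIVERYDATE DESC) = 1;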

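This returns one row per MPN (and system serial number, given the PARTITION BY), which is how I read your question. You may need additional columns in the PARTITION BY.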
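QUALIFY filters on the result of a window function, so the same "latest row per group" filter can also be written as a derived table with a WHERE on the row number. Here is a minimal sketch of that equivalent form, assuming a cut-down shpm with made-up sample rows; the sample values, the alias `ranked` and the column `rn` are illustrative and not from the question.

```sql
-- Hypothetical sample data standing in for the real shpm table.
WITH shpm AS (
    SELECT * FROM VALUES
        ('a', '123', 10, '2020-02-01'),
        ('a', '123', 20, '2020-01-01')
    v(ITEMCODE, SERIALNUMBER, QUANTITY, DELIVERYDATE)
)
SELECT ITEMCODE, SERIALNUMBER, QUANTITY, DELIVERYDATE
FROM (
    -- Number the shipments of each serial number, newest delivery first.
    SELECT s.*,
           ROW_NUMBER() OVER (PARTITION BY SERIALNUMBER
                              ORDER BY DELIVERYDATE DESC) AS rn
    FROM shpm s
) ranked
-- Same effect as QUALIFY ROW_NUMBER() OVER (...) = 1.
WHERE rn = 1;
```

With this sample data only the 2020-02-01 shipment survives. The dates are ISO-formatted strings here, which sort correctly as text; with a real DATE column the ordering works the same way.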

Answer 1 (score: 2)

So, guessing at some data to match the SQL:

WITH cunits AS (
    SELECT * from values (1) v(id)
), unc AS (
    SELECT * FROM VALUES (1,'123') v(CHILDUNITID,SYSSN)
), shpm AS (
    SELECT * FROM VALUES ('a', '123', 10, '2020-02-01'),
       ('a', '123', 20, '2020-01-01') 
   v(ITEMCODE, SERIALNUMBER, QUANTITY, DELIVERYDATE)
), tsern AS (
    SELECT * FROM VALUES (1,'zxc') v(UNITID,SERIALNUMBER)
), machined as (
    SELECT * FROM VALUES ('zxc', 'XC0402A105KP5CNN-S') v(SERIALNUMBER, MPN)
)

and dropping the columns that don't matter for the example:

SELECT
    MD.MPN,
    SHIP.ITEMCODE AS SYSTEMPARTNUMBER, 
    SHIP.SERIALNUMBER AS SYSTEMSERIALNUMBER, 
    SUM(IFNULL(SHIP.QUANTITY,0)) AS QUANTITY,
    SHIP.DELIVERYDATE
FROM cunits UNITS
   JOIN unc UC ON UC.CHILDUNITID = UNITS.ID
   JOIN shpm SHIP ON SHIP.SERIALNUMBER = UC.SYSSN
   JOIN tsern SN ON SN.UNITID = UNITS.ID
   JOIN machined MD ON MD.SERIALNUMBER = SN.SERIALNUMBER     
WHERE 
MPN = 'XC0402A105KP5CNN-S'
GROUP BY MD.MPN,SHIP.ITEMCODE,SHIP.SERIALNUMBER;

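Now SHIP.DELIVERYDATE has to be added to the GROUP BY clause, otherwise this code will never run, even setting aside your wish not to see the 2020-01-01 data.

Once it is added, you get the two rows you don't want: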

MPN                 SYSTEMPARTNUMBER  SYSTEMSERIALNUMBER  QUANTITY  DELIVERYDATE
XC0402A105KP5CNN-S  a                 123                 10        2020-02-01
XC0402A105KP5CNN-S  a                 123                 20        2020-01-01

Gordon's solution, which is to add a QUALIFY:

QUALIFY ROW_NUMBER() OVER (PARTITION BY MD.MPN, SHIP.SERIALNUMBER ORDER BY SHIP.DELIVERYDATE DESC) = 1;

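gives the correct answer, but it computes all of the results and then prunes the unwanted ones. Depending on the size of the data set and the number of rows in the shpm table, a CTE that filters first may work better: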

WITH cunits AS (
    SELECT * from values (1) v(id)
), unc AS (
    SELECT * FROM VALUES (1,'123') v(CHILDUNITID,SYSSN)
), shpm AS (
    SELECT * FROM VALUES ('a', '123', 10, '2020-02-01'),
       ('a', '123', 20, '2020-01-01') 
   v(ITEMCODE, SERIALNUMBER, QUANTITY, DELIVERYDATE)
), tsern AS (
    SELECT * FROM VALUES (1,'zxc') v(UNITID,SERIALNUMBER)
), machined as (
    SELECT * FROM VALUES ('zxc', 'XC0402A105KP5CNN-S') v(SERIALNUMBER, MPN)
), pre_filtered_shpm AS (
    select * from shpm
    QUALIFY ROW_NUMBER() OVER (PARTITION BY SERIALNUMBER ORDER BY DELIVERYDATE DESC) = 1
)
SELECT
    MD.MPN,
    SHIP.ITEMCODE AS SYSTEMPARTNUMBER, 
    SHIP.SERIALNUMBER AS SYSTEMSERIALNUMBER, 
    SUM(IFNULL(SHIP.QUANTITY,0)) AS QUANTITY,
    SHIP.DELIVERYDATE
FROM cunits UNITS
   JOIN unc UC ON UC.CHILDUNITID = UNITS.ID
   JOIN pre_filtered_shpm SHIP ON SHIP.SERIALNUMBER = UC.SYSSN
   JOIN tsern SN ON SN.UNITID = UNITS.ID
   JOIN machined MD ON MD.SERIALNUMBER = SN.SERIALNUMBER     
WHERE 
MPN = 'XC0402A105KP5CNN-S'
GROUP BY MD.MPN,SHIP.ITEMCODE,SHIP.SERIALNUMBER,SHIP.DELIVERYDATE;
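One thing to watch with the pre-filter: ROW_NUMBER() keeps exactly one row per serial number, so if two shipments share the same latest DELIVERYDATE, one of them is dropped arbitrarily and its quantity is not summed. If that can happen in your data, RANK() keeps all rows tied for the latest date. This is a sketch only, assuming made-up sample rows with such a tie; the alias `latest` is illustrative.

```sql
-- Hypothetical sample data: two shipments share the latest delivery date.
WITH shpm AS (
    SELECT * FROM VALUES
        ('a', '123', 10, '2020-02-01'),
        ('a', '123', 5,  '2020-02-01'),
        ('a', '123', 20, '2020-01-01')
    v(ITEMCODE, SERIALNUMBER, QUANTITY, DELIVERYDATE)
)
SELECT SERIALNUMBER,
       DELIVERYDATE,
       SUM(IFNULL(QUANTITY, 0)) AS QUANTITY
FROM (
    SELECT * FROM shpm
    -- RANK() gives every row on the latest date the value 1, so ties are kept.
    QUALIFY RANK() OVER (PARTITION BY SERIALNUMBER
                         ORDER BY DELIVERYDATE DESC) = 1
) latest
GROUP BY SERIALNUMBER, DELIVERYDATE;
```

For this sample the result is a single row for serial number 123 on 2020-02-01 with a quantity of 15. Whether tied shipments should be summed or reduced to one is a question about your data, so treat this as an option rather than a correction.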