Remove duplicates only when a condition is met

Time: 2021-06-11 19:58:58

Tags: sql duplicates ssms

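One last question about duplicates. I understand how to select duplicate records using COUNT(*) with a HAVING clause of > 1, but I am struggling to remove duplicates only when a condition is met.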

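Yesterday I asked about part of this: removing duplicates when the bill amounts cancel out. Now I have to add a criterion: two records count as duplicates only when their bill amounts are the same value with opposite signs and their date and code are both the same.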

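For example, record 1 has a bill amount of $250, code 'JUN', and date 03/02/2020; record 2 has a bill amount of $250, code 'PII', and date 03/07/2020; and record 3 has a bill amount of -$250, code 'PII', and date 03/07/2020. The result I want to see in this example is record 1 only, because records 2 and 3 are treated as duplicates under the condition I described.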
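The condition can be checked on those three records directly. The following is a minimal sketch, not part of the original question: the inline table and the column names RecordNo and BillDate are made up for illustration, and the dates are kept as the strings quoted above rather than converted.

```sql
-- The three records from the example, as an inline table (hypothetical names).
SELECT
    R.Code,
    R.BillDate,
    COUNT(*)       AS RecordCount,
    SUM(R.BillAmt) AS NetAmt
FROM (VALUES
    (1,  250, 'JUN', '03/02/2020'),
    (2,  250, 'PII', '03/07/2020'),
    (3, -250, 'PII', '03/07/2020')
) AS R(RecordNo, BillAmt, Code, BillDate)
GROUP BY R.Code, R.BillDate
ORDER BY R.Code;
-- JUN | 03/02/2020 | 1 | 250   record 1 stands alone
-- PII | 03/07/2020 | 2 |   0   records 2 and 3 cancel out
```

A group whose net amount is 0 is one where the positive and negative amounts offset each other, which is the case the question wants removed.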

Table creation:

CREATE TABLE Billing (
    BillId varchar(10),
    SerialNo varchar(10),
    BillAmt MONEY,
    Code varchar(5),
    DispenseDt DATE
);

Data insertion:

INSERT INTO Billing (BillId, SerialNo, BillAmt, Code, DispenseDt)
VALUES ('BL_001','aaa-111',250,'AAP','20200503')
      ,('BL_002','aab-112',250,'ADD','20200309')
      ,('BL_003','aab-112',-250,'ADD','20200309')
      ,('BL_004','aba-120',700,'YED','20200503')
      ,('BL_005','aba-120',370,'TPP','20200822')
      ,('BL_006','aba-120',370,'TPP','20201003')
      ,('BL_007','aba-120',400,'TPP','20200822')
      ,('BL_008','aba-120',-370,'TPP','20200822')
      ,('BL_009','aba-120',-700,'YED','20200503')
      ,('BL_010','baa-201',1000,'TOK','20200927')
      ,('BL_011','baa-201',-1000,'TOK','20200927')
      ,('BL_012','bab-210',1000,'TOK','20200927');

Sample data:

+----------+-----------+---------+------+------------+
| BillId   | SerialNo  | BillAmt | Code | DispenseDt |
+----------+-----------+---------+------+------------+
| BL_001   | aaa-111   | $250    | AAP  | 20200503   |
| BL_002   | aab-112   | $250    | ADD  | 20200309   |
| BL_003   | aab-112   |-$250    | ADD  | 20200309   |
| BL_004   | aba-120   | $700    | YED  | 20200503   |
| BL_005   | aba-120   | $370    | TPP  | 20200822   |
| BL_006   | aba-120   | $370    | TPP  | 20201003   |
| BL_007   | aba-120   | $400    | TPP  | 20200822   |
| BL_008   | aba-120   |-$370    | TPP  | 20200822   |
| BL_009   | aba-120   |-$700    | YED  | 20200503   |
| BL_010   | baa-201   | $1000   | TOK  | 20200927   |
| BL_011   | baa-201   |-$1000   | TOK  | 20200927   |
| BL_012   | bab-210   | $1000   | TOK  | 20200927   |
+----------+-----------+---------+------+------------+

Expected result:

+----------+-----------+---------+------+------------+
| BillId   | SerialNo  | BillAmt | Code | DispenseDt |
+----------+-----------+---------+------+------------+
| BL_001   | aaa-111   | $250    | AAP  | 20200503   |
| BL_006   | aba-120   | $370    | TPP  | 20201003   |
| BL_007   | aba-120   | $400    | TPP  | 20200822   |
| BL_012   | bab-210   | $1000   | TOK  | 20200927   |
+----------+-----------+---------+------+------------+

My code:

-- Note: InvoiceDt and FacID do not appear in the sample table above.
SELECT a.SerialNo, a.BillAmt, a.Code, a.DispenseDt
FROM (
    SELECT *,
           -- Number of rows sharing the same SerialNo and DispenseDt
           COUNT(SerialNo) OVER (PARTITION BY SerialNo, DispenseDt) AS b
    FROM Billing
) AS a
WHERE a.b = 1
  AND a.InvoiceDt >= '20200601' AND a.InvoiceDt <= '20200630'
  AND a.FacID IN ('IND600','IND605','IND610','IND620','IND630','IND640','IND650','IND660','IND670','IND680','IND690','IND695')
ORDER BY a.SerialNo;

2 answers:

Answer 0: (score: 0)

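I tried to solve it but got a bit stuck myself. The idea is to compute a rank and then filter on rows that share the same rank. However, the ranks my code produces (two of them, one with rank() and one with row_number()) would also remove some of the rows you need in the output. Could someone else edit this code? That would be great.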

Fiddle link: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=e0c990d3694ad99b628b3e05a5de624f

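-- Written for MySQL 8.0 (see the fiddle). SQL Server additionally requires
-- an ORDER BY inside OVER() for RANK() and ROW_NUMBER().
SELECT
    BillId,
    Code,
    DispenseDt,
    new_bill_amt,
    RANK()       OVER (PARTITION BY new_bill_amt, DispenseDt, Code) AS rank_,
    ROW_NUMBER() OVER (PARTITION BY new_bill_amt, DispenseDt, Code) AS rank_2
FROM (
    SELECT
        *,
        -- Strip the sign so +x and -x fall into the same partition
        REPLACE(BillAmt, '-', '') AS new_bill_amt
    FROM Billing
) AS f;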
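One possible way to finish the ranking idea is to number the rows per sign and then pair the k-th positive row with the k-th negative row of the same magnitude. The sketch below is not the answerer's code. It assumes that SerialNo belongs to the matching key alongside Code and DispenseDt (the question only names date and code, but the expected result is consistent with either reading), and that BillId gives a stable order within a group.

```sql
-- Sketch: keep only the rows that have no opposite-signed partner.
WITH numbered AS (
    SELECT
        B.BillId, B.SerialNo, B.BillAmt, B.Code, B.DispenseDt,
        -- Number identical rows so the k-th +x pairs with the k-th -x
        ROW_NUMBER() OVER (
            PARTITION BY B.SerialNo, B.Code, B.DispenseDt, B.BillAmt
            ORDER BY B.BillId
        ) AS rn
    FROM Billing AS B
)
SELECT n.BillId, n.SerialNo, n.BillAmt, n.Code, n.DispenseDt
FROM numbered AS n
WHERE NOT EXISTS (
    SELECT 1
    FROM numbered AS m
    WHERE m.SerialNo   = n.SerialNo
      AND m.Code       = n.Code
      AND m.DispenseDt = n.DispenseDt
      AND m.BillAmt    = -n.BillAmt   -- opposite sign, same magnitude
      AND m.rn         = n.rn         -- same position within its sign
      AND n.BillAmt   <> 0            -- a zero amount never cancels itself
)
ORDER BY n.BillId;
```

On the sample data every row has rn = 1, so BL_002/BL_003, BL_004/BL_009, BL_005/BL_008 and BL_010/BL_011 pair off and the query returns BL_001, BL_006, BL_007 and BL_012. Because the pairing is one-to-one, a group such as 370, 370, -370 would lose only one of its two positive rows.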

Answer 1: (score: 0)

I think this might work.

(I used a CTE, but you can convert it to a subquery.)

WITH base_cte AS (
    SELECT 
        B1.SerialNo
    ,   SUM(B1.BillAmt) AS [TotAmt]
    ,   B1.Code
    ,   B1.DispenseDt
    FROM #Billing AS B1
    GROUP BY 
        B1.SerialNo
    ,   B1.Code
    ,   B1.DispenseDt
)
SELECT 
    B.BillId
,   B.SerialNo
,   B.BillAmt
,   B.code
,   B.DispenseDt
FROM #Billing AS B
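INNER JOIN base_cte AS X
    ON  X.SerialNo   = B.SerialNo
    AND X.Code       = B.Code
    AND X.DispenseDt = B.DispenseDt
-- Keep a row only when its amount equals the net amount of its group
WHERE X.TotAmt = B.BillAmt
ORDER BY B.BillId;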

Output:

BillId  SerialNo  BillAmt   code    DispenseDt
BL_001  aaa-111   250.00    AAP     2020-05-03
BL_006  aba-120   370.00    TPP     2020-10-03
BL_007  aba-120   400.00    TPP     2020-08-22
BL_012  bab-210   1000.00   TOK     2020-09-27
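One thing worth checking before relying on this: comparing each row with its group's net amount works when a group holds at most one amount that is not cancelled. The sketch below uses two hypothetical rows that are not in the question's data (same SerialNo, Code and DispenseDt, no reversal) to show what happens otherwise. Whether such groups occur in the real table is unknown.

```sql
-- Hypothetical rows: two charges in one group, neither of them reversed.
WITH Demo AS (
    SELECT *
    FROM (VALUES
        ('BL_101', 'zzz-999', CAST(100 AS MONEY), 'ABC', CAST('20200101' AS DATE)),
        ('BL_102', 'zzz-999', CAST(200 AS MONEY), 'ABC', CAST('20200101' AS DATE))
    ) AS V(BillId, SerialNo, BillAmt, Code, DispenseDt)
),
Totals AS (
    -- Net amount per SerialNo / Code / DispenseDt group
    SELECT SerialNo, Code, DispenseDt, SUM(BillAmt) AS TotAmt
    FROM Demo
    GROUP BY SerialNo, Code, DispenseDt
)
SELECT D.BillId, D.BillAmt, T.TotAmt
FROM Demo AS D
INNER JOIN Totals AS T
    ON  T.SerialNo   = D.SerialNo
    AND T.Code       = D.Code
    AND T.DispenseDt = D.DispenseDt
WHERE T.TotAmt = D.BillAmt;
-- Returns no rows: TotAmt is 300, which matches neither 100 nor 200.
```

If groups like this can exist, both rows would be dropped even though nothing cancels, so a row-by-row pairing of positive and negative amounts may be the safer choice.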

Edit: here is a different approach using OVER().

SELECT 
    X.BillId
,   X.SerialNo
,   X.BillAmt
,   X.Code
,   X.DispenseDt
FROM (
    SELECT 
        B.BillId
    ,   B.SerialNo
    ,   B.BillAmt
    ,   B.Code
    ,   B.DispenseDt
        -- Net amount of the row's SerialNo / Code / DispenseDt group
    ,   [TotAmt] = SUM(B.BillAmt) OVER(PARTITION BY B.SerialNo, B.Code, B.DispenseDt)
    FROM #Billing AS B
) AS X
WHERE X.BillAmt = X.TotAmt
ORDER BY X.BillId;
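If the rows should actually be deleted rather than filtered out of a SELECT, the windowed SUM comparison above can be turned into a DELETE. This is a sketch under assumptions that the question does not confirm: it targets the permanent Billing table from the question rather than the #Billing temp table, and it treats BillId as unique. It is wrapped in a transaction that rolls back, so nothing is removed until you change the last statement.

```sql
BEGIN TRANSACTION;

WITH totals AS (
    SELECT
        B.BillId,
        B.BillAmt,
        -- Net amount of the row's SerialNo / Code / DispenseDt group
        SUM(B.BillAmt) OVER (PARTITION BY B.SerialNo, B.Code, B.DispenseDt) AS TotAmt
    FROM Billing AS B
)
DELETE FROM Billing
WHERE BillId IN (
    -- Rows whose amount differs from the group's net amount
    SELECT t.BillId
    FROM totals AS t
    WHERE t.BillAmt <> t.TotAmt
);

-- Inspect what is left before deciding.
SELECT BillId, SerialNo, BillAmt, Code, DispenseDt
FROM Billing
ORDER BY BillId;

ROLLBACK TRANSACTION;  -- replace with COMMIT TRANSACTION once the result looks right
```

On the sample data this removes eight rows and leaves BL_001, BL_006, BL_007 and BL_012. Rows with a NULL BillAmt are left in place, because the <> comparison never evaluates to true for them.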