Duplicate records in SQL Server

Posted: 2017-09-28 13:45:33

Tags: sql-server tsql

SQL Fiddle

I have the following table:

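-- SQL Server syntax: [brackets] instead of MySQL backticks, and float for the
-- columns whose sample values are fractional (int would truncate them).
CREATE TABLE __EpiTest
(
    [ActivityRecordID] int, 
    [ActCstID] varchar(6), 
    [ResCstID] varchar(6), 
    [VolAmt] float, 
    [ActCnt] int, 
    [TotOCst] float, 
    [TotCst] float
);

INSERT INTO __EpiTest ([ActivityRecordID], [ActCstID], [ResCstID], [VolAmt], [ActCnt], [TotOCst], [TotCst])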
VALUES (15652, 'DIM008', 'CPF005', 30.455249786377, 1, 0, 0.375024198767061),
       (15652, 'DIM008', 'CSC004', 30.455249786377, 1, 7.62176510799961, 11.932578069479),
       (15652, 'DIM008', 'REC001', 30.455249786377, 1, 0.17902367836393, 0.384881520159455),
       (15652, 'OUT001', 'CPF002', 15, 0, 0, 16.9408193013078),
       (15652, 'OUT001', 'CSC001', 15, 0, 2.36971564207042,  2.36971564207042),
       (15652, 'OUT001', 'CSC004', 15, 0, 12.3230666021278, 12.3760690367354),
       (15652, 'OUT001', 'REC001', 15, 0, 0.377459387378349, 3.0275278374102),
       (15652, 'SUP001', 'CPF002', 1, 1, 0, 0.00108648359810756),
       (15652, 'SUP001', 'CPF011', 1, 1, 0, -1.89799880202357E-14),
       (15652, 'SUP001', 'CPF020', 1, 1, 0, 1.31058251625567E-05),
       (15652, 'SUP001', 'CPF021', 1, 1, 0, 25.0942308512551),
       (15652, 'SUP001', 'CPF021', 1, 1, 0, 25.0942308512551),
       (15652, 'SUP001', 'CSC001', 1, 1, 1.9628769103451, 1.9628769103451),
       (15652, 'SUP001', 'CSC001', 1, 1, 1.9628769103451, 1.9628769103451),
       (15652, 'SUP001', 'CSC002', 1, 1, 0, 10.2266625467779),
       (15652, 'SUP001', 'CSC004', 1, 1, 16.3451721608005, 16.3513319060046),
       (15652, 'SUP001', 'CSC004', 1, 1, 16.3451721608005, 16.3513319060046),
       (15652, 'SUP001', 'REC001', 1, 1, 0.254410386701976, -6.27048795659376),
       (15652, 'SUP001', 'REC001', 1, 1, 0.254410386701976, -6.27048795659376),
       (15652, 'SUP001', 'REC002', 1, 1, 0, 1.10781732547441);

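Note that some rows have matching values for [ActivityRecordID], [ActCstID] and [ResCstID]. I want to merge these rows and sum the values in [TotOCst] and [TotCst]. To do this, I tried using MERGE: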

MERGE [__EpiTest] AS Tgt 
USING (
    SELECT [ActivityRecordID], 
           [ActCstID], 
           [ResCstID], 
           SUM([TotOCst]) AS TotOCst, 
           SUM([TotCst]) AS TotCst 
    FROM [__EpiTest]  
    GROUP BY [ActivityRecordID], 
             [ActCstID], 
             [ResCstID]) AS Src 
ON (Tgt.[ActivityRecordID] = Src.[ActivityRecordID] AND 
     Tgt.[ActCstID] = Src.[ActCstID] AND 
     Tgt.[ResCstID] = Src.[ResCstID])  
WHEN MATCHED THEN 
    UPDATE 
    SET [TotOCst] = Src.[TotOCst], 
        [TotCst] = Src.[TotCst] 
WHEN NOT MATCHED BY SOURCE THEN 
    DELETE;
GO

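This matches and correctly updates the values of [TotOCst] and [TotCst] for each duplicate, but it leaves the duplicate rows in the table, whereas I want all but one row of each set removed. How can I do that?

Note that the target table is huge, so I would like to do this in a single operation, either with a variant of the MERGE query above or some other alternative. Multiple queries are too expensive for me...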

Illustration

What I get:

...
15652   SUP001  CPF021  1   1   0                   12.5471154256275
15652   SUP001  CPF021  1   1   0                   12.5471154256275
15652   SUP001  CSC001  1   1   0.98143845517255    0.98143845517255
15652   SUP001  CSC001  1   1   0.98143845517255    0.98143845517255
15652   SUP001  CSC002  1   1   0                   10.2266625467779
15652   SUP001  CSC004  1   1   8.17258608040024    8.17566595300228
15652   SUP001  CSC004  1   1   8.17258608040024    8.17566595300228
15652   SUP001  REC001  1   1   0.127205193350988   -3.13524397829688
15652   SUP001  REC001  1   1   0.127205193350988   -3.13524397829688
...

But what I want is:

...
15652   SUP001  CPF021  1   1   0                   12.5471154256275
15652   SUP001  CSC001  1   1   0.98143845517255    0.98143845517255 
15652   SUP001  CSC002  1   1   0                   10.2266625467779
15652   SUP001  CSC004  1   1   8.17258608040024    8.17566595300228
15652   SUP001  REC001  1   1   0.127205193350988   -3.13524397829688
...
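The MERGE above leaves both copies for the following reason. Src is grouped, so each duplicated key appears only once in the source, but every copy in the target still joins to that single source row. WHEN MATCHED therefore fires once per copy, and WHEN NOT MATCHED BY SOURCE never fires for any of them. The following read-only sketch makes this visible. It assumes __EpiTest holds the sample rows from the question, and it counts how many target rows each grouped source key matches:

```sql
-- Read-only check: how many target rows does each grouped source key join to?
WITH Src AS (
    SELECT [ActivityRecordID], [ActCstID], [ResCstID]
    FROM [__EpiTest]
    GROUP BY [ActivityRecordID], [ActCstID], [ResCstID]
)
SELECT Src.[ActivityRecordID],
       Src.[ActCstID],
       Src.[ResCstID],
       COUNT(*) AS TargetRowsMatched
FROM Src
JOIN [__EpiTest] AS Tgt
  ON  Tgt.[ActivityRecordID] = Src.[ActivityRecordID]
  AND Tgt.[ActCstID]         = Src.[ActCstID]
  AND Tgt.[ResCstID]         = Src.[ResCstID]
GROUP BY Src.[ActivityRecordID], Src.[ActCstID], Src.[ResCstID]
-- Every key listed here has all of its copies updated and none deleted by the MERGE.
HAVING COUNT(*) > 1;
```

On the sample data this should list the four SUP001 keys (CPF021, CSC001, CSC004 and REC001), each matching two target rows.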

2 answers:

Answer 0 (score: 4)

To do this with a MERGE statement, you need to be more specific about the rows. I've adjusted the query below:

MERGE [__EpiTest] AS Tgt 
USING (
    SELECT [ActivityRecordID], 
           [ActCstID], 
           [ResCstID], 
           SUM([TotOCst]) AS TotOCst, 
           SUM([TotCst]) AS TotCst ,
           VolAmt,
           ActCnt
    FROM [__EpiTest]  
    GROUP BY [ActivityRecordID], 
             [ActCstID], 
             [ResCstID], 
             VolAmt,
             ActCnt) AS Src 
ON (Tgt.[ActivityRecordID] = Src.[ActivityRecordID] AND 
     Tgt.[ActCstID] = Src.[ActCstID] AND 
     Tgt.[ResCstID] = Src.[ResCstID] AND
     Tgt.TotOCst = Src.TotOCst AND
     Tgt.TotCst = Src.TotCst
     ) 
WHEN NOT MATCHED BY TARGET THEN
    INSERT ( ActivityRecordID, ActCstID, ResCstID, TotOCst, TotCst, VolAmt, ActCnt )
    VALUES ( Src.ActivityRecordID, Src.ActCstID, Src.ResCstID, Src.TotOCst, Src.TotCst, Src.VolAmt, Src.ActCnt )
WHEN NOT MATCHED BY SOURCE THEN 
    DELETE;
GO

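Basically, I've changed it so that the merge source also includes the other columns, which appear to be constant within each group, so that they can be inserted later. I've changed the match condition to mean, in effect, "the row is exactly the same and will stay unchanged", and I've removed the WHEN MATCHED clause.

Then I added an INSERT clause. For every group whose values have changed, WHEN NOT MATCHED BY SOURCE deletes the old rows, and WHEN NOT MATCHED BY TARGET inserts one new row with the summed values.

On the sample data, this returns the following result set: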

ActivityRecordID  ActCstID  ResCstID  VolAmt           ActCnt  TotOCst            TotCst
15652             DIM008    CPF005    30.455249786377  1       0                  0.375024198767061
15652             DIM008    CSC004    30.455249786377  1       7.62176510799961   11.932578069479
15652             DIM008    REC001    30.455249786377  1       0.17902367836393   0.384881520159455
15652             OUT001    CPF002    15               0       0                  16.9408193013078
15652             OUT001    CSC001    15               0       2.36971564207042   2.36971564207042
15652             OUT001    CSC004    15               0       12.3230666021278   12.3760690367354
15652             OUT001    REC001    15               0       0.377459387378349  3.0275278374102
15652             SUP001    CPF002    1                1       0                  0.00108648359810756
15652             SUP001    CPF011    1                1       0                  -1.89799880202357E-14
15652             SUP001    CPF020    1                1       0                  1.31058251625567E-05
15652             SUP001    CSC002    1                1       0                  10.2266625467779
15652             SUP001    REC002    1                1       0                  1.10781732547441
15652             SUP001    CPF021    1                1       0                  50.1884617025102
15652             SUP001    CSC001    1                1       3.9257538206902    3.9257538206902
15652             SUP001    CSC004    1                1       32.690344321601    32.7026638120092
15652             SUP001    REC001    1                1       0.508820773403952  -12.5409759131875
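A quick sanity check can confirm that the MERGE above left exactly one row per key. The sketch below assumes the MERGE has just been run against __EpiTest loaded with the sample rows. It looks for any key that still occurs more than once (an empty result means the duplicates are gone) and compares the row count with the 16 rows shown in the result set:

```sql
-- Any key still present more than once after the MERGE? Expect no rows.
SELECT [ActivityRecordID], [ActCstID], [ResCstID], COUNT(*) AS Copies
FROM [__EpiTest]
GROUP BY [ActivityRecordID], [ActCstID], [ResCstID]
HAVING COUNT(*) > 1;

-- The 20 sample rows should have collapsed to 16 (four duplicated pairs merged).
SELECT COUNT(*) AS RowsLeft
FROM [__EpiTest];
```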

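Answer 1 (score: 1)

There are duplicate values in __EpiTest (the queries in this answer run against a table variable, @__EpiTest, holding the sample data):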

SELECT * FROM @__EpiTest 
where ResCstID = 'REC001' and ActCstID ='SUP001'
order by ActivityRecordID ,  ActCstID , ResCstID;

ActivityRecordID ActCstID ResCstID VolAmt      ActCnt      TotOCst                                 TotCst
---------------- -------- -------- ----------- ----------- --------------------------------------- ---------------------------------------
15652            SUP001   REC001   1           1           0.2544103867                            -6.2704879566
15652            SUP001   REC001   1           1           0.2544103867                            -6.2704879566

Simply selecting distinct values:

SELECT DISTINCT * FROM @__EpiTest 
where ResCstID = 'REC001' and ActCstID ='SUP001'
order by ActivityRecordID ,  ActCstID , ResCstID;

ActivityRecordID ActCstID ResCstID VolAmt      ActCnt      TotOCst                                 TotCst
---------------- -------- -------- ----------- ----------- --------------------------------------- ---------------------------------------
15652            SUP001   REC001   1           1           0.2544103867                            -6.2704879566
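SELECT DISTINCT only hides the duplicate in the result set. It does not change the table, and it does not add the two copies together. If the goal is the summed row the question asks for, GROUP BY with SUM returns it without modifying anything. The sketch below runs against the question's __EpiTest table rather than the answer's table variable. It assumes that VolAmt and ActCnt agree across the copies of a key, as they do in the sample:

```sql
-- One row per key, with the cost columns added up across the copies.
SELECT [ActivityRecordID], [ActCstID], [ResCstID],
       [VolAmt], [ActCnt],
       SUM([TotOCst]) AS TotOCst,
       SUM([TotCst])  AS TotCst
FROM [__EpiTest]
WHERE [ResCstID] = 'REC001' AND [ActCstID] = 'SUP001'
GROUP BY [ActivityRecordID], [ActCstID], [ResCstID], [VolAmt], [ActCnt];
```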

Delete

Delete the duplicate values and keep the first one:

;WITH cte as (
SELECT Row_number() OVER (PARTITION BY ActivityRecordID ,  ActCstID , ResCstID ORDER BY (SELECT NULL)) Rn, * FROM @__EpiTest 
where ResCstID = 'REC001' and ActCstID ='SUP001'
)
Delete from cte where Rn > 1 

SELECT * FROM @__EpiTest 
where ResCstID = 'REC001' and ActCstID ='SUP001' 

ActivityRecordID ActCstID ResCstID VolAmt      ActCnt      TotOCst                                 TotCst
---------------- -------- -------- ----------- ----------- --------------------------------------- ---------------------------------------
15652            SUP001   REC001   1           1           0.2544103867                            -6.2704879566
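The DELETE above keeps one copy per key but discards the other copy's TotOCst and TotCst instead of adding them in. A sketch that also sums is shown below. It uses two statements inside one transaction so that they commit or roll back together, although the question hoped to avoid multiple statements on a large table. It assumes the question's __EpiTest table and that VolAmt and ActCnt already agree across copies.

The first statement gives every copy the key's totals, so the nondeterministic ROW_NUMBER in the second statement can keep any copy:

```sql
SET XACT_ABORT ON;   -- roll back the whole transaction if either statement fails
BEGIN TRANSACTION;

-- Step 1: write the per-key totals onto every copy of each duplicated key.
WITH Totals AS (
    SELECT [TotOCst], [TotCst],
           COUNT(*)       OVER (PARTITION BY [ActivityRecordID], [ActCstID], [ResCstID]) AS Copies,
           SUM([TotOCst]) OVER (PARTITION BY [ActivityRecordID], [ActCstID], [ResCstID]) AS SumOCst,
           SUM([TotCst])  OVER (PARTITION BY [ActivityRecordID], [ActCstID], [ResCstID]) AS SumCst
    FROM [__EpiTest]
)
UPDATE Totals
SET [TotOCst] = SumOCst,
    [TotCst]  = SumCst
WHERE Copies > 1;

-- Step 2: the copies are now identical, so it does not matter which one survives.
WITH Ranked AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY [ActivityRecordID], [ActCstID], [ResCstID]
                              ORDER BY (SELECT NULL)) AS Rn
    FROM [__EpiTest]
)
DELETE FROM Ranked
WHERE Rn > 1;

COMMIT TRANSACTION;
```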

Using MERGE (note that MERGE can come with serious performance issues):

;WITH myResult as (
    SELECT Row_number() OVER (PARTITION BY ActivityRecordID ,  ActCstID , ResCstID ORDER BY (SELECT NULL)) Rn, * FROM @__EpiTest 
)
MERGE myResult AS Tgt 
USING myResult AS Src 
ON (Tgt.[ActivityRecordID] = Src.[ActivityRecordID] AND 
    Tgt.[ActCstID] = Src.[ActCstID] AND 
    Tgt.[ResCstID] = Src.[ResCstID] AND 
    Tgt.Rn = Src.Rn AND
    Src.Rn = 1)  
WHEN MATCHED THEN 
    UPDATE 
    SET [TotOCst] = Src.[TotOCst], 
        [TotCst] = Src.[TotCst] 
WHEN NOT MATCHED BY SOURCE THEN 
    DELETE;

SELECT * FROM @__EpiTest 
where ResCstID = 'REC001' and ActCstID ='SUP001' 

Rn                   ActivityRecordID ActCstID ResCstID VolAmt      ActCnt      TotOCst                                 TotCst
-------------------- ---------------- -------- -------- ----------- ----------- --------------------------------------- ---------------------------------------
1                    15652            SUP001   REC001   1           1           0.2544103867                            -6.2704879566
2                    15652            SUP001   REC001   1           1           0.2544103867                            -6.2704879566

(2 row(s) affected)

(20 row(s) affected)

ActivityRecordID ActCstID ResCstID VolAmt      ActCnt      TotOCst                                 TotCst
---------------- -------- -------- ----------- ----------- --------------------------------------- ---------------------------------------
15652            SUP001   REC001   1           1           0.2544103867                            -6.2704879566
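The MERGE above keeps the first copy of each key, but each survivor is matched only to itself, so its TotOCst and TotCst are not summed (the final output still shows 0.2544103867). The sketch below is a single-statement variant that does sum, assuming the question's __EpiTest table. It computes the per-key totals with windowed SUM in the target CTE and uses a one-row dummy source.

Rows numbered 1 match that source and receive the totals. Every other row is unmatched by the source and is deleted. ROW_NUMBER is evaluated once within the statement, so the nondeterministic ordering is harmless here. The dummy source (VALUES (1)) and the column name Keep are illustrative choices.

```sql
WITH Ranked AS (
    SELECT [TotOCst], [TotCst],
           ROW_NUMBER() OVER (PARTITION BY [ActivityRecordID], [ActCstID], [ResCstID]
                              ORDER BY (SELECT NULL)) AS Rn,
           SUM([TotOCst]) OVER (PARTITION BY [ActivityRecordID], [ActCstID], [ResCstID]) AS SumOCst,
           SUM([TotCst])  OVER (PARTITION BY [ActivityRecordID], [ActCstID], [ResCstID]) AS SumCst
    FROM [__EpiTest]
)
MERGE Ranked AS Tgt
USING (VALUES (1)) AS Src (Keep)   -- one-row source; only its value 1 matters
   ON Tgt.Rn = Src.Keep            -- the first copy of each key matches
WHEN MATCHED THEN
    -- the survivor takes the totals of all copies of its key
    UPDATE SET [TotOCst] = Tgt.SumOCst,
               [TotCst]  = Tgt.SumCst
WHEN NOT MATCHED BY SOURCE THEN
    -- every other copy (Rn > 1) is removed
    DELETE;
```

Whether this beats the two-statement approach on a very large table depends on the plan SQL Server chooses, so it is worth comparing both on a copy of the real data.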