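Remove duplicate rows from a table in DB2 in a single query

Time: 2012-04-10 11:05:51

Tags: sql db2 sql-delete

I have a table with three key columns (plus a name column) that looks like this: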

one   |   two    |  three  |   name
------------------------------------
 A1       B1          C1        xyz
 A1       B1          C1        pqr      -> should be deleted
 A1       B1          C1        lmn      -> should be deleted
 A2       B2          C2        abc
 A2       B2          C2        def      -> should be deleted
 A3       B3          C3        ghi
------------------------------------ 

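The table does not have a primary key column. I have no control over the table, so I cannot add one.

As shown, I want to delete the rows in which the combination of columns one, two, and three is the same. So if A1 B1 C1 occurs three times (as above), two of those rows should be deleted and only one left.

How can I achieve this with a single query in DB2?

I need a single query because I will be running it from a Java program.

7 answers:

Answer 0: (score: 20)

(This assumes you are using DB2 for Linux/Unix/Windows; other platforms may differ slightly.)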

DELETE FROM
    (SELECT ROWNUMBER() OVER (PARTITION BY ONE, TWO, THREE) AS RN
     FROM SESSION.TEST) AS A
WHERE RN > 1;

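That should get you what you are looking for.

The query uses the OLAP function ROWNUMBER() to assign a number to each row within each ONE, TWO, THREE combination. DB2 can then match the rows referenced by the fullselect (A) to the rows that the DELETE statement should remove from the table. To be usable as the target of a DELETE, the fullselect has to satisfy the rules for a deletable view (see "deletable view" under the Notes section of the documentation).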
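To make the numbering step concrete, here is a minimal Python sketch that mimics what ROWNUMBER() OVER (PARTITION BY ONE, TWO, THREE) assigns to the sample rows from the question. It is only an illustration of the logic, not DB2 itself: the helper name `number_rows` is made up, and the sketch assumes rows are numbered in list order, whereas DB2 does not guarantee which row in a group receives RN = 1 when the OVER clause has no ORDER BY.

```python
from collections import defaultdict

# Sample rows from the question: (one, two, three, name).
rows = [
    ("A1", "B1", "C1", "xyz"),
    ("A1", "B1", "C1", "pqr"),
    ("A1", "B1", "C1", "lmn"),
    ("A2", "B2", "C2", "abc"),
    ("A2", "B2", "C2", "def"),
    ("A3", "B3", "C3", "ghi"),
]


def number_rows(rows):
    """Pair each row with its 1-based position inside its (one, two, three) group.

    Hypothetical helper: numbering follows list order, which is an assumption.
    """
    seen = defaultdict(int)
    numbered = []
    for row in rows:
        key = row[:3]          # the partition key: one, two, three
        seen[key] += 1         # next row number within this partition
        numbered.append((seen[key], row))
    return numbered


numbered = number_rows(rows)
kept = [row for rn, row in numbered if rn == 1]      # survives the DELETE
deleted = [row for rn, row in numbered if rn > 1]    # matched by WHERE RN > 1

for rn, row in numbered:
    print(rn, row)
```

Every row with a number greater than 1 is a surplus copy of its group, which is why the DELETE filters on RN > 1.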

Here is some proof (tested on LUW 9.7):

DECLARE GLOBAL TEMPORARY TABLE SESSION.TEST (
    one CHAR(2),
    two CHAR(2),
    three CHAR(2),
    name CHAR(3)
) ON COMMIT PRESERVE ROWS;

INSERT INTO SESSION.TEST VALUES 
    ('A1', 'B1', 'C1', 'xyz'),
    ('A1', 'B1', 'C1', 'pqr'),
    ('A1', 'B1', 'C1', 'lmn'),
    ('A2', 'B2', 'C2', 'abc'),
    ('A2', 'B2', 'C2', 'def'),
    ('A3', 'B3', 'C3', 'ghi');

DELETE FROM
    (SELECT ROWNUMBER() OVER (PARTITION BY ONE, TWO, THREE) AS RN
     FROM SESSION.TEST) AS A
WHERE RN > 1;

SELECT * FROM SESSION.TEST;

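Edit, March 2, 2017:

In response to a question from Ahmed Anwar: if you need to capture what was deleted, you can also combine the delete with a "data change statement". In this example, you could do something like the following, which returns the RN, ONE, TWO, and THREE columns of the deleted rows: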

SELECT * FROM OLD TABLE (
    DELETE FROM
        (SELECT 
             ROWNUMBER() OVER (PARTITION BY ONE, TWO, THREE) AS RN
            ,ONE
            ,TWO
            ,THREE
         FROM SESSION.TEST) AS A
    WHERE RN > 1
) OLD;

Answer 1: (score: 1)

DELETE FROM the_table tt
WHERE EXISTS ( SELECT *
    FROM the_table ex
    WHERE ex.one = tt.one
    AND ex.two = tt.two
    AND ex.three = tt.three
    AND ex.zname < tt.zname -- tie-breaker...
    );

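Note: your SQL dialect may vary. Note 2: "name" is a reserved word on some platforms. It is best avoided, which is why the column is called zname here.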
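The tie-breaker approach can be tried out without a DB2 instance. The sketch below runs the same EXISTS pattern against an in-memory SQLite database from Python; SQLite is only a stand-in here, and the outer table is referenced by its full name rather than an alias to stay within syntax SQLite is known to accept. Two things to be aware of: the row that survives in each group is the one with the smallest zname (so 'lmn' rather than 'xyz' for the A1 group), and rows that are identical in all four columns would all survive, because the strict comparison never matches between them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE the_table (one TEXT, two TEXT, three TEXT, zname TEXT)"
)
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?, ?)",
    [
        ("A1", "B1", "C1", "xyz"),
        ("A1", "B1", "C1", "pqr"),
        ("A1", "B1", "C1", "lmn"),
        ("A2", "B2", "C2", "abc"),
        ("A2", "B2", "C2", "def"),
        ("A3", "B3", "C3", "ghi"),
    ],
)

# Delete every row for which a row with the same key and a smaller zname exists.
conn.execute(
    """
    DELETE FROM the_table
    WHERE EXISTS (
        SELECT 1
        FROM the_table AS ex
        WHERE ex.one = the_table.one
          AND ex.two = the_table.two
          AND ex.three = the_table.three
          AND ex.zname < the_table.zname  -- tie-breaker
    )
    """
)

remaining = conn.execute(
    "SELECT one, two, three, zname FROM the_table ORDER BY one"
).fetchall()
print(remaining)
```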

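Answer 2: (score: 1)

A variation on @a_horse_with_no_name's answer for DB2 on iSeries that uses neither a GROUP BY clause nor an IN clause. It does work.

DELETE FROM the_table a
WHERE rrn(a) < (
    -- highest relative record number within the same (one, two, three) group
    SELECT MAX(rrn(b)) FROM the_table b
    WHERE a.one = b.one AND a.two = b.two AND a.three = b.three
)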

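Answer 3: (score: 0)

Please back up the table before deleting any data.

-- Keeps the row with the smallest name in each (one, two, three) group.
-- Assumes name values are unique across the whole table.
DELETE FROM the_table
WHERE name NOT IN (SELECT MIN(name)
                   FROM the_table
                   GROUP BY one, two, three)

DELETE itself does not accept GROUP BY or HAVING, but you can use them to list the groups that contain duplicates:

     SELECT one, two, three, COUNT(*) FROM the_table GROUP BY one, two, three HAVING COUNT(*) > 1;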

Answer 4: (score: 0)

DELETE  FROM Table_Name
WHERE   Table_Name_ID NOT IN ( SELECT  MAX(Table_Name_ID)
                                    FROM    Table_Name
                                    GROUP BY one ,
                                             two, 
                                             three )

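one, two, and three are your duplicated columns, and Table_Name_ID is the primary key.

Answer 5: (score: 0)

This is a variation on levenlevi's answer that does not require a primary key on the table (I cannot test the syntax right now):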

DELETE FROM the_table
WHERE  rid_bit(the_table) NOT IN (SELECT MAX(rid_bit(the_table))
                                  FROM the_table
                                  GROUP BY one,two,three)

I think rid_bit() is not supported on iSeries, but rrn() would serve the same purpose.
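The "keep the highest row identifier per group" pattern can be sanity-checked in the same way. The following sketch uses SQLite's implicit rowid as a stand-in for rid_bit() or rrn(); that substitution is an assumption made purely so the example runs from Python's standard library, and it says nothing about how DB2 assigns its row identifiers. With this data the last-inserted row of each group is the one that remains.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE the_table (one TEXT, two TEXT, three TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?, ?)",
    [
        ("A1", "B1", "C1", "xyz"),
        ("A1", "B1", "C1", "pqr"),
        ("A1", "B1", "C1", "lmn"),
        ("A2", "B2", "C2", "abc"),
        ("A2", "B2", "C2", "def"),
        ("A3", "B3", "C3", "ghi"),
    ],
)

# rowid plays the role of rid_bit()/rrn(): keep only the highest one per group.
conn.execute(
    """
    DELETE FROM the_table
    WHERE rowid NOT IN (
        SELECT MAX(rowid)
        FROM the_table
        GROUP BY one, two, three
    )
    """
)

remaining = conn.execute(
    "SELECT one, two, three, name FROM the_table ORDER BY one"
).fetchall()
print(remaining)
```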

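Answer 6: (score: 0)

For others using a very old version of DB2 SQL: a combination of these posts helped to identify and delete the duplicates from two batches that had been posted twice.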

SELECT   * FROM     LIBRARY.TABLE a
WHERE    a.batch in (115131, 115287)
AND      EXISTS ( SELECT 1 from LIBRARY.TABLE d 
    WHERE d.batch in (115131, 115287)
     AND a.one = d.one AND a.two = d.two AND a.three = d.three 
    GROUP BY d.one, d.two, d.three 
    HAVING count(*) <> 1 )

    AND RRN(a) > (SELECT MIN(RRN(b)) FROM LIBRARY.TABLE b 
        WHERE b.batch in (115131, 115287)
        AND a.one = b.one AND a.two = b.two AND a.three = b.three );