Delete duplicates based on all columns except one

Time: 2017-09-25 07:04:24

Tags: sql teradata

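I want to delete exact duplicate records from a table while keeping one row from each set. However, I can't use the intermediate-table approach, because every column except the ID column contains duplicates. For example: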

ID,
COL1,
Col2,
col3,
col4
The duplicates are on col1, col2, col3, and col4.

Some sample rows:

ID  COL1 COL2  COL3 COL4
123 ABC  4RTFD  FGY  12346
234 ABC  4RTFD  FGY  12346
586 ABC  4RTFD  FGY  12346

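Here only the ID column differs; the other four columns are duplicated. I want to keep only the row with the largest ID.

What approach can I use here?

Thanks, Amit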

4 answers:

Answer 0: (score: 2)

Try joining the table to itself on all columns, with the IDs being different...

 -- SQL Server syntax (IDENTITY, dbo schema)
 CREATE TABLE dbo.Dups
 (
     ID int IDENTITY(1,1) PRIMARY KEY,
     Col1 int NOT NULL,
     Col2 date NOT NULL,
     Col3 char(1) NOT NULL,
     Col4 char(1) NOT NULL
 );
 INSERT dbo.Dups (Col1,Col2,Col3,Col4)
 VALUES (1,'20170925','A','Z'), (1,'20170925','A','Z'), (1,'20170925','A','Z'), (2,'20170925','A','Z'), (2,'20170925','A','Z'), (2,'20170925','A','Z'), (3,'20170925','A','Z');

 SELECT * FROM Dups;

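 -- This solution retains the first (lowest) ID in each group of duplicates...
 DELETE FROM Dups
 WHERE ID IN (
                SELECT ID
                FROM (
                        SELECT d1.ID,
                                -- dense_rank() with PARTITION BY numbers the distinct IDs inside each
                                -- duplicate group. An unpartitioned row_number() would number the joined
                                -- rows globally and also delete the row that should be kept.
                                dense_rank() OVER (PARTITION BY d1.Col1, d1.Col2, d1.Col3, d1.Col4 ORDER BY d1.ID) AS DupSeq
                        FROM dbo.Dups AS d1
                        INNER JOIN dbo.Dups AS d2 ON d2.Col1 = d1.Col1 AND d2.Col2 = d1.Col2 AND d2.Col3 = d1.Col3 AND d2.Col4 = d1.Col4
                        WHERE d1.ID <> d2.ID
                    ) AS t
                WHERE DupSeq > 1
            );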

 -- This solution retains the last (highest) ID in each group of duplicates...
 DELETE FROM Dups
 WHERE ID NOT IN (
                SELECT DISTINCT
                       max(t.ID) OVER(PARTITION BY t.Col1,t.Col2,t.Col3,t.Col4 ORDER BY WindowOrder) AS KeepID
                FROM (
                        SELECT d1.ID,
                                d1.Col1,
                                d1.Col2,
                                d1.Col3,
                                d1.Col4,
                                1 AS WindowOrder
                        FROM dbo.Dups AS d1
                        LEFT OUTER JOIN dbo.Dups AS d2 ON  d2.Col1 = d1.Col1 
                                                       AND d2.Col2 = d1.Col2 
                                                       AND d2.Col3 = d1.Col3 
                                                       AND d2.Col4 = d1.Col4
                                                       AND d1.ID <> d2.ID
                    ) AS t
            );


 SELECT * FROM Dups;

DROP TABLE dbo.Dups

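The first solution needs a ranking function because the join is symmetric: ID 1 matches ID 3, so ID 3 also matches ID 1, and every row of a duplicate group shows up in the join result.

In the second solution, the join is a LEFT OUTER join so that rows without any duplicate are retained.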
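The symmetry described above can be seen in a small sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server or Teradata (an assumption made only so the example runs anywhere), a hypothetical table named `dups`, and a hypothetical non-duplicated row with ID 700. The delete shown is a swapped-in variant, not the answer's exact query: it replaces `<>` with `<` so that only rows with a higher-ID twin are selected, which keeps the largest ID per group as the question asks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dups (id INTEGER PRIMARY KEY, col1, col2, col3, col4)")
conn.executemany(
    "INSERT INTO dups VALUES (?, ?, ?, ?, ?)",
    [
        (123, "ABC", "4RTFD", "FGY", 12346),
        (234, "ABC", "4RTFD", "FGY", 12346),
        (586, "ABC", "4RTFD", "FGY", 12346),
        (700, "XYZ", "4RTFD", "FGY", 12346),  # hypothetical row with no duplicate
    ],
)

# Self-join on the four data columns, shared by both statements below.
JOIN = """
    FROM dups AS d1
    JOIN dups AS d2
      ON d2.col1 = d1.col1 AND d2.col2 = d1.col2
     AND d2.col3 = d1.col3 AND d2.col4 = d1.col4
"""

# With d1.id <> d2.id every duplicate pair is matched in both directions.
pairs = conn.execute(
    "SELECT d1.id, d2.id" + JOIN + "WHERE d1.id <> d2.id ORDER BY 1, 2"
).fetchall()

# Variant: d1.id < d2.id selects only rows that have a higher-ID twin,
# so the largest ID of each group is never selected for deletion.
conn.execute(
    "DELETE FROM dups WHERE id IN (SELECT d1.id" + JOIN + "WHERE d1.id < d2.id)"
)
remaining = [row[0] for row in conn.execute("SELECT id FROM dups ORDER BY id")]

print(pairs)
print(remaining)
```

Because the row with ID 700 never appears in the inner join result, it is left alone without needing an outer join.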

Answer 1: (score: 0)

You could do what many others have done before in SQL Server (and Teradata), see "How to delete duplicate rows in sql server?", or do it without a CTE, like this:

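-- SQL Server form: the derived table's alias (t1) is named as the delete target
DELETE t1 FROM (
  SELECT ROW_NUMBER()
  OVER (PARTITION BY col1,col2,col3,col4
        ORDER BY ID DESC) rn   -- rn = 1 is the row with the largest ID
  FROM tbl  -- tbl is "your" table ...
) t1 WHERE rn>1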

It works in SQL Server. I haven't tested it on Teradata, but since ROW_NUMBER() exists there, I would expect it to work too...
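As a rough way to try the ROW_NUMBER() idea without either database at hand, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a SQLite build with window-function support (3.25 or later). SQLite cannot delete through a derived table, so the ranked rows feed an `IN` list instead, which is a deviation from the statement above. The table name `tbl` comes from the answer; the XYZ rows are borrowed from the sample data elsewhere on this page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, col1, col2, col3, col4)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?, ?)",
    [
        (123, "ABC", "4RTFD", "FGY", 12346),
        (234, "ABC", "4RTFD", "FGY", 12346),
        (586, "ABC", "4RTFD", "FGY", 12346),
        (345, "XYZ", "4FTFD", "FGY", 12346),
        (745, "XYZ", "4FTFD", "FGY", 12346),
        (945, "XYZ", "4FTFD", "FGY", 12346),
    ],
)

# Rank rows inside each duplicate group, largest ID first (rn = 1),
# then delete every row whose rank is greater than 1.
conn.execute(
    """
    DELETE FROM tbl
    WHERE id IN (
        SELECT id
        FROM (
            SELECT id,
                   ROW_NUMBER() OVER (
                       PARTITION BY col1, col2, col3, col4
                       ORDER BY id DESC
                   ) AS rn
            FROM tbl
        ) AS ranked
        WHERE rn > 1
    )
    """
)
kept = [row[0] for row in conn.execute("SELECT id FROM tbl ORDER BY id")]
print(kept)
```

Ordering by ID descending is what makes the largest ID the survivor; ordering ascending would keep the smallest one instead.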

Answer 2: (score: 0)

You can use a correlated subquery with the max function to get the desired result, as shown below.

DELETE
FROM table1 t1
WHERE t1.Id <> (
        SELECT max(t2.Id)
        FROM table1 t2
        WHERE t1.col1 = t2.col1
            AND t1.col2 = t2.col2
            AND t1.col3 = t2.col3
            AND t1.col4 = t2.col4
        );

The query above assumes table1 is the name of your table.

select * from table1;

Result:

ID  Col1    Col2    Col3    Col4
---------------------------------
586 ABC    4RTFD    FGY    12346

You can check the demo *here

Update

The rows below were added to the sample data set.

id  col1    col2    col3    col4
----------------------------------
345 XYZ    4FTFD    FGY     12346
745 XYZ    4FTFD    FGY     12346
945 XYZ    4FTFD    FGY     12346

Result:

id   col1    col2   col3    col4
-----------------------------------
586  ABC    4RTFD   FGY     12346
945  XYZ    4FTFD   FGY     12346

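DEMO

*Note: Because no online Teradata demo tool was available, a PostgreSQL demo was used instead, since PostgreSQL supports correlated subqueries. The query was also tried in a local Teradata environment.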
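The correlated-subquery delete can also be reproduced locally. The sketch below is one way to do that with Python's built-in sqlite3 module, which is an assumption made for portability and not something the answer used. The outer table is referenced by its own name instead of an alias, because this sketch does not rely on alias support in SQLite's DELETE statement. The data is the six sample rows shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, col1, col2, col3, col4)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?, ?)",
    [
        (123, "ABC", "4RTFD", "FGY", 12346),
        (234, "ABC", "4RTFD", "FGY", 12346),
        (586, "ABC", "4RTFD", "FGY", 12346),
        (345, "XYZ", "4FTFD", "FGY", 12346),
        (745, "XYZ", "4FTFD", "FGY", 12346),
        (945, "XYZ", "4FTFD", "FGY", 12346),
    ],
)

# For each row, look up the largest ID among rows with the same four
# column values and delete the row unless it holds that ID itself.
conn.execute(
    """
    DELETE FROM table1
    WHERE id <> (
        SELECT max(t2.id)
        FROM table1 AS t2
        WHERE t2.col1 = table1.col1
          AND t2.col2 = table1.col2
          AND t2.col3 = table1.col3
          AND t2.col4 = table1.col4
    )
    """
)
survivors = [row[0] for row in conn.execute("SELECT id FROM table1 ORDER BY id")]
print(survivors)
```

One thing to keep in mind with this pattern: a row with NULL in any of the compared columns never satisfies the equality tests, so the subquery returns NULL for it and the row is never deleted, even if it has duplicates.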

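Answer 3: (score: 0)

Isn't this a simple use of grouping?

-- COL4 is included because the duplicates in the question span all four columns
select max(ID) ID, COL1, COL2, COL3, COL4
from tableA
group by 2,3,4,5

and save the result to a new table. If you need to delete the duplicate rows from the existing table instead, you can run the following delete statement:

delete from tableA as a1
    where (
        -- returns 1 only when a1 is the max-ID row of its group, NULL otherwise
        select 1 from (
            select max(ID) ID, COL1, COL2, COL3, COL4 from tableA group by 2,3,4,5) a2
        where a1.ID = a2.ID
            and a1.COL1 = a2.COL1
            and a1.COL2 = a2.COL2
            and a1.COL3 = a2.COL3
            and a1.COL4 = a2.COL4
         ) is null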
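Both halves of this answer, building a deduplicated copy and deleting in place, can be sketched with Python's built-in sqlite3 module. SQLite is an assumption here, chosen only for runnability. The table name `tableA_dedup` is hypothetical, and the in-place delete uses a simpler `NOT IN` keep-list of max IDs, which is a swapped-in form and not the answer's `is null` construct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (id INTEGER PRIMARY KEY, col1, col2, col3, col4)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?, ?, ?)",
    [
        (123, "ABC", "4RTFD", "FGY", 12346),
        (234, "ABC", "4RTFD", "FGY", 12346),
        (586, "ABC", "4RTFD", "FGY", 12346),
        (345, "XYZ", "4FTFD", "FGY", 12346),
        (745, "XYZ", "4FTFD", "FGY", 12346),
        (945, "XYZ", "4FTFD", "FGY", 12346),
    ],
)

# Option 1: save one row per group (the one with the largest ID) to a new table.
conn.execute(
    """
    CREATE TABLE tableA_dedup AS
    SELECT max(id) AS id, col1, col2, col3, col4
    FROM tableA
    GROUP BY col1, col2, col3, col4
    """
)
copied = [row[0] for row in conn.execute("SELECT id FROM tableA_dedup ORDER BY id")]

# Option 2: delete in place, keeping only IDs that are the max of their group.
conn.execute(
    """
    DELETE FROM tableA
    WHERE id NOT IN (
        SELECT max(id) FROM tableA GROUP BY col1, col2, col3, col4
    )
    """
)
in_place = [row[0] for row in conn.execute("SELECT id FROM tableA ORDER BY id")]

print(copied, in_place)
```

Since ID is unique, comparing on ID alone is enough to identify the rows to keep; the keep-list never contains NULL here because ID is the primary key.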