Question

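Suppose a table contains 10 records, 5 of which are exactly identical (which means the table has no primary key or unique key). The question is: "Write a SQL query that deletes the duplicate records, keeping only one of those 5 identical records", so that the table ends up with 6 distinct records. I was asked this in an interview today and could not answer it. Can anyone help me with this?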

Answer 1

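You can do it with the following steps.

1) Store the distinct records in a temporary table.

2) Truncate the original table.

3) Insert the data from the temporary table back into the original table.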

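-- T-SQL (SQL Server): #tmp is a session-local temporary table
select * into #tmp from original_table where 1 = 2;      -- empty copy of the table structure
insert into #tmp select distinct * from original_table;  -- 1) keep one copy of each row
truncate table original_table;                           -- 2) empty the original table
insert into original_table select * from #tmp;           -- 3) copy the distinct rows back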
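
The three steps above can be tried end to end without a database server. The following is only a sketch using Python's built-in sqlite3 module, not the T-SQL from the answer: SQLite has no TRUNCATE and no #tmp tables, so CREATE TEMP TABLE ... AS SELECT DISTINCT and DELETE stand in for them. The table original_table with columns a, b, c and the helper names dedupe_via_temp_table and demo_temp_table are assumptions made for illustration.

```python
import sqlite3


def dedupe_via_temp_table(conn, table):
    """Keep one copy of each distinct row by round-tripping through a temp table.

    `table` is interpolated into the SQL, so it must be a trusted identifier.
    """
    with conn:  # the DELETE and re-INSERT commit together, or roll back on error
        # 1) Store the distinct records in a temporary table.
        conn.execute(f"CREATE TEMP TABLE tmp_distinct AS SELECT DISTINCT * FROM {table}")
        # 2) Empty the original table (SQLite has no TRUNCATE).
        conn.execute(f"DELETE FROM {table}")
        # 3) Copy the distinct rows back, then drop the temporary table.
        conn.execute(f"INSERT INTO {table} SELECT * FROM tmp_distinct")
        conn.execute("DROP TABLE tmp_distinct")


def demo_temp_table():
    """Build the 10-row example (5 distinct rows + 5 identical rows) and deduplicate it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE original_table (a INT, b INT, c INT)")
    rows = [(10 + i, 100 + i, 210 + i) for i in range(5)] + [(1, 1, 1)] * 5
    conn.executemany("INSERT INTO original_table VALUES (?, ?, ?)", rows)
    conn.commit()
    dedupe_via_temp_table(conn, "original_table")
    return conn.execute("SELECT a, b, c FROM original_table ORDER BY a").fetchall()


if __name__ == "__main__":
    print(demo_temp_table())
```

Note that this approach rewrites the whole table, so it is probably best suited to small tables or maintenance windows.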

Answer 2

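This uses a window function. After PARTITION BY, list (separated by commas) the column or columns whose values set the 5 duplicated rows apart from the other 5 rows; ROW_NUMBER() then numbers the rows within each group, and every row numbered above 1 is deleted.
Note: this uses PostgreSQL syntax.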

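-- Add a surrogate key so that each physical row can be addressed individually.
-- ADD COLUMN ... SERIAL should already number the existing rows, so the UPDATE is only a safeguard.
ALTER TABLE table_name ADD COLUMN id SERIAL;
UPDATE table_name SET id = DEFAULT;
ALTER TABLE table_name ADD PRIMARY KEY (id);

-- Number the rows within each group of duplicates and delete all but the first.
DELETE FROM table_name
WHERE id IN (
    SELECT id
    FROM (
        SELECT id, ROW_NUMBER() OVER (PARTITION BY column_name ORDER BY id) AS rnum
        FROM table_name
    ) t
    WHERE t.rnum > 1
);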

See this SQLFiddle.

Answer 3

For PostgreSQL:

Here is a sample table with data:

CREATE TABLE ident (
    a INT
    ,b INT
    ,c INT
    );

INSERT INTO ident
SELECT generate_series(10, 14)
    ,generate_series(100, 104)
    ,generate_series(210, 214);

INSERT INTO ident
SELECT unnest(array [1,1,1,1,1])
    ,unnest(array [1,1,1,1,1])
    ,unnest(array [1,1,1,1,1]);

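Since the table has no primary or unique key, we can use ctid.

CTID

The physical location of the row version within its table. Note that although the ctid can be used to locate the row version very quickly, a row's ctid will change if it is updated or moved by VACUUM FULL. Therefore ctid is useless as a long-term row identifier. The OID, or even better a user-defined serial number, should be used to identify logical rows.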

select ctid,* from  ident;

which gives you the following rows:

ctid   a  b   c   
------ -- --- --- 
(0,1)  10 100 210 
(0,2)  11 101 211 
(0,3)  12 102 212 
(0,4)  13 103 213 
(0,5)  14 104 214 
(0,6)  1  1   1   
(0,7)  1  1   1   
(0,8)  1  1   1   
(0,9)  1  1   1   
(0,10) 1  1   1

We can use a window function to find the ctid of every row that duplicates an earlier one:

SELECT ctid
        FROM (
            SELECT ctid
                ,row_number() OVER (
                    PARTITION BY a
                    ,b
                    ,c ORDER BY ctid
                    ) rn
            FROM ident
            ) t
        WHERE rn > 1

and then delete those rows from the table:

DELETE
FROM ident
WHERE ctid IN (
        SELECT ctid
        FROM (
            SELECT ctid
                ,row_number() OVER (
                    PARTITION BY a
                    ,b
                    ,c ORDER BY ctid
                    ) rn
            FROM ident
            ) t
        WHERE rn > 1
        );

sqlfiddle

OR

you can simply use

delete from ident where  ctid not in (
select min(ctid) from ident group by a,b,c
)
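
The min(ctid) idea carries over to other engines that expose an implicit row identifier. As a rough sketch only, here is the same statement run against SQLite through Python's sqlite3 module, with SQLite's rowid standing in for PostgreSQL's ctid. That substitution and the helper names delete_duplicates_by_rowid and demo_rowid are assumptions for illustration; the sample data mirrors the ident table above.

```python
import sqlite3


def delete_duplicates_by_rowid(conn):
    """Keep the lowest-rowid row of every (a, b, c) group in `ident`.

    Returns the number of rows deleted.
    """
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            """
            DELETE FROM ident
            WHERE rowid NOT IN (
                SELECT MIN(rowid) FROM ident GROUP BY a, b, c
            )
            """
        )
    return cur.rowcount


def demo_rowid():
    """Load 5 distinct rows plus 5 identical rows, deduplicate, and report counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ident (a INT, b INT, c INT)")
    rows = [(10 + i, 100 + i, 210 + i) for i in range(5)] + [(1, 1, 1)] * 5
    conn.executemany("INSERT INTO ident VALUES (?, ?, ?)", rows)
    conn.commit()
    deleted = delete_duplicates_by_rowid(conn)
    remaining = conn.execute("SELECT COUNT(*) FROM ident").fetchone()[0]
    return deleted, remaining


if __name__ == "__main__":
    print(demo_rowid())
```

Because GROUP BY places NULLs in the same group, this form should also collapse duplicate rows that contain NULL values.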

Answer 4

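Since you have no idea where to start, and this is a genuine request for help...

First, this question is:

a bit absurd: whoever created a table without a key should be fired
very tough for an interview question

If several semicolon-separated queries count as one "query", here is a MySQL solution: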

alter table mytable add column id int primary key auto_increment;
delete t1
from mytable t1
join mytable t2 on t1.id < t2.id
  and t1.a = t2.a and t1.b = t2.b and t1.c = t2.c;
alter table mytable drop column id;

See the SQLFiddle.
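
The join-based delete can be approximated in engines that lack MySQL's multi-table DELETE. The sketch below is a loose translation under stated assumptions, not the answer's MySQL: it runs on SQLite via Python's sqlite3 module, uses the implicit rowid in place of the temporary id column, and replaces the join with a correlated EXISTS. The helper names delete_all_but_last and demo_self_join are made up for illustration.

```python
import sqlite3


def delete_all_but_last(conn):
    """Delete every row of `mytable` that has a later identical row.

    Returns the number of rows deleted.
    """
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            """
            DELETE FROM mytable
            WHERE EXISTS (
                SELECT 1
                FROM mytable AS t2
                WHERE t2.rowid > mytable.rowid   -- same role as t1.id < t2.id
                  AND t2.a = mytable.a
                  AND t2.b = mytable.b
                  AND t2.c = mytable.c
            )
            """
        )
    return cur.rowcount


def demo_self_join():
    """Load 5 distinct rows plus 5 identical rows, deduplicate, and return the result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (a INT, b INT, c INT)")
    rows = [(10 + i, 100 + i, 210 + i) for i in range(5)] + [(1, 1, 1)] * 5
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    conn.commit()
    deleted = delete_all_but_last(conn)
    remaining = conn.execute("SELECT a, b, c FROM mytable ORDER BY a").fetchall()
    return deleted, remaining


if __name__ == "__main__":
    print(demo_self_join())
```

As with the MySQL version, the = comparisons never match NULL, so duplicate rows containing NULLs would survive; a NULL-safe comparison (IS in SQLite, <=> in MySQL) would be needed to cover that case.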

How do I delete all duplicate records from a table except one?
