Delete duplicate rows based on multiple columns in PostgreSQL

Date: 2018-08-19 01:38:42

Tags: sql postgresql

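I have a table "votes" with the following columns: voter, election_year, election_type, party. I need to delete all rows that are duplicates on the combination of voter and election_year, and I'm having trouble figuring out how to do this.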

I ran the following:

WITH CTE AS(
SELECT voter, 
       election_year,
       ROW_NUMBER()OVER(PARTITION BY voter, election_year ORDER BY voter) as RN

FROM votes
)
DELETE
FROM CTE where RN>1

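This is based on another StackOverflow answer, but it seems to be specific to SQL Server. I've seen ways to do this using a unique ID, but this particular table doesn't have that luxury. How can I adapt the script above to delete the duplicates I need? Thanks!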

EDIT: As requested, here is the table definition with some sample data:

CREATE TABLE public.votes
(
    voter varchar(10),
    election_year smallint,
    election_type varchar(2),
    party varchar(3)
);

INSERT INTO votes
    (voter, election_year, election_type, party)
VALUES
    ('2435871347', 2018, 'PO', 'EV'),
    ('2435871347', 2018, 'RU', 'EV'),
    ('2435871347', 2018, 'GE', 'EV'),
    ('2435871347', 2016, 'PO', 'EV'),
    ('2435871347', 2016, 'GE', 'EV'),
    ('10215121/8', 2016, 'GE', 'ED')
;

3 answers:

Answer 0 (score: 3)

Deleting from or updating a CTE doesn't work in Postgres; see the accepted answer to "PostgreSQL with-delete 'relation does not exists'".

Since there is no primary key, you can (ab)use the ctid system column to identify the rows to delete.

WITH
cte
AS
(
SELECT ctid,
       row_number() OVER (PARTITION BY voter,
                                       election_year
                          ORDER BY voter) rn
       FROM votes
)
DELETE FROM votes
       USING cte
       WHERE cte.rn > 1
             AND cte.ctid = votes.ctid;

db<>fiddle
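Before running the DELETE, it can help to check what it will touch. The sketch below is not part of the original answer: it groups on the same two columns and keeps only the groups with more than one row. Each listed group loses (copies - 1) rows, so with the question's sample data the DELETE should remove 3 rows. The temporary table is just a copy of the question's sample data so the snippet runs on its own.

```sql
-- Temporary copy of the question's table and sample data (for illustration only)
CREATE TEMP TABLE votes (
    voter         varchar(10),
    election_year smallint,
    election_type varchar(2),
    party         varchar(3)
);

INSERT INTO votes (voter, election_year, election_type, party) VALUES
    ('2435871347', 2018, 'PO', 'EV'),
    ('2435871347', 2018, 'RU', 'EV'),
    ('2435871347', 2018, 'GE', 'EV'),
    ('2435871347', 2016, 'PO', 'EV'),
    ('2435871347', 2016, 'GE', 'EV'),
    ('10215121/8', 2016, 'GE', 'ED');

-- List every (voter, election_year) combination that occurs more than once,
-- together with how many copies of it exist.
SELECT voter, election_year, count(*) AS copies
FROM votes
GROUP BY voter, election_year
HAVING count(*) > 1
ORDER BY voter, election_year;
```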

You might also consider introducing a primary key.
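A minimal sketch of what that could look like, assuming PostgreSQL 10 or later (for identity columns). The column name `id` and the constraint name are illustrative choices, not from the answer. Once a real key exists, the duplicate removal no longer needs ctid, and an optional UNIQUE constraint on (voter, election_year) keeps new duplicates out. The temporary table again just recreates the question's sample data.

```sql
-- Temporary copy of the question's table and sample data (for illustration only)
CREATE TEMP TABLE votes (
    voter         varchar(10),
    election_year smallint,
    election_type varchar(2),
    party         varchar(3)
);

INSERT INTO votes (voter, election_year, election_type, party) VALUES
    ('2435871347', 2018, 'PO', 'EV'),
    ('2435871347', 2018, 'RU', 'EV'),
    ('2435871347', 2018, 'GE', 'EV'),
    ('2435871347', 2016, 'PO', 'EV'),
    ('2435871347', 2016, 'GE', 'EV'),
    ('10215121/8', 2016, 'GE', 'ED');

-- Hypothetical surrogate key: existing rows are filled from the identity sequence.
ALTER TABLE votes
    ADD COLUMN id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY;

-- With a real key, delete every row that has a twin with a smaller id,
-- i.e. keep the row with the smallest id in each (voter, election_year) group.
DELETE FROM votes a
USING votes b
WHERE a.id > b.id
  AND a.voter = b.voter
  AND a.election_year = b.election_year;

-- Optional: reject future duplicates (constraint name is illustrative).
ALTER TABLE votes
    ADD CONSTRAINT votes_voter_election_year_key UNIQUE (voter, election_year);
```

Flipping the comparison to a.id < b.id keeps the row with the largest id in each group instead.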

Answer 1 (score: 2)

Here is one option:

DELETE FROM votes T1
    USING   votes T2
WHERE   T1.ctid < T2.ctid 
    AND T1.voter = T2.voter 
    AND T1.election_year  = T2.election_year;

See http://sqlfiddle.com/#!15/4d45d/5
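Because of the T1.ctid < T2.ctid condition, a row is deleted whenever another row with the same voter and election_year has a larger ctid, so the row with the largest ctid in each group is the one that survives. If you want to see what was removed, PostgreSQL's RETURNING clause can be added to the same statement. A sketch, on a temporary copy of the question's sample data:

```sql
-- Temporary copy of the question's table and sample data (for illustration only)
CREATE TEMP TABLE votes (
    voter         varchar(10),
    election_year smallint,
    election_type varchar(2),
    party         varchar(3)
);

INSERT INTO votes (voter, election_year, election_type, party) VALUES
    ('2435871347', 2018, 'PO', 'EV'),
    ('2435871347', 2018, 'RU', 'EV'),
    ('2435871347', 2018, 'GE', 'EV'),
    ('2435871347', 2016, 'PO', 'EV'),
    ('2435871347', 2016, 'GE', 'EV'),
    ('10215121/8', 2016, 'GE', 'ED');

-- Same self-join as above; RETURNING reports each deleted row once.
DELETE FROM votes t1
    USING votes t2
WHERE t1.ctid < t2.ctid
  AND t1.voter = t2.voter
  AND t1.election_year = t2.election_year
RETURNING t1.*;
```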

Answer 2 (score: 0)

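The ctid field is present in every PostgreSQL table, is unique for each record in the table, and indicates the physical location of the tuple. You almost had it: you just need ctid, because the rows have no unique ID.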

WITH CTE AS (
SELECT ctid, voter,
       election_year,
       ROW_NUMBER() OVER (PARTITION BY voter, election_year ORDER BY voter) AS RN
FROM votes
)
DELETE FROM votes v WHERE v.ctid IN (SELECT CTE.ctid FROM CTE WHERE CTE.RN > 1);

http://sqlfiddle.com/#!17/4d45d/14
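To look at the ctid values these queries rely on, name the column explicitly, since SELECT * does not include system columns. One caveat: ctid is the physical location of the current row version, so an UPDATE gives the row a new ctid, and VACUUM FULL or CLUSTER can move rows. That makes it fine for identifying rows inside a single DELETE like the ones above, but not as a lasting key. A small sketch on a temporary copy of the sample data:

```sql
-- Temporary copy of the question's table and sample data (for illustration only)
CREATE TEMP TABLE votes (
    voter         varchar(10),
    election_year smallint,
    election_type varchar(2),
    party         varchar(3)
);

INSERT INTO votes (voter, election_year, election_type, party) VALUES
    ('2435871347', 2018, 'PO', 'EV'),
    ('2435871347', 2018, 'RU', 'EV'),
    ('2435871347', 2018, 'GE', 'EV'),
    ('2435871347', 2016, 'PO', 'EV'),
    ('2435871347', 2016, 'GE', 'EV'),
    ('10215121/8', 2016, 'GE', 'ED');

-- ctid must be selected by name; each value is (block number, item index in that block).
SELECT ctid, voter, election_year, election_type
FROM votes
WHERE voter = '10215121/8';

-- Even a no-op UPDATE writes a new row version...
UPDATE votes SET party = party WHERE voter = '10215121/8';

-- ...so the same logical row now reports a different ctid.
SELECT ctid, voter, election_year, election_type
FROM votes
WHERE voter = '10215121/8';
```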