Removing duplicates from a table

Time: 2008-10-28 14:38:54

Tags: sql postgresql

The database is PostgreSQL 8.3.

If I write:

SELECT field1, field2, field3, count(*) 
FROM table1
GROUP BY field1, field2, field3 having count(*) > 1;

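I get some rows with a count greater than 1. How can I remove the duplicates? (I still want to keep one row for each group rather than several; I don't want to delete all of them.)

Example: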

1-2-3
1-2-3
1-2-3
2-3-4
4-5-6

Should become:

1-2-3
2-3-4
4-5-6
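
To make the example reproducible, here is a minimal setup sketch. The integer column types are an assumption; the question does not state them.

```sql
-- Hypothetical setup: the column types are assumed to be integer.
CREATE TABLE table1 (field1 integer, field2 integer, field3 integer);

-- The five sample rows from the example, including the triplicated 1-2-3.
INSERT INTO table1 (field1, field2, field3) VALUES
    (1, 2, 3),
    (1, 2, 3),
    (1, 2, 3),
    (2, 3, 4),
    (4, 5, 6);

-- The duplicate-finding query from the question.
SELECT field1, field2, field3, count(*)
FROM table1
GROUP BY field1, field2, field3 HAVING count(*) > 1;
```

With this data, only the 1-2-3 group has a count above 1.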

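The only answer I found is there, but I would like to know whether this can be done without a hash column.

Warning: I have no PK with a unique number, so I can't use the min(...) technique. The PK is the 3 fields.

7 answers:

Answer 0 (score: 6)

This is one of the many reasons why every table should have a primary key (not necessarily an ID number or IDENTITY, but a combination of one or more columns that uniquely identifies a row and whose uniqueness is enforced in the database).

Your best bet is something like this: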

-- Save one copy of each duplicated row.
SELECT field1, field2, field3
INTO temp_table1
FROM table1
GROUP BY field1, field2, field3 HAVING count(*) > 1;

-- Delete every copy of the duplicated rows.
-- (PostgreSQL uses DELETE ... USING rather than the T-SQL DELETE ... FROM ... JOIN form.)
DELETE FROM table1 T1
USING (SELECT field1, field2, field3
       FROM table1
       GROUP BY field1, field2, field3 HAVING count(*) > 1) SQ
WHERE SQ.field1 = T1.field1 AND
      SQ.field2 = T1.field2 AND
      SQ.field3 = T1.field3;

-- Put the single saved copies back.
INSERT INTO table1 (field1, field2, field3)
SELECT field1, field2, field3
FROM temp_table1;

DROP TABLE temp_table1;

Answer 1 (score: 0)

One possible answer is:

CREATE TEMPORARY TABLE <temporary table> (<correct structure for table being cleaned>);
BEGIN WORK;   -- if needed
INSERT INTO <temporary table> SELECT DISTINCT * FROM <source table>;
DELETE FROM <source table>;
INSERT INTO <source table> SELECT * FROM <temporary table>;
COMMIT WORK;  -- needed
DROP TABLE <temporary table>;

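I'm not sure whether 'WORK' is needed in the transaction statements, nor whether an explicit BEGIN is needed in PostgreSQL. But the concept applies to any DBMS.

The only thing to watch out for is referential constraints, and in particular triggered delete operations. If those exist, this may prove less satisfactory.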
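
For PostgreSQL specifically, the outline above might look like the following sketch. WORK is an optional noise word there, and without an explicit BEGIN each statement commits on its own, so the BEGIN is what makes the delete and re-insert atomic. The temporary table name table1_dedup is made up for illustration; table1 is the table from the question.

```sql
BEGIN;

-- Copy one instance of every distinct row into a scratch table.
-- CREATE TABLE AS takes its column structure from the query.
CREATE TEMPORARY TABLE table1_dedup AS
SELECT DISTINCT * FROM table1;

-- Empty the source table. DELETE fires any delete triggers and
-- foreign-key ON DELETE actions defined on table1.
DELETE FROM table1;

-- Restore the de-duplicated rows.
INSERT INTO table1 SELECT * FROM table1_dedup;

DROP TABLE table1_dedup;

COMMIT;
```

If anything fails before the COMMIT, issuing ROLLBACK leaves table1 as it was.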

Answer 2 (score: 0)

This uses the OID (object ID) column, provided the table was created with one:

DELETE FROM table1
WHERE OID NOT IN (SELECT MIN(OID)
                  FROM table1
                  GROUP BY field1, field2, field3);
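
Tables created without OIDs have no OID column, so the query above fails on them. One possible alternative, offered as a sketch rather than a tested recipe, is the ctid system column, which identifies a row's physical location and exists on every table. This assumes the server supports the < comparison on ctid values and that the three fields are never NULL (NULLs never compare equal, so such rows would be left alone).

```sql
-- Delete every row for which an identical row with a smaller ctid exists,
-- leaving exactly one row (the one with the smallest ctid) per group.
DELETE FROM table1 a
USING table1 b
WHERE a.field1 = b.field1
  AND a.field2 = b.field2
  AND a.field3 = b.field3
  AND b.ctid < a.ctid;
```

Because a row's ctid changes when the row is updated or the table is rewritten, it is only suitable for a one-off cleanup like this, not as a long-term key.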

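Answer 3 (score: 0)

I must be misunderstanding something, but I would say:

SELECT DISTINCT field1, field2, field3 FROM table1

Too easy to be good? ^^

Answer 4 (score: 0)

This uses TSQL. I don't know whether Postgres supports temp tables, but you could select into a temp table, then loop through and delete, and insert the results back into the original table.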

-- **Disclaimer** using TSQL
-- You could select your records into a temp table with a pk
Create Table #dupes
([id] int not null identity(1,1), f1 int, f2 int, f3 int)

Insert Into #dupes (f1,f2,f3) values (1,2,3)
Insert Into #dupes (f1,f2,f3) values (1,2,3)
Insert Into #dupes (f1,f2,f3) values (1,2,3)
Insert Into #dupes (f1,f2,f3) values (2,3,4)
Insert Into #dupes (f1,f2,f3) values (4,5,6)
Insert Into #dupes (f1,f2,f3) values (4,5,6)
Insert Into #dupes (f1,f2,f3) values (4,5,6)
Insert Into #dupes (f1,f2,f3) values (7,8,9)

Select f1,f2,f3 From #dupes

Declare @rowCount int
Declare @counter int
Set @counter = 1
Set @rowCount = (Select Count([id]) from #dupes)

while (@counter < @rowCount + 1)
    Begin
       Delete From #dupes
       Where [Id] <> 
            (Select [id] From #dupes where [id]=@counter)
                and
            (
                [f1] = (Select [f1] from #dupes where [id]=@counter)
                and
                [f2] = (Select [f2] from #dupes where [id]=@counter)
                and
                [f3] = (Select [f3] from #dupes where [id]=@counter)
            )
       Set @counter = @counter + 1
    End

-- You could take these results and pump them back into your original table
Select f1,f2,f3 From #dupes

Drop Table #dupes

Tested on MS SQL Server 2000. I'm not familiar with the options in Postgres, but this may point you in the right direction.
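
PostgreSQL does support temporary tables, and once each row has a generated id the loop can be replaced by a single set-based DELETE. The following is a rough PostgreSQL sketch of the same idea, not a translation supplied by the answer's author. The integer column types follow the T-SQL example above, and table1 with field1 to field3 is the table from the question.

```sql
-- Temporary copy with a surrogate key (serial generates 1, 2, 3, ...).
CREATE TEMPORARY TABLE dupes (
    id serial PRIMARY KEY,
    f1 integer,
    f2 integer,
    f3 integer
);

INSERT INTO dupes (f1, f2, f3)
SELECT field1, field2, field3 FROM table1;

-- Keep the lowest id in each (f1, f2, f3) group and delete the rest.
DELETE FROM dupes
WHERE id NOT IN (SELECT min(id) FROM dupes GROUP BY f1, f2, f3);

-- One row per distinct combination remains; these could be loaded
-- back into the original table.
SELECT f1, f2, f3 FROM dupes;

DROP TABLE dupes;
```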

Answer 5 (score: 0)

This is the simplest way I have found:

PostgreSQL syntax:

CREATE TABLE tmp AS SELECT DISTINCT * FROM table1;
TRUNCATE TABLE table1;
INSERT INTO table1 SELECT * FROM tmp;
DROP TABLE tmp;

T-SQL syntax:

select distinct * into #tmp from table1
truncate table table1
insert into table1 select * from #tmp
drop table #tmp

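Answer 6 (score: 0)

There is a good Answer to this question, but it is for SQL Server. It uses ROWCOUNT, which SQL Server provides, and it works well. I have never used PostgreSQL, so I don't know what the equivalent of ROWCOUNT is there.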