I am trying to delete duplicate rows from a table in Postgres. In my table, id is not a primary key.
postgres=# select * from customer_temp;
id | firstname | country | phonenumber
----+-----------+-----------+-------------
1 | Sachin | India | 3454
2 | Viru | India | 3454
3 | Saurav | India | 3454
4 | Ponting | Australia | 3454
5 | Warne | Australia | 3454
7 | Be;; | England | 3454
8 | Cook | England | 3454
8 | Cook | England | 3454
8 | Cook | England | 3454
(9 rows)
I am using the following query to delete the duplicate records.
delete from customer_temp temp
using (select out1.id, out1.firstname
from customer_temp out1
where (select count(out2.id)
from customer_temp out2
where out1.firstname=out2.firstname group by out2.firstname
) > 1
) temp1
where temp.id in (select id
from temp1
where id not in(select id
from temp1
LIMIT 1 OFFSET 0));
But I get the following error:
ERROR: relation "temp1" does not exist
LINE 1: ...name) > 1) temp1 where temp.id in (select id from temp1 wher...
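The relation temp1 is created as part of the using clause, so why can't I use it in the where clause filter?
According to How Select SQL gets executed, FROM is executed first, and the resulting rows are available to the later stages of query execution. So why is temp1 not available to the subquery in the where clause?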
Answer 0 (score: 1)
If id uniquely identifies each row, this is a simple way to write the logic:
delete from customer_temp
where id not in (select min(ct2.id)
from customer_temp ct2
where ct2.id is not null
group by ct2.firstname, ct2.country, ct2.phonenumber
);
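I note that I am using not in with a subquery. I usually warn against this, because not in returns no rows at all if the subquery produces a NULL, although here the where clause makes it safe. You can do something similar with exists, or with > and a correlated subquery.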
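The exists variant mentioned above could look roughly like the following. This is only a sketch: it assumes id is unique and that firstname, country and phonenumber contain no NULLs (plain = never matches NULL, whereas group by puts NULLs in the same group).

```sql
-- Sketch: delete a row when an identical row with a smaller id exists,
-- so only the row with the lowest id in each group survives.
-- Assumes id is unique and the compared columns are never NULL.
delete from customer_temp ct
where exists (select 1
              from customer_temp ct2
              where ct2.firstname = ct.firstname
                and ct2.country = ct.country
                and ct2.phonenumber = ct.phonenumber
                and ct2.id < ct.id
             );
```

If the compared columns can be NULL, replacing each = with is not distinct from should bring the behaviour closer to the group by version.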
EDIT:
If id is not unique, then it is a very poor name for the column. That aside, you can use oid:
delete from customer_temp
where oid not in (select min(oid)
from customer_temp ct2
group by ct2.firstname, ct2.country, ct2.phonenumber
);
oid is a built-in row identifier, but it exists only on tables created WITH OIDS, and PostgreSQL 12 removed that option entirely.
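For a table that has no oid column, the system column ctid (the physical location of the row version) can play a similar role. The following is a sketch of that swapped-in technique, not part of the original answer; it assumes the compared columns contain no NULLs and that nothing else is modifying the table while it runs.

```sql
-- Sketch: self-join on the duplicate-defining columns and delete every row
-- for which an identical row sits at an earlier physical position (smaller ctid).
-- The row with the smallest ctid in each group is kept.
delete from customer_temp a
using customer_temp b
where a.ctid > b.ctid
  and a.firstname = b.firstname
  and a.country = b.country
  and a.phonenumber = b.phonenumber;
```

A ctid changes when a row is updated or the table is rewritten, so it is only suitable for a one-off cleanup like this, not as a long-term key.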
However, the best approach is probably just to rebuild the table:
create table customer_temp_temp as
select distinct on (firstname, country, phonenumber) t.*
from customer_temp t
order by firstname, country, phonenumber;
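Once the deduplicated copy exists, the rows still have to get back into the original table. One way to finish the rebuild, sketched here on the assumption that customer_temp_temp was created as above and that a brief exclusive lock on customer_temp is acceptable:

```sql
-- Sketch: swap the deduplicated rows back into the original table.
-- Run in one transaction so other sessions never see an empty table.
begin;
truncate table customer_temp;        -- remove all rows, keep the table definition
insert into customer_temp
select * from customer_temp_temp;    -- reload the distinct rows
drop table customer_temp_temp;       -- the helper table is no longer needed
commit;
```

Reloading into the existing table, rather than dropping it and renaming the copy, keeps any indexes, constraints, defaults and grants defined on customer_temp, none of which create table as copies.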