Question

I am trying to delete duplicate data from a table in Postgres. My table has no primary key.

postgres=# select * from customer_temp;
 id | firstname |  country  | phonenumber
----+-----------+-----------+-------------
  1 | Sachin    | India     |        3454
  2 | Viru      | India     |        3454
  3 | Saurav    | India     |        3454
  4 | Ponting   | Australia |        3454
  5 | Warne     | Australia |        3454
  7 | Be;;      | England   |        3454
  8 | Cook      | England   |        3454
  8 | Cook      | England   |        3454
  8 | Cook      | England   |        3454
(9 rows)

I am using the following query to delete the duplicate records.

delete from customer_temp temp 
using (select  out1.id, out1.firstname 
       from customer_temp out1 
       where (select count(out2.id) 
              from customer_temp out2 
              where out1.firstname=out2.firstname group by out2.firstname
              ) > 1
       ) temp1 
where temp.id in (select id 
                  from temp1 
                  where id not in(select id 
                                  from temp1 
                                  LIMIT 1 OFFSET 0));

But I get the following error:

ERROR:  relation "temp1" does not exist
LINE 1: ...name) > 1) temp1 where temp.id in (select id from temp1 wher...

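The relation temp1 is created as part of the USING clause, so why can't I use it in the WHERE clause filter?

According to How Select SQL gets executed, FROM is evaluated first, and the resulting rows are available to the later stages of query execution. So why is temp1 not available to the subquery in the WHERE part?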

Answer 1

Hmmm . . . Assuming id uniquely identifies each row, here is a simple way to write the logic:

delete from customer_temp
    where id not in (select min(ct2.id)
                     from customer_temp ct2
                     where ct2.id is not null
                     group by ct2.firstname, ct2.country, ct2.phonenumber
                    );

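I note that I am using NOT IN with a subquery. I usually warn against this, because a single NULL returned by the subquery makes NOT IN match nothing (here it is safe because of the WHERE clause, which filters out NULL ids). You can do something similar with EXISTS, or with > and a correlated subquery.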
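
The EXISTS and > variants mentioned above could look roughly like the following. This is a minimal sketch, not the answer's original code; it still assumes that id is unique within each group of duplicates, so fully identical rows (such as the three id = 8 rows in the sample) would not be touched.

```sql
-- Variant 1 (EXISTS): delete a row when a duplicate with a smaller id exists,
-- so the row with the smallest id in each group survives.
DELETE FROM customer_temp ct
WHERE EXISTS (
    SELECT 1
    FROM customer_temp ct2
    WHERE ct2.firstname   = ct.firstname
      AND ct2.country     = ct.country
      AND ct2.phonenumber = ct.phonenumber
      AND ct2.id < ct.id
);

-- Variant 2 (> with a correlated subquery): delete every row whose id is
-- greater than the smallest id in its own group.
DELETE FROM customer_temp ct
WHERE ct.id > (
    SELECT min(ct2.id)
    FROM customer_temp ct2
    WHERE ct2.firstname   = ct.firstname
      AND ct2.country     = ct.country
      AND ct2.phonenumber = ct.phonenumber
);
```

Note that = never matches NULL, so rows with a NULL in any of the compared columns are left alone by both variants; IS NOT DISTINCT FROM can be used in place of = if NULLs should count as equal.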

EDIT:

If id is not unique, then it is a really bad name for the column. But that aside, you can use oid:

delete from customer_temp
    where oid not in (select min(oid)
                      from customer_temp ct2
                      group by ct2.firstname, ct2.country, ct2.phonenumber
                    );

This is a built-in identifier.
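
A caveat: the oid system column exists only on tables created WITH OIDS, and PostgreSQL 12 removed that option. A possible substitute, sketched here as an assumption rather than as part of the original answer, is the ctid system column, which identifies the physical location of each row version and is therefore distinct even for fully identical rows.

```sql
-- Self-join the table and delete every row that has a duplicate
-- stored at a smaller ctid; one row per group is kept.
DELETE FROM customer_temp a
USING customer_temp b
WHERE a.ctid > b.ctid
  AND a.firstname   = b.firstname
  AND a.country     = b.country
  AND a.phonenumber = b.phonenumber;
```

ctid changes when a row is updated or the table is rewritten, so it is suitable for a one-off cleanup like this but not as a long-term row identifier.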

However, the best approach is probably just to rebuild the table:

create table customer_temp_temp as
    select distinct on (firstname, country, phonenumber) t.*
    from customer_temp t
    order by firstname, country, phonenumber;
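
As for why temp1 cannot be referenced in the question's query: a subquery alias in FROM or USING only names that one item for column references (such as temp1.id) at the same query level. It is not registered as a relation, so a later FROM temp1 looks for a table or view of that name and fails. A common table expression (WITH) does define a name that the whole statement can use in FROM. A minimal sketch:

```sql
-- Fails with: relation "temp1" does not exist
-- SELECT * FROM (SELECT 1 AS id) temp1
-- WHERE id IN (SELECT id FROM temp1);

-- Works: the CTE name is visible everywhere in the statement.
WITH temp1 AS (SELECT 1 AS id)
SELECT * FROM temp1
WHERE id IN (SELECT id FROM temp1);

-- The question's DELETE, restructured the same way.
WITH temp1 AS (
    SELECT out1.id, out1.firstname
    FROM customer_temp out1
    WHERE (SELECT count(out2.id)
           FROM customer_temp out2
           WHERE out1.firstname = out2.firstname) > 1
)
DELETE FROM customer_temp t
WHERE t.id IN (SELECT id
               FROM temp1
               WHERE id NOT IN (SELECT id FROM temp1 LIMIT 1 OFFSET 0));
```

This only makes the name resolve. The logic itself would still not deduplicate the sample data: all three duplicate rows share id 8, so the NOT IN filter leaves nothing to delete, and LIMIT 1 without ORDER BY picks an arbitrary row in any case.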
