I have seen this question asked before, but not for Postgres and not with 5 columns. I am working with Postgres 9.4 and have a big location table that contains some duplicates. There are 5 fields that I want to check for duplicates: city, state, zipcode, latitudes, longitudes. I have tried other methods, such as the one in "find rows that multiple columns are identical using SQL query", but it kept giving me errors even after I changed the names to match my table and columns. A lot of my rows look like this
There are many rows with the same city, state, and zipcode but slightly different latitudes and longitudes. In the list above, only #1 and #3 are identical, so I would like to delete one and leave the other. I am trying to find the correct way of doing this without deleting extra rows. Any suggestions would be great.

I got this error on the HAVING query:
ERROR: column reference "city" is ambiguous
LINE 1: Select city,state
Select city,state
FROM zipss JOIN
(SELECT city,state, count(*)
FROM zipss
GROUP BY city,state
HAVING count(*) >=2) dupl on zipss.city = dupl.city and zipss.state = dupl.state;
Answer 0 (score: 4)
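In Postgres, you can use the ctid system column for this purpose. It is a built-in column that identifies a row's physical location in the table, so it can change when the row is updated or the table is rewritten, and you really shouldn't rely on it in general. But if you don't have a primary key on the table, it is useful for a one-off cleanup like this: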
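-- "table" is a reserved word, so the table name from the question (zipss) is used;
-- the column names follow the list given in the question.
delete from zipss
where ctid not in (select max(ctid)
                   from zipss t
                   group by city, state, zipcode, latitudes, longitudes
                  );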
This should keep the row with the largest ctid for each combination of the five columns.
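Since the goal is to avoid deleting extra rows, it may help to run the cleanup inside a transaction and look at what would be removed first. The following is only a sketch, assuming the table is called zipss and the columns are named city, state, zipcode, latitudes, and longitudes as listed in the question; adjust the names if your schema differs.

```sql
BEGIN;

-- Preview: the rows that the delete would remove
-- (every row except the one with the largest ctid in each group).
SELECT ctid, city, state, zipcode, latitudes, longitudes
FROM zipss
WHERE ctid NOT IN (SELECT max(ctid)
                   FROM zipss
                   GROUP BY city, state, zipcode, latitudes, longitudes)
ORDER BY city, state, zipcode;

-- Remove the extra copies, keeping one row per combination.
DELETE FROM zipss
WHERE ctid NOT IN (SELECT max(ctid)
                   FROM zipss
                   GROUP BY city, state, zipcode, latitudes, longitudes);

-- Check: this should return no rows if no exact duplicates remain.
SELECT city, state, zipcode, latitudes, longitudes, count(*)
FROM zipss
GROUP BY city, state, zipcode, latitudes, longitudes
HAVING count(*) > 1;

-- Run COMMIT if the results look right, or ROLLBACK to undo the delete.
COMMIT;
```

Rows that differ only slightly in latitudes or longitudes fall into different groups, so they are left untouched; only exact matches on all five columns are collapsed to a single row.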