Question

I have seen this question asked before, but not for Postgres and not with 5 columns. I am working with Postgres 9.4 and have a big location table that contains some duplicates. There are 5 fields I want to check for duplicates: city, state, zipcode, latitudes, longitudes. I have tried other methods, such as the one in "find rows that multiple columns are identical using SQL query", but it kept giving me errors even after I changed the names to match my table and columns. A lot of my rows look like this:

Chicago IL 60475 41.881 -87.6245
Chicago IL 60475 41.853 -87.6846
Chicago IL 60475 41.881 -87.6245
Chicago IL 60475 41.890 -87.6273

There are many rows with the same city, state, and zipcode but slightly different latitudes and longitudes. In the list above only rows 1 and 3 are identical, so I would like to delete one of them and keep the other. I am trying to find the correct way of doing this without deleting extra rows; any suggestions would be great. I got this error on the HAVING query:

ERROR: column reference "city" is ambiguous
LINE 1: Select city,state

Select city,state
FROM zipss JOIN 
 (SELECT city,state, count(*)
  FROM zipss
  GROUP BY city,state
  HAVING count(*) >=2) dupl on zipss.city = dupl.city and zipss.state = dupl.state;
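
For reference, the error comes from the outer SELECT list: both zipss and the derived table dupl expose columns named city and state, so the unqualified names could refer to either one. A minimal sketch of the same query with qualified names follows; the aliases z and n are illustrative and not part of the original query.

```sql
-- Same join as above, but every column in the outer SELECT names its source.
SELECT z.city, z.state
FROM zipss AS z
JOIN (
    SELECT city, state, count(*) AS n   -- n: number of rows per city/state pair
    FROM zipss
    GROUP BY city, state
    HAVING count(*) >= 2
) AS dupl
  ON z.city = dupl.city
 AND z.state = dupl.state;
```

Note that this only lists rows sharing a city and state; it does not compare zipcode or the coordinates, so it would not isolate the fully identical rows on its own.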

Answer 1

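In Postgres, you can use ctid for this purpose. It is a system column that identifies the physical location of a row version, so it can change (for example after an UPDATE or VACUUM FULL) and should not be relied on as a long-term identifier. But if you don't have a primary key on the table, it is useful for a one-off cleanup like this: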

delete from zipss
    where ctid not in (select max(ctid)
                       from zipss t
                       group by city, state, zipcode, latitude, longitude
                      );

This should keep the row with the largest ctid for each combination of the five columns.
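
Since the concern is deleting too much, it may help to preview the duplicate groups first and run the delete inside a transaction. The sketch below assumes the table is zipss and the columns are named as in the query above (the question spells them latitudes and longitudes, so adjust as needed). The delete is a swapped-in variant, not the query above: it relies only on comparing ctid values, so it does not need a max() aggregate over the tid type, which I believe older releases such as 9.4 may not provide.

```sql
-- Preview: each group of fully identical rows and how many copies it has.
SELECT city, state, zipcode, latitude, longitude, count(*) AS copies
FROM zipss
GROUP BY city, state, zipcode, latitude, longitude
HAVING count(*) > 1;

BEGIN;

-- Delete every row that has an identical twin with a larger ctid,
-- which leaves exactly one row (the largest ctid) per group.
DELETE FROM zipss AS a
USING zipss AS b
WHERE a.ctid < b.ctid
  AND a.city = b.city
  AND a.state = b.state
  AND a.zipcode = b.zipcode
  AND a.latitude = b.latitude
  AND a.longitude = b.longitude;

-- Re-run the preview query here; it should now return no rows.

COMMIT;  -- or ROLLBACK if the result looks wrong
```

One caveat with this variant: the = comparisons never match NULL, so rows with a NULL in any of the five columns are left untouched, whereas GROUP BY would treat those NULLs as equal. If the columns can be NULL, IS NOT DISTINCT FROM can replace = in the join conditions.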

sql query how can I delete rows that have 5 columns identical and leave 1 in Postgres
