Finding multiple duplicates in Postgres

Time: 2013-09-12 16:06:31

Tags: sql postgresql duplicates

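We had an incident in which several rows with duplicate values were inserted into a table, and I need to find the affected rows in a fairly specific format. So far I have this query: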

    SELECT p2.id
    FROM assignmentobject p1, assignmentobject p2
    WHERE ST_Equals(p1.the_geom, p2.the_geom) AND
    p1.id <> p2.id and p1.assignmentid = 15548
    group by p1.id, p2.id

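It compares the geometries of the rows and returns the rows whose geometries are equal. id is the primary key and is generated sequentially.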

However, this causes a problem, as this small excerpt of the result shows:

p1.id   p2.id
35311   35314
35311   35315
35314   35311
35314   35315
35315   35311
35315   35314

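As you can see, 35311, 35314 and 35315 have the same geometry, so every combination of them appears in the result. What I want is to use the lowest or highest ID as a "base" and ignore all other combinations that do not involve this base. That is, the result shown above would become: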

p1.id    p2.id
35311    35314
35311    35315

Here the combinations between 35314 and 35315 are omitted. Is this possible in pure SQL?

2 answers:

Answer 0 (score: 1)

Just change the <> operator to <, so that the condition becomes:

    WHERE ST_Equals(p1.the_geom, p2.the_geom) AND
    p1.id < p2.id and p1.assignmentid = 15548

If assignmentid is duplicated as well and you want all the duplicates at once:

    select p2.id
    from
        assignmentobject p1
        inner join
        assignmentobject p2 using (assignmentid)
    where
        st_equals(p1.the_geom, p2.the_geom) and
        p1.id < p2.id
    group by p1.id, p2.id
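
To see what the < filter returns for a group of three identical rows, here is a minimal sketch. It assumes a hypothetical temporary table demo_obj in which an integer column shape_key stands in for the_geom, so that plain = replaces ST_Equals and the example runs without PostGIS.

```sql
-- Hypothetical toy table: shape_key stands in for the_geom,
-- and "=" stands in for ST_Equals.
CREATE TEMP TABLE demo_obj (
    id        integer PRIMARY KEY,
    shape_key integer NOT NULL
);

-- Three rows that share the same "geometry".
INSERT INTO demo_obj (id, shape_key) VALUES
    (35311, 1), (35314, 1), (35315, 1);

-- Self-join with "<" so each unordered pair is reported only once.
SELECT p1.id AS p1_id, p2.id AS p2_id
FROM demo_obj p1
JOIN demo_obj p2 ON p1.shape_key = p2.shape_key
WHERE p1.id < p2.id
ORDER BY p1.id, p2.id;
```

Each unordered pair now appears once, so for the three sample ids the rows are (35311, 35314), (35311, 35315) and (35314, 35315). The last pair does not involve the lowest id, so a further filter is still needed if only pairs against a single base id are wanted.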

Answer 1 (score: 1)

    CREATE TABLE pair (
        ll INTEGER NOT NULL,
        rr INTEGER NOT NULL,
        PRIMARY KEY (ll, rr)
    );

    INSERT INTO pair (ll, rr) VALUES
        (35311, 35314), (35311, 35315),
        (35314, 35311), (35314, 35315),
        (35315, 35311), (35315, 35314);

    SELECT p1.ll AS p1, p1.rr AS p2
    FROM pair p1
    WHERE p1.ll < p1.rr -- tie breaker
    AND NOT EXISTS (
        SELECT * FROM pair nx
        WHERE nx.ll < nx.rr
        AND nx.rr = p1.ll
    );

The same, with the original geometry query wrapped in a CTE:

    WITH pair AS (
        SELECT p1.id AS ll
             , p2.id AS rr
        FROM assignmentobject p1
        JOIN assignmentobject p2 ON ST_Equals(p1.the_geom, p2.the_geom)
                                -- not sure if you want this ...
                                AND p1.assignmentid = p2.assignmentid
        WHERE p1.id <> p2.id AND p1.assignmentid = 15548
        -- group by seems to make no sense here
        -- group by p1.id, p2.id
    )
    SELECT pp.ll AS p1, pp.rr AS p2
    FROM pair pp
    WHERE pp.ll < pp.rr -- tie breaker
    AND NOT EXISTS (
        SELECT * FROM pair nx
        WHERE nx.ll < nx.rr
        AND nx.rr = pp.ll
    );
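
As a possible alternative to the NOT EXISTS filter, the base id could also be picked with an aggregate: for every higher id, take the smallest lower id it is paired with. The sketch below is an illustration only and is not part of the original answer. It assumes a hypothetical temporary table pair_demo that holds the six sample pairs from the question.

```sql
-- Hypothetical temp table with the six sample pairs from the question.
CREATE TEMP TABLE pair_demo (
    ll integer NOT NULL,
    rr integer NOT NULL,
    PRIMARY KEY (ll, rr)
);

INSERT INTO pair_demo (ll, rr) VALUES
    (35311, 35314), (35311, 35315),
    (35314, 35311), (35314, 35315),
    (35315, 35311), (35315, 35314);

-- Keep only pairs with ll < rr, then collapse each rr
-- onto the smallest id it is paired with.
SELECT MIN(ll) AS p1, rr AS p2
FROM pair_demo
WHERE ll < rr
GROUP BY rr
ORDER BY p1, p2;
```

For the sample data this yields (35311, 35314) and (35311, 35315). The sketch assumes that every row in a duplicate group is paired with every other row of that group, as in the sample; if the pairs were incomplete, the smallest partner of a row would not necessarily be the lowest id of its group.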