Linking duplicate records in PostgreSQL

Time: 2013-12-15 05:08:41

Tags: postgresql

How can I link duplicate records in PostgreSQL? I have found them with this query:

SELECT * FROM (
  SELECT id, import_id, name,
  ROW_NUMBER() OVER(PARTITION BY address ORDER BY name asc) AS Row
  FROM companies
) dups
WHERE dups.Row > 1
ORDER BY dups.name;

See the sample code and demo at http://sqlfiddle.com/#!15/af016/7/1

I would like to add a column named linked_id to the companies table, set to the import_id of the first record in each group of duplicates.

2 answers:

Answer 0: (score: 1)

Try:

UPDATE companies c
SET import_id = q.import_id
FROM (
  SELECT id, 
  FIRST_VALUE(import_id) 
      OVER(PARTITION BY name, address ORDER BY name asc) AS import_id,
  ROW_NUMBER() 
      OVER(PARTITION BY name, address ORDER BY name asc) AS Rn
  FROM companies
) q
WHERE c.id = q.id AND q.rn > 1
;

Demo: http://sqlfiddle.com/#!15/af016/10
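
Note that the statement above overwrites import_id itself, while the question asks for a separate linked_id column. A minimal sketch of that variant might look like the following. It assumes import_id is an integer (the table definition is not shown here), and it orders each partition by id as a tie-breaker, which is also an assumption: ordering by name inside a partition that already groups by name leaves the "first" row undefined.

```sql
-- Assumption: import_id is an integer; adjust the type to match the real schema.
ALTER TABLE companies ADD COLUMN linked_id integer;

-- Point every duplicate (rn > 1) at the import_id of the first row in its group.
UPDATE companies c
SET linked_id = q.first_import_id
FROM (
  SELECT id,
         FIRST_VALUE(import_id) OVER w AS first_import_id,
         ROW_NUMBER() OVER w AS rn
  FROM companies
  -- Assumption: id is an acceptable tie-breaker for "first" within a group.
  WINDOW w AS (PARTITION BY name, address ORDER BY id)
) q
WHERE c.id = q.id
  AND q.rn > 1;
```

With this sketch, the first row of each group and rows without duplicates keep linked_id as NULL.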

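Answer 1: (score: 1)

This sets parent_id to the import_id of the first company in each group of companies sharing the same address, ordered by name.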

UPDATE companies
SET parent_id=rs.parent_id FROM
(SELECT id, first_value(import_id)
 OVER (PARTITION BY address ORDER BY name) as parent_id
 FROM companies
) AS rs
WHERE rs.id=companies.id;
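
As written, this UPDATE assigns parent_id on every row, including the first row of each group (which then points to itself) and companies that have no duplicates. Before running it, the values it would assign can be previewed with a plain SELECT. The query below is only a sketch; the extra id in the ORDER BY is an assumed tie-breaker for companies that share both address and name.

```sql
-- Preview only: nothing is modified.
SELECT id, name, address, import_id,
       first_value(import_id)
         OVER (PARTITION BY address ORDER BY name, id) AS parent_id
FROM companies
ORDER BY address, name, id;
```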