Postgres: delete all duplicate records but one, chosen by sorting

Time: 2018-10-16 03:10:04

Tags: sql postgresql

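In Postgres, I would like to know how to delete all duplicate records while keeping one of them, chosen by sorting on a column.

Suppose I have the following table foo: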

 id                 | name |  region   |          created_at
--------------------+------+-----------+-------------------------------
                  1 | foo  | sydney    | 2018-05-24 15:40:32.593745+10
                  2 | foo  | melbourne | 2018-05-24 17:28:59.452225+10
                  3 | foo  | sydney    | 2018-05-29 22:17:02.927263+10
                  4 | foo  | sydney    | 2018-06-13 16:44:32.703174+10
                  5 | foo  | sydney    | 2018-06-13 16:45:01.324273+10
                  6 | foo  | sydney    | 2018-06-13 17:04:49.487767+10
                  7 | foo  | sydney    | 2018-06-13 17:05:13.592844+10
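
To reproduce the table above, a minimal setup sketch follows. The column types and the primary key are assumptions inferred from the sample output; the original schema is not shown.

```sql
-- Assumed schema: types are guessed from the psql output above.
CREATE TABLE foo (
    id         bigint PRIMARY KEY,
    name       text        NOT NULL,
    region     text        NOT NULL,
    created_at timestamptz NOT NULL
);

-- The seven sample rows, copied from the table above.
INSERT INTO foo (id, name, region, created_at) VALUES
    (1, 'foo', 'sydney',    '2018-05-24 15:40:32.593745+10'),
    (2, 'foo', 'melbourne', '2018-05-24 17:28:59.452225+10'),
    (3, 'foo', 'sydney',    '2018-05-29 22:17:02.927263+10'),
    (4, 'foo', 'sydney',    '2018-06-13 16:44:32.703174+10'),
    (5, 'foo', 'sydney',    '2018-06-13 16:45:01.324273+10'),
    (6, 'foo', 'sydney',    '2018-06-13 17:04:49.487767+10'),
    (7, 'foo', 'sydney',    '2018-06-13 17:05:13.592844+10');
```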

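I want to remove all duplicates, identified by the (name, region) tuple, but keep the row with the greatest created_at in each group. The result would be: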

 id                 | name |  region   |          created_at
--------------------+------+-----------+-------------------------------
                  2 | foo  | melbourne | 2018-05-24 17:28:59.452225+10
                  7 | foo  | sydney    | 2018-06-13 17:05:13.592844+10

But I don't know where to start. Any ideas?

2 answers:

Answer 0 (score: 1)

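-- Rank rows within each (name, region) group, newest first,
-- then delete everything except the newest row (row_no = 1).
-- The derived table needs an alias in older PostgreSQL versions.
DELETE FROM foo
      WHERE id IN
               (SELECT ranked.id
                  FROM (SELECT id,
                               ROW_NUMBER ()
                               OVER (PARTITION BY name, region
                                     ORDER BY created_at DESC)
                                  AS row_no
                          FROM foo) AS ranked
                 WHERE ranked.row_no > 1);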
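
As an alternative to ranking with a window function, the same cleanup can be sketched as a self-join using PostgreSQL's DELETE ... USING extension. This is not the method from the answer above; the aliases older and newer are illustrative names, and the tie-break on id is an assumption for the case where two rows share the same created_at.

```sql
-- Delete every row for which a "newer" row exists in the same
-- (name, region) group. The row comparison falls back to id when
-- two rows have identical created_at values (assumed tie-break).
DELETE FROM foo AS older
USING foo AS newer
WHERE older.name   = newer.name
  AND older.region = newer.region
  AND (older.created_at, older.id) < (newer.created_at, newer.id);
```

The row that survives in each group is the one with no newer sibling, which is the row with the greatest created_at.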

Answer 1 (score: 1)

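Use a subquery with ROW_NUMBER() OVER (PARTITION BY ...) to pick out the rows that duplicate a (name, region) pair while keeping the most recent row in each group. Make sure the subquery in FROM has an alias (here, AS a); older PostgreSQL versions raise a syntax error without one:

SELECT * 
FROM foo 
WHERE id IN (
  SELECT a.id 
  FROM (
    SELECT id, ROW_NUMBER() OVER (
        PARTITION BY name, region 
        ORDER BY created_at DESC
    ) AS row_no 
    FROM foo
  ) AS a 
  WHERE a.row_no > 1
);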

...returns the rows that would be deleted. Once you are happy with the result, replace SELECT * with DELETE to remove them.
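
The SELECT-then-DELETE switch can be sketched as a single transaction, so the delete can still be undone after looking at what it removed. This is a minimal sketch, not part of the original answer; it partitions by (name, region) as the question asks, and RETURNING is a PostgreSQL extension.

```sql
BEGIN;

-- Delete all but the newest row per (name, region) and show what went.
DELETE FROM foo
WHERE id IN (
  SELECT a.id
  FROM (
    SELECT id, ROW_NUMBER() OVER (
        PARTITION BY name, region
        ORDER BY created_at DESC
    ) AS row_no
    FROM foo
  ) AS a
  WHERE a.row_no > 1
)
RETURNING id, name, region, created_at;

-- Undo for now; switch this to COMMIT once the returned rows look right.
ROLLBACK;
```

With the sample table from the question, the returned rows are ids 1, 3, 4, 5 and 6 (in no guaranteed order), leaving ids 2 and 7.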

SQLFiddle demo