Pgsql删除一些列(不是全部)重复的行

时间:2018-01-02 09:45:32

标签: sql postgresql

表格 - col_pk,col1,col2,col3,col4,col_date_updated

此表包含一些行,其中col2和col3的列值重复。 我想保留col_date_updated的那些行是最新的(最大)。

例如:

col_pk, col1, col2,  col3, col4,   col_date_updated
1,      A,    hello, now,  200.00, 2017-12-12 15:09:44.437546
2,      B,    hello, now,  490.00, 2017-12-12 15:09:42.437065
3,      C,    hi,    now,  300.00, 2017-12-12 15:09:41.436617
4,      D,    hello, now,  250.00, 2017-12-12 15:09:45.436617
5,      E,    hi,    now,  250.00, 2017-12-12 10:09:41.436617

预期结果:

col_pk, col1, col2,  col3, col4,   col_date_updated
3,      C,    hi,    now,  300.00, 2017-12-12 15:09:41.436617
4,      D,    hello, now,  250.00, 2017-12-12 15:09:45.436617

5 个答案:

答案 0 :(得分:1)

检查一下。

SELECT DISTINCT ON (col2, col3) t.*
FROM table t
ORDER BY col_date_updated DESC

在col2和col3上应用distinct,因为你希望它们是唯一的,并通过desc保持最新订单

答案 1 :(得分:0)

如果您只是想选择来获得预期的输出,那么ROW_NUMBER会派上用场:

WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY col2, col3
        ORDER BY col_date_updated DESC) rn
    FROM yourTable
)
SELECT col_pk, col1, col2,  col3, col4, col_date_updated
FROM cte
WHERE rn = 1;

如果您想要删除其他记录,那么我们也可以重复使用CTE:

DELETE FROM yourTable WHERE col_pk IN (SELECT col_pk FROM cte WHERE rn > 1);

答案 2 :(得分:0)

你可以尝试这样的事情。

SELECT t.*
  FROM yourtable t
 WHERE col_date_updated IN (SELECT MAX (col_date_updated)
                    FROM yourtable i
                   WHERE t.col2 = i.col2 AND t.col3 = i.col3);

所以,如果你想删除其他记录,你可以使用它。

DELETE 
  FROM yourtable t
 WHERE col_date_updated NOT IN (SELECT MAX (col_date_updated)
                    FROM yourtable i
                   WHERE t.col2 = i.col2 AND t.col3 = i.col3);

DEMO

答案 3 :(得分:0)

如果要抑制除{col2,col3}之外的所有行(

)以外的所有行:

SELECT *
FROM thetable zt
WHERE NOT EXISTS (
        -- If a record exists with the same col2,col3,
        -- but a more recent date than zt.col_date_updated
        -- then zt.* cannot be the most recent one
        SELECT *
        FROM thetable nx
        WHERE nx.col2 = zt.col2 -- same value
        AND nx.col3 = zt.col3   -- same value
        AND nx.col_date_updated > zt.col_date_updated -- more recent
        );

如果您想要删除除{col2,col3}相同的最新行以外的所有行:

DELETE
FROM thetable zt
WHERE EXISTS (
        -- If a record exists with the same col2,col3,
        -- but a more recent date than zt.t.col_date_updated
        -- then zt.* cannot be the most recent one
        -- and we can delete zt.
        SELECT *
        FROM thetable nx
        WHERE nx.col2 = zt.col2 -- same value
        AND nx.col3 = zt.col3   -- same value
        AND nx.col_date_updated > zt.col_date_updated -- more recent
        );

答案 4 :(得分:-1)

这是最快的方式:

SELECT * FROM tablename WHERE col_pk IN 
       (SELECT col_pk FROM 
               (SELECT col_pk, ROW_NUMBER() OVER (partition BY col2, col3 ORDER BY col_date_updated) AS rnum
        FROM tablename) t
 WHERE t.rnum > 1);

如果你想删除:

DELETE FROM tablename WHERE col_pk IN 
       (SELECT col_pk FROM 
               (SELECT col_pk, ROW_NUMBER() OVER (partition BY col2, col3 ORDER BY col_date_updated) AS rnum
        FROM tablename DESC) t
 WHERE t.rnum > 1);