Delete duplicate records across multiple columns with multiple conditions

Time: 2019-05-17 04:59:14

Tags: sql vertica

I have the following table, table1:

----------------------------------
| Id   |     Value  |     Date   |
----------------------------------
| 1    |      xxx   | 05/01/2015 |
| 2    |      xxx   | 05/02/2015 |
| 3    |      yyy   | 06/01/2015 |
| 4    |      yyy   | 06/01/2015 |
----------------------------------

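I want to remove the duplicate rows (rows that share the same Value), keeping the row with the latest date; if the dates are equal, keep the row with the highest Id. (In other words, keep the latest date and the latest Id, and delete the older dates and Ids.)

No procedural code, only a query. The table is one of the joined tables in a multi-join query.

The solution should be compatible with Vertica.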

3 answers:

Answer 0 (score: 1)

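The following statement, written in MySQL's multi-table DELETE syntax, deletes rows that share a Date with another row and keeps the one with the highest ID: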

DELETE t1 FROM table1 t1
    INNER JOIN
    table1 t2 
WHERE
    t1.id < t2.id AND t1.Date = t2.Date;

It may help you; you can adapt it to your needs.
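The statement above matches rows on Date only, while the question treats rows with the same Value as duplicates and breaks ties by Id. A minimal sketch of the same self-join idea, written as a plain `DELETE ... WHERE Id IN (subquery)`, could look like the following. It is not part of the original answer, it assumes that Id is unique in table1, and it has not been run against Vertica here.

```sql
-- Sketch only: delete every row for which a "better" row with the same
-- Value exists (later Date, or same Date and higher Id).
-- Assumption: Id uniquely identifies a row in table1.
DELETE FROM table1
WHERE Id IN (
    SELECT t1.Id
    FROM table1 t1
    INNER JOIN table1 t2
            ON t1.Value = t2.Value
           AND (t2.Date > t1.Date
                OR (t2.Date = t1.Date AND t2.Id > t1.Id))
);
```

On the sample data, the subquery selects Id 1 (an older date than Id 2) and Id 3 (same date as Id 4, lower Id), so Ids 2 and 4 would remain.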

Answer 1 (score: 0)

I think Vertica will support this:

delete from table1
where table1.id not in (select t.id
                        from (select t2.*,
                                     row_number() over (partition by t2.value order by t2.date desc, t2.id desc) as seqnum
                              from table1 t2
                             ) t
                        where t.seqnum = 1
                       );
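Before running a delete like this, it can be useful to preview which rows the ranking keeps. The sketch below reuses the same `row_number()` ranking (latest date first, then highest id) and labels each row; the `action` column is purely illustrative and not part of the original answer.

```sql
-- Preview only: nothing is deleted here.
-- seqnum = 1 marks the row that survives within each value.
select t.id,
       t.value,
       t.date,
       t.seqnum,
       case when t.seqnum = 1 then 'keep' else 'delete' end as action
from (select t2.*,
             row_number() over (partition by t2.value
                                order by t2.date desc, t2.id desc) as seqnum
      from table1 t2
     ) t
order by t.value, t.seqnum;
```

One caveat with the `not in` form: if the subquery ever returns a NULL id, `not in` evaluates to unknown for every row and nothing is deleted, so this approach assumes id is never NULL.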

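Answer 2 (score: 0)

If you want to join this table with other tables, you may just want to select the rows you need rather than deleting anything before the join.

Vertica offers an analytic limit clause, which comes in handy here.

Here is how it works on your input data: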

WITH
input(Id,Value,Date) AS (
          SELECT 1,'xxx',DATE '2015-05-01'
UNION ALL SELECT 2,'xxx',DATE '2015-05-02'
UNION ALL SELECT 3,'yyy',DATE '2015-06-01'
UNION ALL SELECT 4,'yyy',DATE '2015-06-01'
)
SELECT
 *
FROM input
LIMIT 1 OVER(PARTITION BY Value ORDER BY Date DESC, id DESC);
-- out  Id | Value |    Date    
-- out ----+-------+------------
-- out   2 | xxx   | 2015-05-02
-- out   4 | yyy   | 2015-06-01
-- out (2 rows)
-- out 
-- out Time: First fetch (2 rows): 14.240 ms. All rows formatted: 14.276 ms

Does this help?

Well, if you really need to delete, you can also use the query above in a NOT IN predicate to run the delete, as I did here:
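Since the table is only one participant in a larger join, the deduplicated rows can also be wrapped in a common table expression and joined directly. The sketch below assumes that the analytic limit clause is accepted inside a WITH clause in the same way as in an ordinary subquery. The `other_input` table, its `Category` column, and the `latest` name are hypothetical and exist only for illustration.

```sql
WITH
input(Id,Value,Date) AS (
          SELECT 1,'xxx',DATE '2015-05-01'
UNION ALL SELECT 2,'xxx',DATE '2015-05-02'
UNION ALL SELECT 3,'yyy',DATE '2015-06-01'
UNION ALL SELECT 4,'yyy',DATE '2015-06-01'
),
-- hypothetical second table standing in for the rest of the join
other_input(Value,Category) AS (
          SELECT 'xxx','first'
UNION ALL SELECT 'yyy','second'
),
-- one row per Value: latest Date, then highest Id
latest AS (
  SELECT *
  FROM input
  LIMIT 1 OVER(PARTITION BY Value ORDER BY Date DESC, Id DESC)
)
SELECT
  o.Category
, l.Id
, l.Value
, l.Date
FROM other_input o
JOIN latest l ON l.Value = o.Value;
```

With this shape nothing is deleted from the base table; the join simply sees one row per Value.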

-- creating a temp table to delete from  ....
CREATE LOCAL TEMPORARY TABLE t1 (Id,Value,Date) 
ON COMMIT PRESERVE ROWS AS (   
          SELECT 1,'xxx',DATE '2015-05-01'
UNION ALL SELECT 2,'xxx',DATE '2015-05-02'
UNION ALL SELECT 3,'yyy',DATE '2015-06-01'
UNION ALL SELECT 4,'yyy',DATE '2015-06-01'
);
-- delete as announced ..
DELETE FROM t1 WHERE id NOT IN (
  SELECT
    id
  FROM t1
  LIMIT 1 OVER(PARTITION BY Value ORDER BY Date DESC, id DESC)
);
-- check the content now ...
SELECT * FROM t1;
-- out CREATE TABLE
-- out Time: First fetch (0 rows): 16.081 ms. All rows formatted: 16.110 ms
-- out  OUTPUT 
-- out --------
-- out       2
-- out (1 row)
-- out 
-- out Time: First fetch (1 row): 61.740 ms. All rows formatted: 61.788 ms
-- out  Id | Value |    Date    
-- out ----+-------+------------
-- out   2 | xxx   | 2015-05-02
-- out   4 | yyy   | 2015-06-01
-- out (2 rows)
-- out Time: First fetch (2 rows): 6.761 ms. All rows formatted: 6.814 ms
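As a quick sanity check after the delete, a grouped count can confirm that no Value still has more than one row. This is a minimal sketch against the temporary table `t1` created above; it is an addition for illustration and was not part of the original session.

```sql
-- Should return zero rows if each Value now appears exactly once.
SELECT
  Value
, COUNT(*) AS row_count
FROM t1
GROUP BY Value
HAVING COUNT(*) > 1;
```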