How to remove duplicate records by keeping the last record in each group

Time: 2014-12-09 09:33:47

Tags: sql sql-server

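I would like some opinions on my problem. I am working on a project that stores Google Scholar publications. When I store the data, it ends up looking like this: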

ID| COLUMN1                          | COLUMN2
1 | 'Knowledge and Data Engineering' | 'IEEE transactions on 16 (1)'
1 | 'Knowledge and Data Engineering' | 'IEEE transactions on 16 (1) 28-40 '
2 | 'Data Engineering'               | '1999. Proceedings.'
2 | 'Data Engineering'               | '1999. Proceedings. 15th International Conference on '
2 | 'Data Engineering'               | '1999. Proceedings. 15th International Conference on 146-153'
3 | 'ACM SIGMOD Record 30 (2)'       | '187-198'

I hope my table sketch is understandable. What I want is, for consecutive rows that share the same ID, to keep only the last row:

ID| COLUMN1                          | COLUMN2
1 | 'Knowledge and Data Engineering' | 'IEEE transactions on 16 (1) 28-40 '
2 | 'Data Engineering'               | '1999. Proceedings. 15th International Conference on 146-153'
3 | 'ACM SIGMOD Record 30 (2)'       | '187-198'

Thanks for your help.

4 Answers:

Answer 0: (score: 1)

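WITH CTE AS (
    SELECT Id,
           Column1,
           Column2,
           -- Number the rows within each Column1 group; row 1 is the one kept.
           ROW_NUMBER() OVER (PARTITION BY Column1 ORDER BY Id DESC) AS rownum
    FROM YOUR_TABLE  -- placeholder table name; substitute your own
)
SELECT Id, Column1, Column2
FROM CTE
WHERE rownum = 1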

Answer 1: (score: 1)

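You can use the ROW_NUMBER() window function to generate a sequence number for the rows of each ID, and then pick the row with the last (highest) row number.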

  

ROW_NUMBER(): returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.
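
To make that definition concrete, here is a minimal sketch that numbers a few rows per ID. The inline VALUES rows are illustrative sample data, and ordering by LEN(COLUMN2) is an assumption of this sketch (in the question's data each later row extends the previous one), not something the question states.

```sql
-- Sketch: the numbering restarts at 1 for every ID partition.
-- The rows below are illustrative sample data, not a real table.
SELECT ID,
       COLUMN2,
       ROW_NUMBER() OVER (PARTITION BY ID ORDER BY LEN(COLUMN2)) AS RowNo
FROM (VALUES
        (1, 'IEEE transactions on 16 (1)'),
        (1, 'IEEE transactions on 16 (1) 28-40'),
        (2, '1999. Proceedings.'),
        (3, '187-198')
     ) AS P (ID, COLUMN2);
```

The two rows for ID 1 receive the numbers 1 and 2, while IDs 2 and 3 each start again at 1.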

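So I broke the problem into two steps:

  1. Create a #TEMP table that includes the row numbers
  2. Select from the temp table the row with the highest row number in each group

SQL Fiddle Demo

    MS SQL Server 2012 Schema Setup: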

    CREATE TABLE Publications
        ([ID] int, [COLUMN1] varchar(34), [COLUMN2] varchar(63))
    ;
    
    INSERT INTO Publications
        ([ID], [COLUMN1], [COLUMN2])
    VALUES
        (1, '''Knowledge and Data Engineering''', '''IEEE transactions on 16 (1)'''),
        (1, '''Knowledge and Data Engineering''', '''IEEE transactions on 16 (1) 28-40 '''),
        (2, '''Data Engineering''', '''1999. Proceedings.'''),
        (2, '''Data Engineering''', '''1999. Proceedings. 15th International Conference on '''),
        (2, '''Data Engineering''', '''1999. Proceedings. 15th International Conference on 146-153'''),
        (3, '''ACM SIGMOD Record 30 (2)''', '''187-198''')
    ;
    

    Query 1:

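    -- Insert the rows into a temp table together with a ROW_NUMBER per ID.
    -- Caveat: ORDER BY ID ties on every row inside PARTITION BY ID, so the
    -- numbering order within a group is not guaranteed by SQL Server.
    SELECT  ID ,
            Column1 ,
            Column2 ,
            ROW_NUMBER() OVER ( PARTITION BY ID ORDER BY ID ) AS RowNo
    INTO #TEMP
    FROM    Publications;
    
    -- Select the row with the highest ROW_NUMBER for each ID.
    SELECT  T1.ID, T1.Column1, T1.Column2
    FROM    #TEMP T1
    WHERE   T1.RowNo = (SELECT MAX(T2.RowNo) FROM #TEMP T2 WHERE T1.ID = T2.ID)
    ORDER BY T1.ID;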
    

    Results:

    | ID | COLUMN1                          | COLUMN2                                                       |
    |----|----------------------------------|---------------------------------------------------------------|
    |  1 | 'Knowledge and Data Engineering' | 'IEEE transactions on 16 (1) 28-40 '                          |
    |  2 | 'Data Engineering'               | '1999. Proceedings. 15th International Conference on 146-153' |
    |  3 | 'ACM SIGMOD Record 30 (2)'       | '187-198'                                                     |
    

Answer 2: (score: 0)

Try this:

SELECT * FROM
    (
        SELECT ID, COLUMN1, COLUMN2, ROW_NUMBER() OVER 
        (PARTITION BY ID ORDER BY ID DESC) AS ROWID FROM YOUR_TABLE
    ) AS A
WHERE ROWID = 1
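
One caveat: ORDER BY ID inside PARTITION BY ID ties on every row, so which row receives number 1 is not guaranteed. The following is a hedged sketch of one way to make the choice deterministic. It assumes, as in the question's sample data, that the last row of each group always has the longest COLUMN2; if the real table has an identity or timestamp column, ordering by that would likely be more reliable. The SampleRows and Numbered names are hypothetical and stand in for the real table.

```sql
-- Assumption: within one ID, the most complete (last) row has the longest COLUMN2.
-- DATALENGTH counts trailing spaces, which LEN would ignore.
WITH SampleRows AS (
    SELECT *
    FROM (VALUES
            (1, 'Knowledge and Data Engineering', 'IEEE transactions on 16 (1)'),
            (1, 'Knowledge and Data Engineering', 'IEEE transactions on 16 (1) 28-40 '),
            (2, 'Data Engineering', '1999. Proceedings.'),
            (2, 'Data Engineering', '1999. Proceedings. 15th International Conference on 146-153'),
            (3, 'ACM SIGMOD Record 30 (2)', '187-198')
         ) AS V (ID, COLUMN1, COLUMN2)
),
Numbered AS (
    SELECT ID, COLUMN1, COLUMN2,
           ROW_NUMBER() OVER (PARTITION BY ID
                              ORDER BY DATALENGTH(COLUMN2) DESC) AS RowNo
    FROM SampleRows
)
SELECT ID, COLUMN1, COLUMN2
FROM Numbered
WHERE RowNo = 1   -- the longest COLUMN2 per ID
ORDER BY ID;
```

Sorting descending and keeping row 1 avoids the extra MAX() lookup that an ascending numbering would need.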

Answer 3: (score: 0)

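This question has been asked a million times.

WITH cteDup AS (SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) AS [Rank] FROM YOUR_TABLE)

To delete the duplicates permanently, follow the CTE with this:

DELETE FROM cteDup
    WHERE [Rank] > 1

Otherwise, follow it with this:

SELECT TOP 20 * FROM cteDup WHERE [Rank] = 1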
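
A CTE is visible only to the single statement that immediately follows it, so the WITH clause and the DELETE have to be written as one statement. Below is a minimal end-to-end sketch under stated assumptions: #PublicationsDemo is a hypothetical scratch table holding some of the question's sample rows, and the row to keep is assumed to be the one with the longest COLUMN2 in its ID group, which the question does not state explicitly.

```sql
-- Hypothetical scratch table with a few of the question's sample rows.
CREATE TABLE #PublicationsDemo (ID int, COLUMN1 varchar(34), COLUMN2 varchar(63));

INSERT INTO #PublicationsDemo (ID, COLUMN1, COLUMN2)
VALUES (1, 'Knowledge and Data Engineering', 'IEEE transactions on 16 (1)'),
       (1, 'Knowledge and Data Engineering', 'IEEE transactions on 16 (1) 28-40 '),
       (2, 'Data Engineering', '1999. Proceedings.'),
       (2, 'Data Engineering', '1999. Proceedings. 15th International Conference on 146-153'),
       (3, 'ACM SIGMOD Record 30 (2)', '187-198');

-- WITH and DELETE form a single statement.
-- Assumption: the row to keep has the longest COLUMN2 within its ID.
WITH cteDup AS (
    SELECT ID,
           ROW_NUMBER() OVER (PARTITION BY ID
                              ORDER BY DATALENGTH(COLUMN2) DESC) AS RowNo
    FROM #PublicationsDemo
)
DELETE FROM cteDup
WHERE RowNo > 1;

-- Check the result: one row per ID should remain.
SELECT ID, COLUMN1, COLUMN2
FROM #PublicationsDemo
ORDER BY ID;

DROP TABLE #PublicationsDemo;
```

On a real table, running the DELETE inside a transaction and checking the affected row count before committing is a cheap safeguard.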