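Delete duplicate rows based on columns, but keep only 1 row from each set of matching duplicates

Time: 2018-07-16 09:29:45

Tags: sql sql-server tsql

I have the table below, and I need to delete the rows marked in the Comment column...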

ID  |   AccountID   |   AccountValue|    CreatedDate            | Comment
===========================================================================
1   |   1           |   2           |   2016-06-13 19:58:47.373 | Delete
2   |   1           |   2           |   2017-06-13 19:58:47.373 | Delete
3   |   1           |   2           |   2018-06-13 19:58:47.373 |
4   |   2           |   3           |   2017-06-13 19:58:47.373 |
5   |   4           |   4           |   2017-06-13 19:58:47.373 | Delete
6   |   4           |   4           |   2018-06-13 19:58:47.373 |
7   |   5           |   2           |   2017-06-13 19:58:47.373 |

Can someone help me achieve this?

I have a rough idea of where to start

CREATE TABLE MyAccounts (
    ID int,
    AccountID int,
    AccountValue varchar(255),
    CreatedDate datetime
);

insert into MyAccounts values(1,1,2,'2016-06-13 19:58:47.373')
insert into MyAccounts values(2,1,2,'2017-06-13 19:58:47.373')
insert into MyAccounts values(3,1,2,'2018-06-13 19:58:47.373')
insert into MyAccounts values(4,2,3,'2017-06-13 19:58:47.373')
insert into MyAccounts values(5,4,4,'2017-06-13 19:58:47.373')
insert into MyAccounts values(6,4,4,'2018-06-13 19:58:47.373')
insert into MyAccounts values(7,5,2,'2017-06-13 19:58:47.373')

I know the query below finds the groups of rows I want to delete, but I want to keep 1 row from each group, as explained above

select 
        AccountID, 
        AccountValue        
        FROM MyAccounts
        GROUP BY AccountID, AccountValue--, createddate
        having count(*) > 1

The goal is for the table to end up looking like this

ID  |   AccountID   |   AccountValue|    CreatedDate            | Comment
===========================================================================
3   |   1           |   2           |   2018-06-13 19:58:47.373 |
4   |   2           |   3           |   2017-06-13 19:58:47.373 |
6   |   4           |   4           |   2018-06-13 19:58:47.373 |
7   |   5           |   2           |   2017-06-13 19:58:47.373 |

4 Answers:

Answer 0: (score: 0)

You can use the row_number() function:

delete m 
from (select *, row_number() over (partition by AccountID, AccountValue order by id desc) as seq
      from MyAccounts
     ) m
where m.seq > 1;

You may also need the createddate column in the partition clause. If so, you can include it.
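Since the question asks to keep the most recent row of each group, one possible variant orders by CreatedDate instead of ID. This is a minimal sketch, assuming the MyAccounts table from the question; the ID tie-break for rows that share a CreatedDate is an assumption, not something the question specifies. Note that with the sample data, putting CreatedDate in the partition by list would make every row its own group, so nothing would be deleted; ordering by it is what selects the newest row.

```sql
-- Preview: rows with seq > 1 are the ones the DELETE below would remove.
select *
from (select *, row_number() over (partition by AccountID, AccountValue
                                   order by CreatedDate desc, ID desc) as seq
      from MyAccounts
     ) m
where m.seq > 1;

-- Delete everything except the newest row per (AccountID, AccountValue).
delete m
from (select *, row_number() over (partition by AccountID, AccountValue
                                   order by CreatedDate desc, ID desc) as seq
      from MyAccounts
     ) m
where m.seq > 1;
```

Running the preview first makes it easy to compare the affected rows against the rows marked Delete in the question before anything is removed.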

Answer 1: (score: 0)

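delete from MyAccounts
where id not in 
(
     -- keep the row with the highest ID in each AccountID/AccountValue group
     -- (the newest row in the sample data); CreatedDate stays out of the
     -- GROUP BY, otherwise every row would form its own group
     select max(id)     
     FROM MyAccounts
     GROUP BY AccountID, AccountValue
)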

Answer 2: (score: 0)

Use a CTE with the ROW_NUMBER window function

;WITH cteDups
AS(
    SELECT *, RN=ROW_NUMBER()OVER (PARTITION BY M.AccountID, M.AccountValue ORDER BY M.ID DESC)
    FROM dbo.MyAccounts M
)
-- Swap the DELETE line for the SELECT line to preview the rows first
--SELECT *
DELETE D
FROM cteDups D WHERE D.RN > 1
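To check the effect before committing to it, the same CTE delete can be run inside a transaction and rolled back. The following is only a sketch, assuming the dbo.MyAccounts table from the question; the OUTPUT clause and the final ROLLBACK are additions for illustration.

```sql
BEGIN TRANSACTION;

WITH cteDups AS (
    SELECT *, RN = ROW_NUMBER() OVER (PARTITION BY M.AccountID, M.AccountValue
                                      ORDER BY M.ID DESC)
    FROM dbo.MyAccounts M
)
DELETE D
-- Return the rows that were removed so they can be inspected
OUTPUT deleted.ID, deleted.AccountID, deleted.AccountValue, deleted.CreatedDate
FROM cteDups D
WHERE D.RN > 1;

-- What remains inside the open transaction
SELECT * FROM dbo.MyAccounts ORDER BY ID;

-- Undo the delete; replace with COMMIT TRANSACTION once the result looks right
ROLLBACK TRANSACTION;
```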

Answer 3: (score: 0)

EXISTS can also be used for this.

A row is deleted if another record exists with the same AccountID and AccountValue but a later CreatedDate.

Example:

-- Using a table variable for demonstration purposes
declare @MyAccounts table (
 ID int identity(1,1) primary key,
 AccountID int,
 AccountValue varchar(255),
 CreatedDate datetime
);

-- Sample data
insert into @MyAccounts (AccountID, AccountValue, CreatedDate) values
 (1,2,'2016-06-13 19:58:47.373')
,(1,2,'2017-06-13 19:58:47.373')
,(1,2,'2018-06-13 19:58:47.373')
,(2,3,'2017-06-13 19:58:47.373')
,(4,4,'2017-06-13 19:58:47.373')
,(4,4,'2018-06-13 19:58:47.373')
,(5,2,'2017-06-13 19:58:47.373');

-- Remove the older records
delete acc
from @MyAccounts acc
where exists (
  select 1
  from @MyAccounts d
  where d.AccountID = acc.AccountID 
  and d.AccountValue = acc.AccountValue 
  and d.CreatedDate > acc.CreatedDate
);

-- What remains
select * from @MyAccounts order by ID;
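One limitation of comparing on CreatedDate alone: if two rows in a group share the same latest CreatedDate, neither has a strictly later counterpart, so both survive. A possible way to handle that, sketched here against the MyAccounts table from the question, is to fall back to ID when the dates are equal. The tie-break rule is an assumption; the question does not say which row to keep in that case.

```sql
-- Remove every row that has a "newer" sibling in the same group,
-- where newer means a later CreatedDate, or the same CreatedDate and a higher ID.
delete acc
from MyAccounts acc
where exists (
  select 1
  from MyAccounts d
  where d.AccountID = acc.AccountID
  and d.AccountValue = acc.AccountValue
  and (d.CreatedDate > acc.CreatedDate
       or (d.CreatedDate = acc.CreatedDate and d.ID > acc.ID))
);
```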