I have a table with duplicate data similar to the following example:
ID | ACCNO | ACCNAME | ADDRESS1 | ADDRESS2 | City
1 | 1001 | Joe B Ltd | 123 Street1 | | London
2 | 1001 | JoeB Ltd | 123 Street1 | | London
3 | 1001 | JoeB Ltd | 123 Street1 | | London
4 | 1001 | JoeB Ltd | 123 Street1 | London | London
5 | 1001 | JoeB Ltd | 129 Street9 | | London
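ID is currently the unique primary key, but once the duplicates have been removed, ACCNO should be.
I have seen many queries for deleting duplicate records, for example https://stackoverflow.com/a/18719814/4949859
However, I want to choose the row to keep based on the number of duplicate rows. I believe that if I pick a row from the group with the highest count, I am most likely to get a correctly formatted address.
Using "NOT IN (SELECT MAX" or "MIN" would leave the wrong record in my case.
However, when I use GROUP BY to get the highest count, I can't include the ID column.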
SELECT COUNT(ID), ACCNO, ACCNAME, ADDRESS1, ADDRESS2, CITY FROM SUPPLIERS GROUP BY ACCNO, ACCNAME, ADDRESS1, ADDRESS2, CITY ORDER BY COUNT(ID) DESC
This gives the result:
Count(ID) | ACCNO | ACCNAME | ADDRESS1 | ADDRESS2 | City
2 | 1001 | JoeB Ltd | 123 Street1 | | London
1 | 1001 | Joe B Ltd | 123 Street1 | | London
1 | 1001 | JoeB Ltd | 123 Street1 | London | London
1 | 1001 | JoeB Ltd | 129 Street9 | | London
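I hope that makes sense. I don't know how to return an ID (any one of them) from the group with the highest count. Does anyone know how I can do this?
Edit:
My example above groups by every column except ID and gets the count. Rows 2 and 3 are grouped together with a group count of 2 (the rest have a count of 1 because they are all unique), so I want to keep either row 2 or row 3. It doesn't matter which, since they are identical.
Edit 2:
I thought this would work: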
DELETE
FROM SUPPLIERS
WHERE ID NOT IN
(SELECT TOP 1 MAX(ID) FROM SUPPLIERS
Group By ACCNO, ACCNAME, ADDRESS1, ADDRESS2, CITY
ORDER BY COUNT(ID) DESC)
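Unfortunately, this deletes all but one record, because the subquery is not correlated with the outer query and TOP 1 therefore returns a single ID for the whole table. A correlated SELECT version looks promising: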
SELECT *
FROM SUPPLIERS a
WHERE ID NOT IN
(SELECT TOP 1 MAX(ID) FROM SUPPLIERS b
WHERE a.ACCNO = b.ACCNO Group By ACCNO, ACCNAME, ADDRESS1, ADDRESS2, CITY
ORDER BY COUNT(ID) DESC)
Answer:
Thanks to user1751825 (marked as the answer because it got me closest to the final result)
DELETE FROM SUPPLIERS WHERE ID IN (SELECT ID
FROM SUPPLIERS a
WHERE ID NOT IN
(SELECT TOP 1 MAX(ID) FROM SUPPLIERS b
WHERE a.ACCNO = b.ACCNO Group By ACCNO, ACCNAME, ADDRESS1, ADDRESS2, CITY
ORDER BY COUNT(ID) DESC))
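The same selection can also be sketched with window functions, which lets ID stay in the select list while the count is taken over the other columns. This is only a minimal sketch, assuming SQL Server 2005 or later and the SUPPLIERS table from the question. The names Counted, Ranked, GroupCount and RN are made up for illustration, and ties between equally frequent variants are broken arbitrarily in favour of the highest ID.

```sql
-- Sketch: for each ACCNO, return one ID from its most frequent variant.
WITH Counted AS (
    -- Attach to every row the number of rows that share all non-ID columns.
    SELECT ID, ACCNO,
           COUNT(*) OVER (PARTITION BY ACCNO, ACCNAME, ADDRESS1, ADDRESS2, CITY) AS GroupCount
    FROM SUPPLIERS
),
Ranked AS (
    -- Rank rows within each ACCNO: largest group first, then highest ID.
    SELECT ID, ACCNO, GroupCount,
           ROW_NUMBER() OVER (PARTITION BY ACCNO
                              ORDER BY GroupCount DESC, ID DESC) AS RN
    FROM Counted
)
SELECT ID, ACCNO, GroupCount
FROM Ranked
WHERE RN = 1;  -- rows with RN > 1 are the candidates for deletion
```

For the sample rows this should return ID 3 with a GroupCount of 2, since rows 2 and 3 form the largest group.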
Answer 0 (score: 1)
As I understand it, in the example you provided you want to keep the record with ID = 5 and delete the rest.
WITH CTE AS(
SELECT ID, ACCNO, ACCNAME, ADDRESS1, ADDRESS2, CITY,
RN = ROW_NUMBER()OVER(PARTITION BY ACCNO ORDER BY ID DESC)
FROM SUPPLIERS
)
DELETE FROM CTE WHERE RN > 1
This should do it!
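If the aim is instead to keep a row from the most frequent variant of each ACCNO, as the question describes, the row to keep could be chosen by group count rather than by ID alone. The following is a hedged sketch, assuming SQL Server and the SUPPLIERS table from the question. Variants, Ranked, KeepID, VariantCount and RN are hypothetical names, and the tie-break on the highest ID is an arbitrary choice. It runs inside a transaction that is rolled back, so the outcome can be inspected before anything is committed.

```sql
BEGIN TRANSACTION;

WITH Variants AS (
    -- One row per distinct variant: its frequency and a representative ID.
    SELECT ACCNO, MAX(ID) AS KeepID, COUNT(*) AS VariantCount
    FROM SUPPLIERS
    GROUP BY ACCNO, ACCNAME, ADDRESS1, ADDRESS2, CITY
),
Ranked AS (
    -- Within each ACCNO, the most frequent variant gets RN = 1.
    SELECT KeepID,
           ROW_NUMBER() OVER (PARTITION BY ACCNO
                              ORDER BY VariantCount DESC, KeepID DESC) AS RN
    FROM Variants
)
DELETE FROM SUPPLIERS
WHERE ID NOT IN (SELECT KeepID FROM Ranked WHERE RN = 1);

-- Inspect what is left before deciding.
SELECT ID, ACCNO, ACCNAME, ADDRESS1, ADDRESS2, CITY
FROM SUPPLIERS
ORDER BY ACCNO, ID;

ROLLBACK TRANSACTION;  -- replace with COMMIT TRANSACTION once the result looks right
```

With the sample data only ID 3 should remain for ACCNO 1001 in the inspection query.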
Answer 1 (score: 0)
I think this should do what you need.
delete from SUPPLIERS
where ID NOT IN (
Select max(ID)
FROM SUPPLIERS
Group by ACCNO
)
Answer 2 (score: 0)
This worked for me.
Table:
ID | ACCNO | ACCNAME | ADDRESS1 | ADDRESS2 | City
1 | 1001 | Joe B Ltd | 123 Street1 | | London
2 | 1001 | JoeB Ltd | 123 Street1 | | London
3 | 1001 | JoeB Ltd | 123 Street1 | | London
4 | 1001 | JoeB Ltd | 123 Street1 | London | London
5 | 1001 | JoeB Ltd | 129 Street9 | | London
6 | 67 | Nise | Gata1 | |
7 | 67 | Nisse | Gata2 | |
8 | 67 | Nisse | Gata1 | Haninge | Stockholm
Result:
ACCNO | ACCNAME | ADDRESS1 | ADDRESS2 | City
1001 | JoeB Ltd | 123 Street1 | London | London
67 | Nisse | Gata1 | Haninge | Stockholm
Code:
select distinct
[ ACCNO ],
FIRST_VALUE([ ACCNAME ]) OVER (PARTITION BY [ ACCNO ] ORDER BY case when [ ACCNAME ] is null then 1 else 0 end, rownumber ) as [ ACCNAME ],
FIRST_VALUE([ ADDRESS1 ]) OVER (PARTITION BY [ ACCNO ] ORDER BY case when [ ADDRESS1 ] is null then 1 else 0 end, rownumber ) as [ ADDRESS1 ],
FIRST_VALUE([ ADDRESS2 ]) OVER (PARTITION BY [ ACCNO ] ORDER BY case when [ ADDRESS2 ] is null then 1 else 0 end, rownumber ) as [ ADDRESS2 ],
FIRST_VALUE([ City]) OVER (PARTITION BY [ ACCNO ] ORDER BY case when [ City] is null then 1 else 0 end, rownumber ) as [ City]
FROM
(
SELECT
[ ACCNO ]
,[ ACCNAME ]
,[ ADDRESS1 ]
,case when ltrim(rtrim([ ADDRESS2 ] )) = '' then null else [ ADDRESS2 ] end as [ ADDRESS2 ] -- spaces = NULL
,case when ltrim(rtrim([ City] )) = '' then null else [ City] end as [ City]
,count(*) as quantity
,ROW_NUMBER() OVER (
PARTITION BY [ ACCNO ]
ORDER BY
[ ACCNO ],
count(*) desc
) as rownumber
FROM [dbo].[test_sql]
GROUP BY cube ( [ ACCNO ]
,[ ACCNAME ]
,[ ADDRESS1 ]
,[ ADDRESS2 ]
,[ City])
HAVING [ ACCNO ] is not null
) myGroup