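SQL remove duplicates - keep the record from the group with the highest count

Time: 2016-04-07 10:03:03

Tags: sql-server tsql

I have a table with duplicate data similar to the following example: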

ID | ACCNO  | ACCNAME    | ADDRESS1     | ADDRESS2    | City
1  | 1001   | Joe B Ltd  | 123 Street1  |             | London
2  | 1001   | JoeB Ltd   | 123 Street1  |             | London
3  | 1001   | JoeB Ltd   | 123 Street1  |             | London
4  | 1001   | JoeB Ltd   | 123 Street1  | London      | London
5  | 1001   | JoeB Ltd   | 129 Street9  |             | London

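ID is currently the unique primary key, but once the duplicates are removed, ACCNO should be.

I have seen many queries for deleting duplicate records, for example https://stackoverflow.com/a/18719814/4949859

However, I want to choose which row to keep based on the number of duplicate rows. I believe that if I pick a row from the grouping with the highest count, I am most likely to get the correctly formatted address.

In my case, using "NOT IN (SELECT MAX" or "MIN" would leave the wrong record.

However, when I use GROUP BY to get the highest count, I cannot include the ID field.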

SELECT COUNT(ID), ACCNO, ACCNAME, ADDRESS1, ADDRESS2, CITY FROM SUPPLIERS GROUP BY ACCNO, ACCNAME, ADDRESS1, ADDRESS2, CITY ORDER BY COUNT(ID) DESC

This gives the result:

Count(ID) | ACCNO  | ACCNAME    | ADDRESS1     | ADDRESS2    | City
2         | 1001   | JoeB Ltd   | 123 Street1  |             | London
1         | 1001   | Joe B Ltd  | 123 Street1  |             | London
1         | 1001   | JoeB Ltd   | 123 Street1  | London      | London
1         | 1001   | JoeB Ltd   | 129 Street9  |             | London

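I hope that makes sense. I do not know how to return an ID (any one) from the group with the highest count. Does anyone know how I can achieve this?

Edit:

My example above groups all columns except ID and gets the count. Rows 2 and 3 are grouped together with a group count of 2 (the rest have a count of 1 because they are all unique), so I want to keep either row 2 or row 3. It does not matter which, since they are identical.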
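The grouping described above can be made visible next to each ID with a windowed count, which sidesteps the GROUP BY restriction. The following is only a sketch: the temp table #SUPPLIERS and its column types are assumptions made up to mirror the sample data, not the real table definition.

```sql
-- Hypothetical temp table mirroring the sample rows; column types are assumed.
IF OBJECT_ID('tempdb..#SUPPLIERS') IS NOT NULL DROP TABLE #SUPPLIERS;

CREATE TABLE #SUPPLIERS (
    ID       INT PRIMARY KEY,
    ACCNO    INT NOT NULL,
    ACCNAME  VARCHAR(50),
    ADDRESS1 VARCHAR(50),
    ADDRESS2 VARCHAR(50),
    CITY     VARCHAR(50)
);

INSERT INTO #SUPPLIERS (ID, ACCNO, ACCNAME, ADDRESS1, ADDRESS2, CITY) VALUES
    (1, 1001, 'Joe B Ltd', '123 Street1', '',       'London'),
    (2, 1001, 'JoeB Ltd',  '123 Street1', '',       'London'),
    (3, 1001, 'JoeB Ltd',  '123 Street1', '',       'London'),
    (4, 1001, 'JoeB Ltd',  '123 Street1', 'London', 'London'),
    (5, 1001, 'JoeB Ltd',  '129 Street9', '',       'London');

-- COUNT(*) OVER counts the rows that share every non-ID column,
-- so each row keeps its ID and also carries the size of its group.
SELECT ID, ACCNO, ACCNAME, ADDRESS1, ADDRESS2, CITY,
       COUNT(*) OVER (PARTITION BY ACCNO, ACCNAME, ADDRESS1, ADDRESS2, CITY) AS GroupCount
FROM #SUPPLIERS
ORDER BY GroupCount DESC, ID;
```

On these sample rows, IDs 2 and 3 come back with a GroupCount of 2 and the other rows with 1.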

Edit 2:

I thought this would work:

DELETE
FROM SUPPLIERS
WHERE ID NOT IN 
 (SELECT TOP 1 MAX(ID) FROM SUPPLIERS  
  Group By ACCNO, ACCNAME, ADDRESS1, ADDRESS2, CITY 
  ORDER BY COUNT(ID) DESC)

Unfortunately, this deletes all records except one. A SELECT version of it looks promising, though:

SELECT *
FROM SUPPLIERS a
WHERE ID NOT IN 
 (SELECT TOP 1 MAX(ID) FROM SUPPLIERS b 
  WHERE a.ACCNO = b.ACCNO Group By ACCNO, ACCNAME, ADDRESS1, ADDRESS2, CITY 
  ORDER BY COUNT(ID) DESC)

Answer

Thanks to user1751825 (marked as the answer for getting me closest to the final result)

DELETE FROM SUPPLIERS WHERE ID IN (SELECT ID
FROM SUPPLIERS a
 WHERE ID NOT IN 
  (SELECT TOP 1 MAX(ID) FROM SUPPLIERS b 
  WHERE a.ACCNO = b.ACCNO Group By ACCNO, ACCNAME, ADDRESS1, ADDRESS2, CITY 
  ORDER BY COUNT(ID) DESC))
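A possible variant of the same idea uses window functions, which makes the tie-break explicit when two groups have the same count. This is a sketch rather than a tested replacement: the temp table #SUPPLIERS, its column types, and the CTE names Counted and Ranked are all assumptions introduced for illustration.

```sql
-- Hypothetical temp table mirroring the sample rows; column types are assumed.
IF OBJECT_ID('tempdb..#SUPPLIERS') IS NOT NULL DROP TABLE #SUPPLIERS;

CREATE TABLE #SUPPLIERS (
    ID       INT PRIMARY KEY,
    ACCNO    INT NOT NULL,
    ACCNAME  VARCHAR(50),
    ADDRESS1 VARCHAR(50),
    ADDRESS2 VARCHAR(50),
    CITY     VARCHAR(50)
);

INSERT INTO #SUPPLIERS (ID, ACCNO, ACCNAME, ADDRESS1, ADDRESS2, CITY) VALUES
    (1, 1001, 'Joe B Ltd', '123 Street1', '',       'London'),
    (2, 1001, 'JoeB Ltd',  '123 Street1', '',       'London'),
    (3, 1001, 'JoeB Ltd',  '123 Street1', '',       'London'),
    (4, 1001, 'JoeB Ltd',  '123 Street1', 'London', 'London'),
    (5, 1001, 'JoeB Ltd',  '129 Street9', '',       'London');

WITH Counted AS (
    -- Size of the exact-duplicate group each row belongs to.
    SELECT ID, ACCNO,
           COUNT(*) OVER (PARTITION BY ACCNO, ACCNAME, ADDRESS1, ADDRESS2, CITY) AS GroupCount
    FROM #SUPPLIERS
),
Ranked AS (
    -- Per ACCNO, rank rows so the largest group comes first;
    -- ties between groups and rows are broken by the highest ID.
    SELECT ID,
           ROW_NUMBER() OVER (PARTITION BY ACCNO ORDER BY GroupCount DESC, ID DESC) AS RN
    FROM Counted
)
DELETE s
FROM #SUPPLIERS AS s
JOIN Ranked AS r ON r.ID = s.ID
WHERE r.RN > 1;   -- everything except the top-ranked row per ACCNO

SELECT ID, ACCNO, ACCNAME, ADDRESS1, ADDRESS2, CITY FROM #SUPPLIERS;
```

On the sample rows this leaves only ID 3. Two CTEs are used because a window function cannot be nested directly inside the ORDER BY of another window function.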

3 answers:

Answer 0: (score: 1)

As I understand it, in the example you provided you want to keep the record with ID = 5 and delete the rest.

WITH CTE AS(
   SELECT ID, ACCNO, ACCNAME, ADDRESS1, ADDRESS2, CITY,
       RN = ROW_NUMBER()OVER(PARTITION BY ACCNO ORDER BY ID DESC)
  FROM SUPPLIERS
)
DELETE FROM CTE WHERE RN > 1

This should do it!

Answer 1: (score: 0)

I think this should do what you need.

delete from SUPPLIERS
where ID NOT IN (
    Select max(ID)
    FROM SUPPLIERS
    Group by ACCNO
)

Answer 2: (score: 0)

This works for me.

TABLE

ID   ACCNO       ACCNAME         ADDRESS1        ADDRESS2        City
1    1001        Joe B Ltd       123 Street1                     London
2    1001        JoeB Ltd        123 Street1                     London
3    1001        JoeB Ltd        123 Street1                     London
4    1001        JoeB Ltd        123 Street1     London          London
5    1001        JoeB Ltd        129 Street9                     London
6    67          Nise            Gata1      
7    67          Nisse           Gata2      
8    67          Nisse           Gata1           Haninge         Stockholm

RESULT

ACCNO    ACCNAME         ADDRESS1        ADDRESS2        City
1001     JoeB Ltd        123 Street1     London          London
67       Nisse           Gata1           Haninge         Stockholm

CODE

select distinct 
    [ ACCNO  ],
    FIRST_VALUE([ ACCNAME    ]) OVER (PARTITION BY [ ACCNO  ] ORDER BY case when [ ACCNAME    ] is null then 1 else 0 end, rownumber  ) as [ ACCNAME    ],
    FIRST_VALUE([ ADDRESS1     ]) OVER (PARTITION BY [ ACCNO  ] ORDER BY case when [ ADDRESS1     ] is null then 1 else 0 end, rownumber  ) as [ ADDRESS1     ],
    FIRST_VALUE([ ADDRESS2    ]) OVER (PARTITION BY [ ACCNO  ] ORDER BY case when [ ADDRESS2    ] is null then 1 else 0 end, rownumber  ) as [ ADDRESS2    ],
    FIRST_VALUE([ City]) OVER (PARTITION BY [ ACCNO  ] ORDER BY case when [ City] is null then 1 else 0 end, rownumber  ) as [ City]
FROM 
(
    SELECT 
          [ ACCNO  ]
          ,[ ACCNAME    ]
          ,[ ADDRESS1     ]
          ,case when ltrim(rtrim([ ADDRESS2    ] )) = '' then null else [ ADDRESS2    ] end as [ ADDRESS2    ] -- spaces = NULL
          ,case when ltrim(rtrim([ City] )) = '' then null else [ City] end as [ City]

          ,count(*) as quantity
          ,ROW_NUMBER() OVER (
                                PARTITION BY [ ACCNO  ] 
                                ORDER BY 
                                    [ ACCNO  ], 
                                    count(*) desc
                             ) as rownumber
      FROM [dbo].[test_sql]  
      GROUP BY cube ( [ ACCNO  ]
          ,[ ACCNAME    ]
          ,[ ADDRESS1     ]
          ,[ ADDRESS2    ]
          ,[ City])
        HAVING [ ACCNO  ] is not null 
 ) myGroup