Question

Please help me with this one. I'm stuck and can't figure out how to write my query. I'm working with SQL Server 2014.

Table A (approx 65k ROWS) CEID = primary key

 CEID     State    Checksum
    1        2          666
    2        2          666
    3        2          666
    4        2          333
    5        2          333
    6        9          333
    7        9          111
    8        9          111
    9        9          741
   10        2          656

Desired output

 CEID     State    Checksum
    3        2          666
    6        9          333
    8        9          111
    9        9          741
   10        2          656

For each group of duplicate checksums, I want to keep the row with the highest CEID if State is the same for all rows in the group. If State differs but Checksum is equal, I want to keep the row with the highest CEID among the rows with State = 9. Unique rows like CEID 9 and 10 should be included in the result regardless of State.

This join returns all duplicates:

SELECT a1.*, a2.*
FROM  tableA a1  
INNER JOIN tableA a2 ON a1.Checksum = a2.Checksum
                     AND a1.CEID <> a2.CEID

I've also identified MAX(CEID) for each duplicate Checksum and State combination with this query:

SELECT a.Checksum, a.State, MAX(a.CEID) CEID_MAX ,COUNT(*) cnt
FROM tableA a
GROUP BY a.Checksum, a.State
HAVING COUNT(*) > 1
ORDER BY a.Checksum, a.State

With the first query, I can't figure out how to SELECT the row with the highest CEID per Checksum.

The problem I encounter with the last one is that GROUP BY isn't allowed in subqueries when I try to join on it.

Answer 1

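You can use row_number() partitioned by Checksum and ordered by State desc and CEID desc, then keep only the row where row_number is 1. Note that ORDER BY State desc, CEID desc may satisfy both of your conditions: when State is the same across a group it simply picks the highest CEID, and when State differs the State = 9 rows sort first because 9 is the higher value.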

;with 
cte as
(
    select  *, rn = row_number() over (Partition by Checksum order by State desc, CEID desc)
    from    TableA
)
select  *
from    cte
where   rn = 1
order by CEID;
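
The ordering above relies on 9 being the highest State value within a group. As a hedged variant, assuming other State values greater than 9 could appear, the sketch below ranks State = 9 first explicitly with a CASE expression. The table variable @TableA is a stand-in introduced here so the script is self-contained; it holds the sample rows from the question and is not part of the original schema.

```sql
-- Sample data from the question, loaded into a table variable
-- (@TableA is a hypothetical stand-in for the real table).
DECLARE @TableA TABLE
(
    CEID       int PRIMARY KEY,
    [State]    int,
    [Checksum] int
);

INSERT INTO @TableA (CEID, [State], [Checksum]) VALUES
    (1, 2, 666), (2, 2, 666), (3, 2, 666),
    (4, 2, 333), (5, 2, 333), (6, 9, 333),
    (7, 9, 111), (8, 9, 111),
    (9, 9, 741),
    (10, 2, 656);

WITH cte AS
(
    SELECT  CEID, [State], [Checksum],
            -- Within each Checksum, rows with State = 9 sort first,
            -- then the highest CEID wins.
            rn = ROW_NUMBER() OVER (
                     PARTITION BY [Checksum]
                     ORDER BY CASE WHEN [State] = 9 THEN 0 ELSE 1 END,
                              CEID DESC)
    FROM    @TableA
)
SELECT  CEID, [State], [Checksum]
FROM    cte
WHERE   rn = 1
ORDER BY CEID;
-- Returns CEID 3, 6, 8, 9, 10, matching the desired output.
```

On the sample data both orderings give the same result; they differ only when a group contains a State value above 9.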

Filter and keep most recent duplicate
