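Filter duplicate rows based on a condition

Time: 2010-09-07 21:23:04

Tags: sql-server

I want to filter duplicate rows on a condition, so that for each unique combination of rid and did I select the row with the minimum modified and the maximum active. Should I use a self join, or is there a better approach that would perform better?

Example: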

id   rid   modified                  active   did
1    1     2010-09-07 11:37:44.850   1        1
2    1     2010-09-07 11:38:44.000   1        1
3    1     2010-09-07 11:39:44.000   1        1
4    1     2010-09-07 11:40:44.000   0        1
5    2     2010-09-07 11:41:44.000   1        1
6    1     2010-09-07 11:42:44.000   1        2

Expected output:

1    1     2010-09-07 11:37:44.850   1        1
5    2     2010-09-07 11:41:44.000   1        1
6    1     2010-09-07 11:42:44.000   1        2

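As I noted in a comment on the first answer, that suggestion does not work for the following data set (where the row with active = 0 also has the minimum modified value):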

id   rid   modified                  active   did
1    1     2010-09-07 11:37:44.850   1        1
2    1     2010-09-07 11:38:44.000   1        1
3    1     2010-09-07 11:39:44.000   1        1
4    1     2010-09-07 11:36:44.000   0        1
5    2     2010-09-07 11:41:44.000   1        1
6    1     2010-09-07 11:42:44.000   1        2

3 answers:

Answer 0: (score: 2)

This assumes SQL Server 2005+. If you want ties to be returned, use RANK() instead of ROW_NUMBER().

-- Sample data from the question (second data set)
;WITH YourTable AS
(
SELECT 1 id,1 rid,CAST('2010-09-07 11:37:44.850' AS datetime) modified, 1 active,1 did UNION ALL
SELECT 2,1,'2010-09-07 11:38:44.000', 1,1 UNION ALL
SELECT 3,1,'2010-09-07 11:39:44.000', 1,1 UNION ALL
SELECT 4,1,'2010-09-07 11:36:44.000', 0,1 UNION ALL
SELECT 5,2,'2010-09-07 11:41:44.000', 1,1 UNION ALL
SELECT 6,1,'2010-09-07 11:42:44.000', 1,2
),cte AS
(
-- Number the rows within each (rid, did) group: highest active first, then earliest modified
SELECT id,rid,modified,active, did,
ROW_NUMBER() OVER (PARTITION BY rid,did ORDER BY active DESC, modified ASC ) rn
FROM YourTable
)
-- Keep only the first row of each group
SELECT id,rid,modified,active, did
FROM cte
WHERE rn=1
ORDER BY id
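
To make the difference between ROW_NUMBER() and RANK() concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for SQL Server, which is an assumption of this example: it needs an SQLite build with window-function support (3.25 or later), and it stores modified as ISO-formatted text. Row id 7 is a hypothetical addition that ties with id 5; it is not part of the question's data.

```python
import sqlite3

# Second data set from the question: (id, rid, modified, active, did).
rows = [
    (1, 1, "2010-09-07 11:37:44.850", 1, 1),
    (2, 1, "2010-09-07 11:38:44.000", 1, 1),
    (3, 1, "2010-09-07 11:39:44.000", 1, 1),
    (4, 1, "2010-09-07 11:36:44.000", 0, 1),
    (5, 2, "2010-09-07 11:41:44.000", 1, 1),
    (6, 1, "2010-09-07 11:42:44.000", 1, 2),
    (7, 2, "2010-09-07 11:41:44.000", 1, 1),  # hypothetical row tying with id 5
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable "
    "(id INTEGER, rid INTEGER, modified TEXT, active INTEGER, did INTEGER)"
)
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?, ?)", rows)


def first_per_group(window_function):
    """Return the ids ranked first in each (rid, did) group.

    window_function must be "ROW_NUMBER" or "RANK".
    """
    sql = f"""
        SELECT id
        FROM (
            SELECT id,
                   {window_function}() OVER (
                       PARTITION BY rid, did
                       ORDER BY active DESC, modified ASC
                   ) AS rn
            FROM YourTable
        ) AS ranked
        WHERE rn = 1
        ORDER BY id
    """
    return [r[0] for r in conn.execute(sql)]


row_number_ids = first_per_group("ROW_NUMBER")  # exactly one id per group
rank_ids = first_per_group("RANK")              # tied rows are all returned
print(row_number_ids, rank_ids)
```

With ROW_NUMBER() the tied group yields a single row, and which of ids 5 and 7 it is depends on the engine. With RANK() both tied rows come back.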

Answer 1: (score: 0)

select id, rid, min(modified), max(active), did from foo group by rid, did order by id;
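
One caveat with aggregating per group is that MIN(modified) and MAX(active) are computed independently, so the combined values need not belong to any single row. The following is a small sketch of that effect on the question's second data set, again using Python's sqlite3 as a stand-in for SQL Server (an assumption of this example). The id column is left out of the query here because SQL Server rejects a non-aggregated column that is not in the GROUP BY list.

```python
import sqlite3

# Second data set from the question: (id, rid, modified, active, did).
rows = [
    (1, 1, "2010-09-07 11:37:44.850", 1, 1),
    (2, 1, "2010-09-07 11:38:44.000", 1, 1),
    (3, 1, "2010-09-07 11:39:44.000", 1, 1),
    (4, 1, "2010-09-07 11:36:44.000", 0, 1),
    (5, 2, "2010-09-07 11:41:44.000", 1, 1),
    (6, 1, "2010-09-07 11:42:44.000", 1, 2),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foo "
    "(id INTEGER, rid INTEGER, modified TEXT, active INTEGER, did INTEGER)"
)
conn.executemany("INSERT INTO foo VALUES (?, ?, ?, ?, ?)", rows)

# Each aggregate is evaluated on its own over the group.
aggregated = conn.execute(
    """
    SELECT rid, did, MIN(modified), MAX(active)
    FROM foo
    GROUP BY rid, did
    ORDER BY rid, did
    """
).fetchall()

# Combinations of (rid, did, modified, active) that exist as real rows.
real_rows = {(rid, did, modified, active) for _, rid, modified, active, did in rows}

# Aggregated results that match no actual row in the table.
unmatched = [group for group in aggregated if group not in real_rows]
print(unmatched)
```

For the group rid = 1, did = 1 the query pairs the modified value of row 4 (which has active = 0) with active = 1, a combination that no row has.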

Answer 2: (score: 0)

If you have a table with one row for each combination of rid and did, you can get good performance with CROSS APPLY:

SELECT
   X.*
FROM
   ParentTable P
   CROSS APPLY (
      SELECT TOP 1 *
      FROM YourTable T
      WHERE P.rid = T.rid AND P.did = T.did
      ORDER BY active DESC, modified
   ) X

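Substituting (SELECT DISTINCT rid, did FROM YourTable) for ParentTable will work, but it will hurt performance.

Also, here is my crazy single-scan magic query, which can often outperform other methods: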
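
What CROSS APPLY does here can be pictured as "for every (rid, did) pair, run the TOP 1 query once". The sketch below spells that out in Python with sqlite3 as a stand-in for SQL Server. This is an emulation under stated assumptions, not the APPLY operator itself: SQLite has no CROSS APPLY, so the per-pair loop is written by hand, and LIMIT 1 replaces TOP 1.

```python
import sqlite3

# Second data set from the question: (id, rid, modified, active, did).
rows = [
    (1, 1, "2010-09-07 11:37:44.850", 1, 1),
    (2, 1, "2010-09-07 11:38:44.000", 1, 1),
    (3, 1, "2010-09-07 11:39:44.000", 1, 1),
    (4, 1, "2010-09-07 11:36:44.000", 0, 1),
    (5, 2, "2010-09-07 11:41:44.000", 1, 1),
    (6, 1, "2010-09-07 11:42:44.000", 1, 2),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable "
    "(id INTEGER, rid INTEGER, modified TEXT, active INTEGER, did INTEGER)"
)
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?, ?)", rows)

# Outer side: one row per (rid, did), standing in for ParentTable.
pairs = conn.execute(
    "SELECT DISTINCT rid, did FROM YourTable ORDER BY rid, did"
).fetchall()

# Inner side: evaluated once per outer row, like the applied subquery.
result = []
for rid, did in pairs:
    top_row = conn.execute(
        """
        SELECT id, rid, modified, active, did
        FROM YourTable
        WHERE rid = ? AND did = ?
        ORDER BY active DESC, modified
        LIMIT 1
        """,
        (rid, did),
    ).fetchone()
    result.append(top_row)

result_ids = sorted(row[0] for row in result)
print(result_ids)
```

An index on (rid, did, active, modified) would presumably help the inner lookup in either engine, but that is a guess that would need measuring.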

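SELECT
   id = Convert(int, Substring(Packed, 10, 4)),
   rid,
   modified = Convert(datetime, Substring(Packed, 2, 8)),
   active = Convert(bit, 1 - Substring(Packed, 1, 1)),
   did
FROM
   (
      SELECT
         rid,
         did,
         -- Byte 1: 1 - active, so that active = 1 sorts first.
         -- Bytes 2-9: modified (datetime is 8 bytes, so binary(8) keeps the full value).
         -- Bytes 10-13: id.
         Packed = Min(
            Convert(binary(1), 1 - active)
            + Convert(binary(8), modified)
            + Convert(binary(4), id)
         )
      FROM
         YourTable
      GROUP BY
         rid,
         did
   ) X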

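This method is not recommended, because it is not easy to understand and it is very easy to make mistakes with it. But it is a fun oddity, because in some cases it can outperform the other methods.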
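
The reason the packed query can work is that comparing fixed-width, big-endian byte strings gives the same order as comparing the original fields one after another. Here is a rough Python illustration of that idea, independent of any database. The 1900-01-01 epoch, the millisecond resolution and the `>BQI` layout are choices made for this sketch, not SQL Server's internal datetime format.

```python
import struct
from datetime import datetime, timedelta

# Second data set from the question: (id, rid, modified, active, did).
rows = [
    (1, 1, "2010-09-07 11:37:44.850", 1, 1),
    (2, 1, "2010-09-07 11:38:44.000", 1, 1),
    (3, 1, "2010-09-07 11:39:44.000", 1, 1),
    (4, 1, "2010-09-07 11:36:44.000", 0, 1),
    (5, 2, "2010-09-07 11:41:44.000", 1, 1),
    (6, 1, "2010-09-07 11:42:44.000", 1, 2),
]

EPOCH = datetime(1900, 1, 1)  # arbitrary epoch chosen for this sketch
LAYOUT = ">BQI"               # big-endian: 1 byte, 8 bytes, 4 bytes, no padding


def pack(row):
    """Pack (1 - active, modified, id) so that byte order equals sort order."""
    id_, _rid, modified, active, _did = row
    ts = datetime.strptime(modified, "%Y-%m-%d %H:%M:%S.%f")
    millis = (ts - EPOCH) // timedelta(milliseconds=1)
    return struct.pack(LAYOUT, 1 - active, millis, id_)


def unpack(key, packed):
    """Recover (id, rid, modified, active, did) from a group key and packed bytes."""
    inverted_active, millis, id_ = struct.unpack(LAYOUT, packed)
    modified = EPOCH + timedelta(milliseconds=millis)
    rid, did = key
    return (id_, rid, modified, 1 - inverted_active, did)


# Single pass over the data: keep the smallest packed value per (rid, did).
smallest = {}
for row in rows:
    key = (row[1], row[4])
    packed = pack(row)
    if key not in smallest or packed < smallest[key]:
        smallest[key] = packed

winners = sorted(unpack(key, packed) for key, packed in smallest.items())
winner_ids = [w[0] for w in winners]
print(winner_ids)
```

The fragility mentioned above shows up here too: change a field width or use a value that does not fit the layout (a negative id, for instance) and the byte order no longer matches the intended sort order.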