How to remove certain duplicates in a complex SQL query

Asked: 2016-08-23 09:11:24

Tags: sql sql-server tsql greatest-n-per-group

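I am writing a query and need it to remove all duplicates of a.GenUserID while keeping the most recent login date (b.LogDateTime). That date must be more than 6 months old, though: if a user has a later date, it must be removed. I hope this makes sense.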

SELECT DISTINCT 
    a.GenUserID, 
    c.DeletionDate, 
    b.LogDateTime,
    (CASE c.Disabled WHEN 0 THEN 'NO' else 'YES - ARCHIVED' end)
FROM RioReport.dbo.GenUser a 
LEFT JOIN dbo.GenUserArchive c on a.GenUserID = c.GenUserID
LEFT JOIN dbo.GenUserAccessHistory b on a.GenUserID = b.ExtraInfo
WHERE(a.Disabled=0 or c.Disabled=0)
    AND c.DeletionDate IS NOT NULL
    AND ((DateAdd(MM, -6, GetDate()) > b.LogDateTime or b.LogDateTime IS NULL))
ORDER BY a.GenUserID, b.LogDateTime desc

3 Answers:

Answer 0 (score: 1)

Use a CTE and a window function:

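  ;with ctr as (
    -- the joins are taken from the question's query, since DeletionDate and
    -- LogDateTime live in GenUserArchive (c) and GenUserAccessHistory (b)
    select a.GenUserID, c.DeletionDate, b.LogDateTime,
           row_number() over (partition by a.GenUserID order by b.LogDateTime desc) as rnk
    from RioReport.dbo.GenUser a
    left join dbo.GenUserArchive c on a.GenUserID = c.GenUserID
    left join dbo.GenUserAccessHistory b on a.GenUserID = b.ExtraInfo
  )
  -- rnk = 1 keeps only the most recent login row per user
  select GenUserID, DeletionDate, LogDateTime,
         case when datediff(mm, LogDateTime, getdate()) < 6 then 'NO' else 'YES - ARCHIVED' end as IsArchived
  from ctr
  where rnk = 1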

Answer 1 (score: 1)

You can add row_number() to your query and wrap it in an outer query that keeps only the rows numbered 1:

select      *
from        (
    select      a.GenUserID, 
                c.DeletionDate, 
                b.LogDateTime,
                case c.Disabled when 0 then 'NO' else 'YES - ARCHIVED' end as disabled,
                row_number() over (partition by a.GenUserID
                                   order by     b.LogDateTime desc) as rn
    from        RioReport.dbo.GenUser a 
    inner join  dbo.GenUserArchive c
            on  a.GenUserID = c.GenUserID
    left join   dbo.GenUserAccessHistory b
            on  a.GenUserID = b.ExtraInfo
    where       (a.Disabled=0 or c.Disabled=0)
    and         c.DeletionDate is not null
    and         (DateAdd(MM, -6, GetDate()) > b.LogDateTime or b.LogDateTime is null)
    ) as t
where       rn = 1
order by    GenUserID

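Note that you can turn the first left join into an inner join without changing the result set: the where clause requires c.DeletionDate to be non-null, which already discards every row that has no match in GenUserArchive. An inner join states that intent directly and may perform better.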
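The effect can be checked on a small scale. The following is a minimal sketch, assuming SQL Server 2008 or later; the table variables and sample rows are hypothetical stand-ins for GenUser and GenUserArchive, not the real tables.

```sql
-- Hypothetical sample data, for illustration only.
DECLARE @GenUser TABLE (GenUserID int);
DECLARE @GenUserArchive TABLE (GenUserID int, DeletionDate datetime);

INSERT INTO @GenUser VALUES (1), (2), (3);
INSERT INTO @GenUserArchive VALUES (1, '20160115'), (2, NULL);  -- user 3 has no archive row

-- LEFT JOIN: user 3 is kept by the join with c.DeletionDate = NULL,
-- but the WHERE clause then removes it (and user 2) again.
SELECT a.GenUserID, c.DeletionDate
FROM @GenUser a
LEFT JOIN @GenUserArchive c ON a.GenUserID = c.GenUserID
WHERE c.DeletionDate IS NOT NULL;

-- INNER JOIN: user 3 is dropped by the join itself, user 2 by the WHERE clause.
SELECT a.GenUserID, c.DeletionDate
FROM @GenUser a
INNER JOIN @GenUserArchive c ON a.GenUserID = c.GenUserID
WHERE c.DeletionDate IS NOT NULL;
```

Both statements return only user 1, which is why the join type can be swapped here without changing the result.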

If GenUserAccessHistory.LogDateTime is always non-null, you can avoid the test or b.LogDateTime is null by moving the condition DateAdd(MM, -6, GetDate()) > b.LogDateTime into the on clause of the corresponding join.
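As a rough sketch of what that looks like, assuming SQL Server 2008 or later and using hypothetical table variables in place of GenUser and GenUserAccessHistory:

```sql
-- Hypothetical sample data, for illustration only.
DECLARE @GenUser TABLE (GenUserID int);
DECLARE @GenUserAccessHistory TABLE (ExtraInfo int, LogDateTime datetime NOT NULL);

INSERT INTO @GenUser VALUES (1), (2), (3);
INSERT INTO @GenUserAccessHistory VALUES
    (1, DATEADD(MM, -8, GETDATE())),   -- user 1: login older than 6 months
    (2, DATEADD(MM, -1, GETDATE()));   -- user 2: recent login only
-- user 3 has never logged in

-- The date filter is part of the join condition, so no "IS NULL" test is needed:
-- history rows that fail the filter simply do not match.
SELECT a.GenUserID, b.LogDateTime
FROM @GenUser a
LEFT JOIN @GenUserAccessHistory b
       ON a.GenUserID = b.ExtraInfo
      AND DATEADD(MM, -6, GETDATE()) > b.LogDateTime
ORDER BY a.GenUserID;
```

In this sketch user 1 comes back with the old login date, and users 2 and 3 both come back with a NULL LogDateTime. Note that the where-clause version would drop user 2 entirely, because that user's only login is recent; which behaviour is wanted depends on the requirement.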

The row numbers are assigned in descending order of LogDateTime and restart at 1 for each distinct user.
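A small illustration of that numbering, again with a hypothetical table variable and made-up dates (assuming SQL Server 2008 or later for the multi-row insert):

```sql
-- Hypothetical login data, for illustration only.
DECLARE @Logins TABLE (GenUserID int, LogDateTime datetime);

INSERT INTO @Logins VALUES
    (1, '20150110'),
    (1, '20151102'),
    (2, '20140630');

-- Number each user's logins from newest (rn = 1) to oldest.
SELECT GenUserID,
       LogDateTime,
       ROW_NUMBER() OVER (PARTITION BY GenUserID
                          ORDER BY LogDateTime DESC) AS rn
FROM @Logins
ORDER BY GenUserID, rn;
```

User 1 gets rn = 1 for the login of 2015-11-02 and rn = 2 for the one of 2015-01-10, and user 2 starts again at rn = 1. SQL Server sorts NULL as the lowest value, so with a descending order a NULL LogDateTime only gets rn = 1 when the user has no other rows.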

Alternative without window functions

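row_number() and the other ranking window functions have been supported since SQL Server 2005. In the comments you write that you cannot use them. If that is the case, you can combine a common table expression (also supported since SQL Server 2005) with not exists: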

;with cte as (
    select      a.GenUserID, 
                c.DeletionDate, 
                b.LogDateTime,
                case c.Disabled when 0 then 'NO' else 'YES - ARCHIVED' end as disabled
    from        RioReport.dbo.GenUser a 
    inner join  dbo.GenUserArchive c
            on  a.GenUserID = c.GenUserID
    left join   dbo.GenUserAccessHistory b
            on  a.GenUserID = b.ExtraInfo
    where       (a.Disabled=0 or c.Disabled=0)
    and         c.DeletionDate is not null
    and         (DateAdd(MM, -6, GetDate()) > b.LogDateTime or b.LogDateTime is null)
    )    
select      *
from        cte main
where       LogDateTime is null
        or  not exists (select   1
                        from     cte sub
                        where    sub.GenUserID = main.GenUserID
                        and      sub.LogDateTime > main.LogDateTime)
order by    GenUserID

Answer 2 (score: 1)

Try the following query.

;WITH CTE_Group
AS(
SELECT 
    ROW_NUMBER() OVER (PARTITION BY a.GenUserID ORDER BY b.LogDateTime DESC) as RNO, 
    a.GenUserID, 
    c.DeletionDate, 
    b.LogDateTime,
    (CASE c.Disabled WHEN 0 THEN 'NO' else 'YES - ARCHIVED' end) IsArchived
FROM RioReport.dbo.GenUser a 
LEFT JOIN dbo.GenUserArchive c on a.GenUserID = c.GenUserID
LEFT JOIN dbo.GenUserAccessHistory b on a.GenUserID = b.ExtraInfo
WHERE(a.Disabled=0 or c.Disabled=0)
    AND c.DeletionDate IS NOT NULL
    AND ((DateAdd(MM, -6, GetDate()) > b.LogDateTime or b.LogDateTime IS NULL)))
    SELECT  GenUserID, 
            DeletionDate, 
            LogDateTime,
            IsArchived
    FROM CTE_Group
    WHERE RNO=1