How do I group records whose time difference is more than one hour?

Asked: 2012-04-28 14:26:43

Tags: sql sql-server group-by

I'm new to this site, so please bear with me.

I'm trying to GROUP BY some data in SQL Server.

Here is the data:

Computer    VisitDate
ComputerA   2012-04-28 09:00:00
ComputerA   2012-04-28 09:05:00
ComputerA   2012-04-28 09:10:00
ComputerB   2012-04-28 09:30:00
ComputerB   2012-04-28 09:32:00
ComputerB   2012-04-28 09:44:00
ComputerB   2012-04-28 09:56:00
ComputerB   2012-04-28 10:25:00
ComputerA   2012-04-28 12:25:00
ComputerC   2012-04-28 12:30:00
ComputerC   2012-04-28 12:35:00
ComputerC   2012-04-28 12:45:00
ComputerC   2012-04-28 12:55:00

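What I want to achieve is to group the data by computer, but also to start a new group whenever a computer's visits are more than 1 hour apart. Here is the result I'm trying to get: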

Computer     VisitDate
ComputerA    2012-04-28 09:00:00
ComputerB    2012-04-28 09:30:00
ComputerA    2012-04-28 12:25:00
ComputerC    2012-04-28 12:30:00

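So ComputerA shows up twice, because it visited at 09:10:00 and then again at 12:25:00, which is a difference of more than 1 hour.

It's easy to 'GROUP BY Computer', but beyond that I don't know where to start. Any help with this would be greatly appreciated.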

4 Answers:

Answer 0: (score: 3)

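You cannot do this with a simple GROUP BY. GROUP BY only groups on values computed from each row by itself, e.g. the computer name or an expression over that row's own columns. It cannot express logic that compares one row with another, such as the time difference between two visits having to be larger than one hour.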
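To illustrate why a plain GROUP BY falls short here, a minimal sketch follows. The table variable @Visit and its three rows are hypothetical, copied from ComputerB's visits in the question. Grouping on an expression such as the clock hour is possible, but it yields fixed buckets rather than groups based on the gap between visits:

```sql
-- Hypothetical sample data: three of ComputerB's visits from the question.
DECLARE @Visit TABLE (Computer varchar(20), VisitDate datetime);

INSERT INTO @Visit (Computer, VisitDate)
SELECT 'ComputerB', '2012-04-28T09:30:00' UNION ALL
SELECT 'ComputerB', '2012-04-28T09:56:00' UNION ALL
SELECT 'ComputerB', '2012-04-28T10:25:00';

-- Truncate each VisitDate to the start of its clock hour and group on that.
SELECT Computer,
       DATEADD(HOUR, DATEDIFF(HOUR, 0, VisitDate), 0) AS HourBucket,
       COUNT(*) AS Visits
FROM @Visit
GROUP BY Computer, DATEADD(HOUR, DATEDIFF(HOUR, 0, VisitDate), 0)
ORDER BY HourBucket;
```

This returns two rows for ComputerB (the 09:00 and 10:00 buckets), even though the visits at 09:56 and 10:25 are only 29 minutes apart, which is not the grouping the question asks for.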

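What you can do, as long as you're on SQL Server 2005 or newer (you didn't mention the version in your question), is use CTEs (Common Table Expressions). These give you a way to slice up your data.

Here I do several things. First I partition the data by ComputerName, order it by VisitDate, and use ROW_NUMBER() to get a sequence number within each partition. The second CTE then determines the "first" entry for each computer (row number = 1), and the third computes, for every entry, the difference between its VisitDate and that of the row number = 1 entry. From the third CTE I finally select the entries that have row number = 1 (the first of each partition) or whose difference in minutes is 60 or more.

Here is the code: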

;WITH Computers AS
(
    -- number each computer's visits in chronological order
    SELECT
        ComputerName, VisitDate,
        RN = ROW_NUMBER() OVER(PARTITION BY ComputerName ORDER BY VisitDate)
    FROM
        dbo.YourComputerTable
),
FirstComputers AS
(
    -- the first visit of each computer
    SELECT ComputerName, VisitDate
    FROM Computers
    WHERE RN = 1
),
SelectedComputers AS
(
    -- minutes between each visit and that computer's first visit
    SELECT
        c.ComputerName, c.VisitDate, c.RN,
        DiffToFirst = ABS(DATEDIFF(MINUTE, c.VisitDate, fc.VisitDate))
    FROM Computers c
    INNER JOIN FirstComputers fc ON c.ComputerName = fc.ComputerName
)
-- keep the first visit, plus any visit 60 or more minutes after it
SELECT *
FROM SelectedComputers
WHERE RN = 1 OR DiffToFirst >= 60
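Note that DiffToFirst in the query above is measured from each computer's first visit, so every row that falls 60 or more minutes after that first visit qualifies, including rows that directly follow one another. If the intent is a gap between consecutive visits, one possible variant (a sketch, not the answer's original code) joins each row to the row before it. The table variable @Visits and its sample rows are made up for illustration:

```sql
-- Hypothetical sample data standing in for dbo.YourComputerTable.
DECLARE @Visits TABLE (ComputerName varchar(20), VisitDate datetime);

INSERT INTO @Visits (ComputerName, VisitDate)
SELECT 'ComputerA', '2012-04-28T09:00:00' UNION ALL
SELECT 'ComputerA', '2012-04-28T09:10:00' UNION ALL
SELECT 'ComputerA', '2012-04-28T12:25:00' UNION ALL
SELECT 'ComputerA', '2012-04-28T12:40:00' UNION ALL
SELECT 'ComputerB', '2012-04-28T09:30:00' UNION ALL
SELECT 'ComputerB', '2012-04-28T10:25:00';

WITH Numbered AS
(
    -- number each computer's visits in chronological order
    SELECT ComputerName, VisitDate,
           RN = ROW_NUMBER() OVER (PARTITION BY ComputerName ORDER BY VisitDate)
    FROM @Visits
)
SELECT cur.ComputerName, cur.VisitDate
FROM Numbered cur
LEFT JOIN Numbered prev
       ON prev.ComputerName = cur.ComputerName
      AND prev.RN = cur.RN - 1            -- the immediately preceding visit
WHERE prev.RN IS NULL                      -- no preceding visit: first row of the computer
   OR DATEDIFF(MINUTE, prev.VisitDate, cur.VisitDate) > 60
ORDER BY cur.VisitDate;
```

With this sample data the 12:40 row for ComputerA is left out, because it is only 15 minutes after the 12:25 row, whereas a comparison against the first visit (09:00) would include it.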

Answer 1: (score: 2)

If you have upgraded to SQL Server 2012, you can use LAG.

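with Lagged as (
  select
    Computer,
    VisitDate,
    -- previous visit of the same computer; NULL for its first visit
    LAG(VisitDate,1) over (
      partition by Computer
      order by VisitDate
    ) as LastVisit
  from @Visit
)
select
  Computer,
  VisitDate
from Lagged
where LastVisit is null                        -- first visit of each computer
   or VisitDate > dateadd(hour,1,LastVisit);   -- more than one hour after the previous visit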

SQL Fiddle here
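If the goal is to aggregate over each run of visits rather than only list where a run starts, the same LAG comparison can be turned into a group number with a running sum. The following is a sketch under assumptions: it needs SQL Server 2012 or later, and the names IsStart, SessionNo, and the sample rows in @Visit are made up for illustration.

```sql
-- Hypothetical sample data, a subset of the rows in the question.
DECLARE @Visit TABLE (Computer varchar(20), VisitDate datetime);

INSERT INTO @Visit (Computer, VisitDate) VALUES
    ('ComputerA', '2012-04-28T09:00:00'),
    ('ComputerA', '2012-04-28T09:05:00'),
    ('ComputerA', '2012-04-28T09:10:00'),
    ('ComputerB', '2012-04-28T09:30:00'),
    ('ComputerB', '2012-04-28T10:25:00'),
    ('ComputerA', '2012-04-28T12:25:00');

WITH Lagged AS (
    SELECT Computer, VisitDate,
           LAG(VisitDate) OVER (PARTITION BY Computer ORDER BY VisitDate) AS PrevVisit
    FROM @Visit
),
Flagged AS (
    -- 1 marks a row that starts a new run: first visit, or more than 1 hour after the previous one
    SELECT Computer, VisitDate,
           CASE WHEN PrevVisit IS NULL
                  OR VisitDate > DATEADD(HOUR, 1, PrevVisit) THEN 1 ELSE 0 END AS IsStart
    FROM Lagged
),
Grouped AS (
    -- the running total of start flags numbers the runs within each computer
    SELECT Computer, VisitDate,
           SUM(IsStart) OVER (PARTITION BY Computer ORDER BY VisitDate
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS SessionNo
    FROM Flagged
)
SELECT Computer, SessionNo,
       MIN(VisitDate) AS FirstVisit,
       MAX(VisitDate) AS LastVisitInSession,
       COUNT(*)       AS Visits
FROM Grouped
GROUP BY Computer, SessionNo
ORDER BY FirstVisit;
```

Because SessionNo is an ordinary column by the time the outer query runs, a regular GROUP BY Computer, SessionNo can then compute any per-run aggregate.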

Answer 2: (score: 1)

This solution is based on a recursive CTE. You can find an online demo here

WITH CteBase
AS
(
        SELECT  v.Computer,
                v.VisitDate,
                ROW_NUMBER() OVER(PARTITION BY v.Computer ORDER BY v.VisitDate) AS RowNum
        FROM    @Visit v
),  CteRecursive
AS
(
        SELECT  crt.Computer,
                crt.VisitDate,
                crt.VisitDate AS GroupStartVisitDate,
                crt.RowNum,
                1 AS ComputerVisitRowNum
        FROM    CteBase crt
        WHERE   crt.RowNum = 1
        UNION ALL
        SELECT  crt.Computer,
                crt.VisitDate,
                CASE 
                    WHEN DATEDIFF(MINUTE, prv.GroupStartVisitDate, crt.VisitDate) <= 60 THEN prv.GroupStartVisitDate 
                    ELSE crt.VisitDate 
                END,
                crt.RowNum,
                CASE 
                    WHEN DATEDIFF(MINUTE, prv.GroupStartVisitDate, crt.VisitDate) <= 60 THEN prv.ComputerVisitRowNum + 1
                    ELSE 1
                END             
        FROM    CteBase crt
        INNER JOIN CteRecursive prv ON crt.Computer = prv.Computer 
        AND     crt.RowNum = prv.RowNum + 1
)
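SELECT  r.Computer,
        r.GroupStartVisitDate
FROM    CteRecursive r
WHERE   r.ComputerVisitRowNum = 1
-- the recursion goes one level deeper per visit of a computer;
-- lift the default limit of 100 levels for larger data sets
OPTION  (MAXRECURSION 0);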

Results:

Computer             GroupStartVisitDate
-------------------- -----------------------
ComputerA            2012-04-28 09:00:00.000
ComputerB            2012-04-28 09:30:00.000
ComputerC            2012-04-28 12:30:00.000
ComputerA            2012-04-28 12:25:00.000
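One property of this recursive approach may be worth illustrating: each visit is compared with the start of the current group, not with the previous visit. The sketch below mirrors the logic of the query above in shortened form. The CTE names Base and Walk and the ComputerD rows (one visit every 40 minutes) are hypothetical.

```sql
-- Hypothetical data: one computer visiting every 40 minutes.
DECLARE @Visit TABLE (Computer varchar(20), VisitDate datetime);

INSERT INTO @Visit (Computer, VisitDate)
SELECT 'ComputerD', '2012-04-28T09:00:00' UNION ALL
SELECT 'ComputerD', '2012-04-28T09:40:00' UNION ALL
SELECT 'ComputerD', '2012-04-28T10:20:00' UNION ALL
SELECT 'ComputerD', '2012-04-28T11:00:00';

WITH Base AS (
    SELECT Computer, VisitDate,
           ROW_NUMBER() OVER (PARTITION BY Computer ORDER BY VisitDate) AS RowNum
    FROM @Visit
),
Walk AS (
    -- anchor: the first visit of each computer starts the first group
    SELECT Computer, VisitDate, VisitDate AS GroupStart, RowNum
    FROM Base
    WHERE RowNum = 1
    UNION ALL
    -- step: stay in the group while within 60 minutes of its start, else start a new one
    SELECT b.Computer, b.VisitDate,
           CASE WHEN DATEDIFF(MINUTE, w.GroupStart, b.VisitDate) <= 60
                THEN w.GroupStart ELSE b.VisitDate END,
           b.RowNum
    FROM Base b
    INNER JOIN Walk w ON b.Computer = w.Computer AND b.RowNum = w.RowNum + 1
)
SELECT w.Computer, w.VisitDate, w.GroupStart,
       DATEDIFF(MINUTE, p.VisitDate, w.VisitDate) AS MinutesSincePrevious
FROM Walk w
LEFT JOIN Base p ON p.Computer = w.Computer AND p.RowNum = w.RowNum - 1
ORDER BY w.VisitDate;
```

Here no two consecutive visits are more than 40 minutes apart, yet the 10:20 visit starts a second group because it is 80 minutes after the group start at 09:00. A comparison against the previous visit would keep all four rows in one group, so which behaviour is wanted depends on how "more than 1 hour" is meant.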

If you have any questions, feel free to ask.

Answer 3: (score: 0)

CTEs that show all computers with at least one visit, or the visits before and after a gap > 60 minutes.

create table compVisits (Computer varchar(20), VisitDate datetime)
go
insert into compVisits values
('ComputerA', '2012-04-28 09:00:00')
, ('ComputerA', '2012-04-28 09:05:00')
, ('ComputerA', '2012-04-28 09:10:00')
, ('ComputerB', '2012-04-28 09:30:00')
, ('ComputerB', '2012-04-28 09:32:00')
, ('ComputerB', '2012-04-28 09:44:00')
, ('ComputerB', '2012-04-28 09:56:00')
, ('ComputerB', '2012-04-28 10:25:00')
, ('ComputerA', '2012-04-28 12:25:00')
, ('ComputerC', '2012-04-28 12:30:00')
, ('ComputerC', '2012-04-28 12:35:00')
, ('ComputerC', '2012-04-28 12:45:00')
, ('ComputerC', '2012-04-28 12:55:00')

; WITH a as ( --Initial row count
    select *, r=ROW_NUMBER()OVER(PARTITION BY Computer ORDER BY VisitDate)
    FROM compVisits
)
, b as ( -- gaps >60 minutes
    SELECT a1.Computer, a1.VisitDate
    FROM a a1
    INNER JOIN a a2 ON a1.Computer=a2.Computer AND (a1.r+1)=a2.r
    AND DATEDIFF(MINUTE,a1.VisitDate,a2.VisitDate)>60
    UNION
    SELECT a2.Computer, a2.VisitDate
    FROM a a1
    INNER JOIN a a2 ON a1.Computer=a2.Computer AND (a1.r+1)=a2.r
    AND DATEDIFF(MINUTE,a1.VisitDate,a2.VisitDate)>60
)
-- at least one visit
SELECT a1.Computer, a1.VisitDate
FROM a a1
WHERE r=1
AND NOT EXISTS(SELECT 1 FROM b WHERE Computer=a1.Computer)

UNION

-- gaps >60 minutes
SELECT * FROM b
ORDER BY VisitDate
