SQL and temporal data

Asked: 2016-01-14 15:32:49

Tags: sql sql-server sql-server-2008

Given an appointments table like the following:

User     Start                    End
UserA    2016-01-15 12:00:00      2016-01-15 14:00:00    
UserA    2016-01-15 15:00:00      2016-01-15 17:00:00    
UserB    2016-01-15 13:00:00      2016-01-15 15:00:00    
UserB    2016-01-15 13:32:00      2016-01-15 15:00:00    
UserB    2016-01-15 15:30:00      2016-01-15 15:30:00    
UserB    2016-01-15 15:45:00      2016-01-15 16:00:00    
UserB    2016-01-15 17:30:00      2016-01-15 18:00:00    

I want to create a list of distinct time intervals during which the same number of people have an appointment:

Start                 End                   Count
2016-01-15 12:00:00   2016-01-15 13:00:00   1
2016-01-15 13:00:00   2016-01-15 14:00:00   2
2016-01-15 14:00:00   2016-01-15 15:45:00   1
2016-01-15 15:45:00   2016-01-15 16:00:00   2
2016-01-15 16:00:00   2016-01-15 17:00:00   1
2016-01-15 17:00:00   2016-01-15 17:30:00   0
2016-01-15 17:30:00   2016-01-15 18:00:00   1

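How can I do this in SQL, preferably SQL Server 2008?

Edit, to clarify: by hand, the result is obtained by drawing one row per user, marking the blocked time, and then counting how many rows carry a mark at each point in time: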

Time  12  13  14  15  16  17
UserA xxxxxxxx    xxxxxxxx
UserB     xxxxxxxx   x      xx
Count 1   2   1      21   0 1

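The result set would start at the smallest available time and end at the largest available time. The ASCII art only has a 15-minute resolution, and the result needs a finer resolution than that. You may leave out the rows with a count of 0 if that makes it easier for you.

2 Answers:

Answer 0: (score: 4)

There must be a simpler way than this, but at least you can follow each step on its own: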

declare @t table ([User] varchar(19) not null,Start datetime2 not null,[End] datetime2 not null)
insert into @t([User], Start, [End]) values
('UserA','2016-01-15T12:00:00','2016-01-15T14:00:00'),
('UserA','2016-01-15T15:00:00','2016-01-15T17:00:00'),
('UserB','2016-01-15T13:00:00','2016-01-15T15:00:00'),
('UserB','2016-01-15T13:32:00','2016-01-15T15:00:00'),
('UserB','2016-01-15T15:30:00','2016-01-15T15:30:00'),
('UserB','2016-01-15T15:45:00','2016-01-15T16:00:00'),
('UserB','2016-01-15T17:30:00','2016-01-15T18:00:00')

;With Times as (
    select Start as Point from @t
    union
    select [End] from @t
), Ordered as (
    select Point,ROW_NUMBER() OVER (ORDER BY Point) as rn
    from Times
), Periods as (
    select
        o1.Point as Start,
        o2.Point as [End]
    from
        Ordered o1
            inner join
        Ordered o2
            on
                o1.rn = o2.rn - 1
), UserCounts as (
    select
        p.Start,
        p.[End],
        COUNT(distinct [User]) as Cnt,
        ROW_NUMBER() OVER (Order BY p.[Start]) as rn
    from
        Periods p
            left join
        @t t
            on
                p.Start < t.[End] and
                t.Start < p.[End]
    group by
        p.Start,p.[End]
), Consolidated as (
    select uc.*
    from
        UserCounts uc
            left join
        UserCounts uc_anti
            on
                uc.rn = uc_anti.rn + 1 and
                uc.Cnt = uc_anti.Cnt
    where
        uc_anti.Cnt is null
    union all
    select c.Start,uc.[End],c.Cnt,uc.rn
    from
        Consolidated c
            inner join
        UserCounts uc
            on
                c.Cnt = uc.Cnt and
                c.[End] = uc.Start
)
select
    Start,MAX([End]) as [End],Cnt
from
    Consolidated
group by
    Start,Cnt
order by Start

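The CTEs are:

Times: because any given start or end stamp can begin or end a period in the final result, we simply collect them all in one column, so that Ordered can number them, so that Periods can reassemble them into every smallest possible period.

Then UserCounts goes back to the original data and finds out how many users overlap each computed period.

Consolidated is the trickiest CTE to follow, but it basically merges periods that abut each other where the user count is equal.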

In the results I even got the zero row, which I wasn't sure I'd be able to produce.
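
One possibly simpler way to do the final merge, offered as a sketch rather than a tested drop-in, is the "gaps and islands" trick: number all periods in time order, number them again within each count value, and group on the difference between the two numbers. This is a swapped-in technique, not the recursive Consolidated CTE used above. The Grouped CTE and the grp column are names introduced here for illustration, and the sketch assumes (as is true by construction) that consecutive rows of Periods always abut each other.

```sql
-- Same sample data as above.
declare @t table ([User] varchar(19) not null,Start datetime2 not null,[End] datetime2 not null)
insert into @t([User], Start, [End]) values
('UserA','2016-01-15T12:00:00','2016-01-15T14:00:00'),
('UserA','2016-01-15T15:00:00','2016-01-15T17:00:00'),
('UserB','2016-01-15T13:00:00','2016-01-15T15:00:00'),
('UserB','2016-01-15T13:32:00','2016-01-15T15:00:00'),
('UserB','2016-01-15T15:30:00','2016-01-15T15:30:00'),
('UserB','2016-01-15T15:45:00','2016-01-15T16:00:00'),
('UserB','2016-01-15T17:30:00','2016-01-15T18:00:00')

;With Times as (
    -- Every start or end stamp is a potential boundary.
    select Start as Point from @t
    union
    select [End] from @t
), Ordered as (
    select Point,ROW_NUMBER() OVER (ORDER BY Point) as rn
    from Times
), Periods as (
    -- Pair each boundary with the next one; rn keeps the time order.
    select o1.Point as Start,o2.Point as [End],o1.rn
    from Ordered o1
        inner join Ordered o2 on o1.rn = o2.rn - 1
), UserCounts as (
    -- Count the distinct users whose appointments overlap each period.
    select p.Start,p.[End],p.rn,COUNT(distinct t.[User]) as Cnt
    from Periods p
        left join @t t on p.Start < t.[End] and t.Start < p.[End]
    group by p.Start,p.[End],p.rn
), Grouped as (
    -- Hypothetical helper: neighbouring periods with the same count
    -- end up with the same difference grp.
    select Start,[End],Cnt,
        rn - ROW_NUMBER() OVER (PARTITION BY Cnt ORDER BY rn) as grp
    from UserCounts
)
select MIN(Start) as Start,MAX([End]) as [End],Cnt
from Grouped
group by Cnt,grp
order by MIN(Start)
```

On the sample data this should return the same seven rows as the desired output in the question, including the zero row. Because it avoids recursion, it is also not subject to the default recursion limit of 100 that applies to recursive CTEs.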

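Answer 1: (score: 0)

This kind of query is easier to write if you have a calendar table. In this example, though, I built one on the fly using a recursive CTE. The CTE returns blocks of time, which we can then join to the appointment data. I couldn't work out the interval pattern in your sample data, so I have shown the results in one-hour blocks. You can modify this part, or define your own blocks in a second table.

Sample data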

/* Table variables make sharing data easier    
 */
DECLARE @Sample TABLE
    (
        [User]      VARCHAR(50),
        [Start]     DATETIME,
        [End]       DATETIME
    )
;

INSERT INTO @Sample
    (
        [User],
        [Start],
        [End]
    )
VALUES
    ('UserA', '2016-01-15 12:00:00', '2016-01-15 14:00:00'),   
    ('UserA', '2016-01-15 15:00:00', '2016-01-15 17:00:00'),   
    ('UserB', '2016-01-15 13:00:00', '2016-01-15 15:00:00'),
    ('UserB', '2016-01-15 13:32:00', '2016-01-15 15:00:00'), 
    ('UserB', '2016-01-15 15:30:00', '2016-01-15 15:30:00'),
    ('UserB', '2016-01-15 15:45:00', '2016-01-15 16:00:00'),
    ('UserB', '2016-01-15 17:30:00', '2016-01-15 18:00:00')
;

I used two variables to limit the returned results to appointments that fall within a given start and end point.

/* Set a start and end point for the next query    
 */
DECLARE @Start  DATETIME = '2016-01-15 12:00:00';
DECLARE @End    DATETIME = '2016-01-15 18:00:00';

WITH Calendar AS
    (
            /* Anchor returns start of first appointment    
             */
            SELECT
                @Start                                          AS [Start],
                DATEADD(SECOND, -1, DATEADD(HOUR, 1, @Start))   AS [End] 

        UNION ALL

            /* Recursion, keep adding new records until end of last appointment    
             */
            SELECT
                DATEADD(HOUR, 1, [Start])   AS [Start],
                DATEADD(HOUR, 1, [End])     AS [End]
            FROM
                Calendar
            WHERE
                [End] <= @End
    )
SELECT
    c.[Start],
    c.[End],
    COUNT(DISTINCT s.[User]) AS [Count]
FROM
    Calendar AS c
        LEFT OUTER JOIN @Sample AS s            ON s.[Start] BETWEEN c.[Start] AND c.[End]
                                                OR s.[End] BETWEEN c.[Start] AND c.[End]
GROUP BY
    c.[Start],
    c.[End]
;

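Because an appointment can last longer than one hour, it can contribute to more than one row. This explains why the 7 sample rows lead to a returned total of 9.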
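
Note that the BETWEEN join only matches an appointment in the block where it starts and the block where it ends, so UserA's 12:00 to 14:00 appointment is not counted in the 13:00 block. A hedged variant, assuming half-open one-hour blocks (start inclusive, end exclusive) and the usual interval-overlap test, might look like this. It is a sketch under those assumptions, not a definitive replacement:

```sql
/* Same sample data as above */
DECLARE @Sample TABLE
    (
        [User]      VARCHAR(50),
        [Start]     DATETIME,
        [End]       DATETIME
    )
;

INSERT INTO @Sample ([User], [Start], [End])
VALUES
    ('UserA', '2016-01-15 12:00:00', '2016-01-15 14:00:00'),
    ('UserA', '2016-01-15 15:00:00', '2016-01-15 17:00:00'),
    ('UserB', '2016-01-15 13:00:00', '2016-01-15 15:00:00'),
    ('UserB', '2016-01-15 13:32:00', '2016-01-15 15:00:00'),
    ('UserB', '2016-01-15 15:30:00', '2016-01-15 15:30:00'),
    ('UserB', '2016-01-15 15:45:00', '2016-01-15 16:00:00'),
    ('UserB', '2016-01-15 17:30:00', '2016-01-15 18:00:00')
;

DECLARE @Start  DATETIME = '2016-01-15 12:00:00';
DECLARE @End    DATETIME = '2016-01-15 18:00:00';

WITH Calendar AS
    (
            /* Anchor: first block, [Start] inclusive, [End] exclusive */
            SELECT
                @Start                      AS [Start],
                DATEADD(HOUR, 1, @Start)    AS [End]

        UNION ALL

            /* Recursion: stop once a block already reaches @End */
            SELECT
                DATEADD(HOUR, 1, [Start])   AS [Start],
                DATEADD(HOUR, 1, [End])     AS [End]
            FROM
                Calendar
            WHERE
                [End] < @End
    )
SELECT
    c.[Start],
    c.[End],
    COUNT(DISTINCT s.[User]) AS [Count]
FROM
    Calendar AS c
        /* Overlap test: the appointment starts before the block ends
           and ends after the block starts */
        LEFT OUTER JOIN @Sample AS s    ON  s.[Start] < c.[End]
                                        AND c.[Start] < s.[End]
GROUP BY
    c.[Start],
    c.[End]
ORDER BY
    c.[Start]
;
```

With this join, an appointment is counted in every block it overlaps for any length of time, so a user with a 15-minute appointment still counts for the whole hour. Smaller blocks give a finer picture at the cost of more recursion steps.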