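Group events by temporal distance in SQL

Time: 2009-12-09 17:01:42

Tags: sql sql-server

In general, I need to associate (group) records that were created within similar time periods. If it helps, think of the example below as clickstream data that has no sessionID, from which I need to build those sessions.

I have the following dataset: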

UserId INT,
EventId INT,
DateCreated DATETIME,
BlockId INT

Assume the following data:

{123, 111, '2009-12-01 9:15am', NULL}
{123, 222, '2009-12-01 9:20am', NULL}
{123, 333, '2009-12-01 9:25am', NULL}
{123, 444, '2009-12-03 2:30pm', NULL}
{123, 555, '2009-12-03 2:32pm', NULL}

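What I need to do is divide these events, per user, into temporal buckets. A business rule says that a gap of more than 30 minutes should start a new bucket. In the example above, events 111-333 form one block, because no more than 30 minutes separates consecutive events. Likewise, events 444-555 form a second block.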
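To make the rule concrete, here is a minimal Python sketch of the intended grouping. It only illustrates the logic and is not a SQL solution; the function name `assign_blocks` and the tuple layout are assumptions made for this sketch.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(minutes=30)  # business rule: a larger gap starts a new block

def assign_blocks(events, max_gap=MAX_GAP):
    """Return {event_id: block_id}; events are (user_id, event_id, created) tuples."""
    blocks = {}
    block_id = 0
    prev_user, prev_time = None, None
    # Sort by user, then time, so each event is compared with that user's previous event.
    for user_id, event_id, created in sorted(events, key=lambda e: (e[0], e[2])):
        if user_id != prev_user or created - prev_time > max_gap:
            block_id += 1  # new user, or a gap of more than 30 minutes
        blocks[event_id] = block_id
        prev_user, prev_time = user_id, created
    return blocks

events = [
    (123, 111, datetime(2009, 12, 1, 9, 15)),
    (123, 222, datetime(2009, 12, 1, 9, 20)),
    (123, 333, datetime(2009, 12, 1, 9, 25)),
    (123, 444, datetime(2009, 12, 3, 14, 30)),
    (123, 555, datetime(2009, 12, 3, 14, 32)),
]
print(assign_blocks(events))
# {111: 1, 222: 1, 333: 1, 444: 2, 555: 2}
```

Block numbers here run across all users rather than restarting per user; either convention satisfies the rule as stated.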

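My current solution uses a cursor and is extremely slow (and therefore unsustainable for the volume of data I need to process). I can post the code, but it is pretty simple.

Any ideas?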

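4 answers:

Answer 0: (score: 1)

Based on the comment thread,

A. A bucket is defined by its first record. The first record of each bucket is any row whose DateCreated is more than 30 minutes after the latest earlier DateCreated (that is, the immediately preceding record).

B. The remaining rows in the bucket are all rows with a DateCreated on or after that first row, whose DateCreated is no more than 30 minutes after the previous row, and for which no non-qualifying (that is, new bucket-defining) row exists since the specified bucket-defining row.

In English:

Select the DateCreated of those records whose DateCreated is more than 30 minutes after the previous DateCreated, together with the aggregate function of your choice over all other records in the table whose DateCreated is after that bucket-defining date, is no more than 30 minutes after the immediately preceding DateCreated, and for which no record with a gap of more than 30 minutes exists between the bucket-defining DateCreated and this one.

In SQL: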

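  -- My_Table stands in for the events table ("Table" itself is a reserved word).
  Select Z.UserId, Z.BucketDefinitionDate, Count(*) RowsInBucket
  From (Select Distinct UserId, DateCreated BucketDefinitionDate
        From My_Table Ti
        -- IsNull makes a user's very first row a bucket start
        Where DateCreated > DateAdd(minute, 30, IsNull(
            (Select Max(DateCreated) From My_Table
             Where UserId = Ti.UserId
               And DateCreated < Ti.DateCreated), '19000101'))) Z
    Join My_Table B
       On B.UserId = Z.UserId
           -- >= so the bucket-defining row itself is counted
           And B.DateCreated >= Z.BucketDefinitionDate
           -- no later bucket start may lie between Z and B
           And Not Exists
             (Select * From My_Table X
              Where X.UserId = Z.UserId
                 And X.DateCreated > Z.BucketDefinitionDate
                 And X.DateCreated <= B.DateCreated
                 And X.DateCreated > DateAdd(minute, 30,
                     (Select Max(DateCreated) From My_Table
                      Where UserId = X.UserId
                        And DateCreated < X.DateCreated)))
  Group By Z.UserId, Z.BucketDefinitionDate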
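
On engines that support window functions (SQL Server added `LAG` in the 2012 release), the bucket-start definition in A can also be computed without correlated subqueries: flag each row whose gap to the same user's previous row exceeds 30 minutes, then take a running sum of the flags. The sketch below runs that idea through Python's `sqlite3` module so it can be executed as-is. It assumes SQLite 3.25 or later, a table named `events`, and ISO-formatted timestamps; the date arithmetic is SQLite-specific and would need adapting for T-SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (UserId INT, EventId INT, DateCreated TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (123, 111, "2009-12-01 09:15:00"),
        (123, 222, "2009-12-01 09:20:00"),
        (123, 333, "2009-12-01 09:25:00"),
        (123, 444, "2009-12-03 14:30:00"),
        (123, 555, "2009-12-03 14:32:00"),
    ],
)

query = """
WITH flagged AS (
    SELECT UserId, EventId, DateCreated,
           CASE
               -- a user's first row always starts a bucket
               WHEN LAG(DateCreated) OVER w IS NULL THEN 1
               -- gap to the previous row, in whole seconds, above 30 minutes
               WHEN CAST(strftime('%s', DateCreated) AS INTEGER)
                  - CAST(strftime('%s', LAG(DateCreated) OVER w) AS INTEGER) > 1800 THEN 1
               ELSE 0
           END AS IsStart
    FROM events
    WINDOW w AS (PARTITION BY UserId ORDER BY DateCreated)
)
SELECT EventId,
       -- running count of bucket starts = bucket number within the user
       SUM(IsStart) OVER (PARTITION BY UserId ORDER BY DateCreated) AS BlockId
FROM flagged
ORDER BY UserId, DateCreated
"""
rows = conn.execute(query).fetchall()
print(rows)
# [(111, 1), (222, 1), (333, 1), (444, 2), (555, 2)]
```

Each table row is read once per window pass, which tends to scale better than nested correlated subqueries, though that depends on the engine and indexing.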

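Answer 1: (score: 1)

Hopefully this gets you headed in the right direction. If you are in a stored procedure, using table variables for StartTimes and EndTimes should make the query easier to read and understand. This gives you the start and end times of your batches; after that, just join back to your table and you should have it.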

;WITH StartTimes AS
(
-- A start time has no earlier event by the same user in the preceding 30 minutes
SELECT DISTINCT
    T1.UserID,
    T1.DateCreated AS StartTime
FROM
    My_Table T1
LEFT OUTER JOIN My_Table T2 ON
    T2.UserID = T1.UserID AND
    T2.DateCreated >= DATEADD(mi, -30, T1.DateCreated) AND
    T2.DateCreated < T1.DateCreated
WHERE
    T2.UserID IS NULL
)
SELECT
    ST.UserID,
    ST.StartTime,
    ET.EndTime
FROM
(
-- An end time has no later event by the same user in the following 30 minutes
SELECT DISTINCT
    T3.UserID,
    T3.DateCreated AS EndTime
FROM
    My_Table T3
LEFT OUTER JOIN My_Table T4 ON
    T4.UserID = T3.UserID AND
    T4.DateCreated <= DATEADD(mi, 30, T3.DateCreated) AND
    T4.DateCreated > T3.DateCreated
WHERE
    T4.UserID IS NULL
) AS ET
INNER JOIN StartTimes ST ON
    ST.UserID = ET.UserID AND
    ST.StartTime <= ET.EndTime
-- Keep only the latest start time at or before each end time
LEFT OUTER JOIN StartTimes ST2 ON
    ST2.UserID = ET.UserID AND
    ST2.StartTime <= ET.EndTime AND
    ST2.StartTime > ST.StartTime
WHERE
    ST2.StartTime IS NULL

Answer 2: (score: 0)

What you can try is

DECLARE @TABLE TABLE(
        ID INT,
        EventID INT,
        DateCreated DATETIME
)

INSERT INTO @TABLE SELECT 123, 111, '2009-12-01 9:15am'
INSERT INTO @TABLE SELECT 123, 222, '2009-12-01 9:20am' 
INSERT INTO @TABLE SELECT 123, 333, '2009-12-01 9:25am'
INSERT INTO @TABLE SELECT 123, 444, '2009-12-03 2:30pm'
INSERT INTO @TABLE SELECT 123, 555, '2009-12-03 2:32pm'

SELECT  ID,
        DATEADD(dd, DATEDIFF(dd,0,DateCreated), 0) DayVal,
        DATEPART(hh, DateCreated) HourPart,
        FLOOR(DATEPART(mi, DateCreated) / 30.) MinBucket
FROM    @TABLE

Now you can group by DayVal, HourPart and MinBucket.
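
One thing to keep in mind with this approach: DayVal, HourPart and MinBucket describe fixed half-hour clock windows rather than gaps between consecutive events, so two events only a few minutes apart can land in different groups. A small Python sketch of the same grouping key shows the effect; the helper `fixed_bucket` and the 9:35 timestamp are made up for illustration.

```python
from datetime import datetime

def fixed_bucket(created):
    """Mirror the DayVal / HourPart / MinBucket columns from the query above."""
    return (created.date(), created.hour, created.minute // 30)

a = datetime(2009, 12, 1, 9, 25)  # event 333 from the question
b = datetime(2009, 12, 1, 9, 35)  # hypothetical event 10 minutes later

print(fixed_bucket(a) == fixed_bucket(b))  # False: different half-hour windows
```

Whether that matters depends on how strictly the 30-minute gap rule from the question has to be honoured.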

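Answer 3: (score: 0)

I think I have something for you. It isn't a cool single query like the one Tom H posted, but it seems to work. It uses a table variable as a work table.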

declare @table table(
    id int identity(1,1),
    userId int,
    eventId int,
    dateCreated datetime,
    bucket int
)

-- rows are assumed to be inserted in dateCreated order, so that id reflects chronology
insert into @table select 123, 111, '2009-12-01 9:15am', 0
-- etc... insert more rows - note that the 'bucket' field is set to 0

-- pass 1: number each row that follows a gap of more than 30 minutes (2, 3, ...)
declare @next_bucket int
set @next_bucket = 1
update [current]
    set @next_bucket = bucket = @next_bucket + 1
    from @table as [current]
    where datecreated > dateadd(mi, 30, (select datecreated from @table as previous where [current].id = previous.id + 1))

-- pass 2: remaining rows inherit the latest earlier bucket number, or 1 if there is none
update [current]
    set bucket =
        coalesce(( select max(bucket)
                from @table as previous
                where previous.id < [current].id
                    and bucket <> 0
        ), 1)
    from @table as [current]
    where bucket = 0

-- return the results
select * from @table