How to query records created within a short time window

Time: 2014-04-05 06:48:12

Tags: mysql database-design

create table users (id, created_date, ...)


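INSERT INTO users (id, created_date) VALUES (1, '2014-01-01 05:00:00');
INSERT INTO users (id, created_date) VALUES (2, '2014-01-01 05:00:01');
INSERT INTO users (id, created_date) VALUES (3, '2014-01-01 05:00:10');
INSERT INTO users (id, created_date) VALUES (4, '2014-01-01 05:00:11');
INSERT INTO users (id, created_date) VALUES (5, '2014-01-01 05:00:20');
INSERT INTO users (id, created_date) VALUES (6, '2014-01-01 05:00:30');
INSERT INTO users (id, created_date) VALUES (7, '2014-01-02 05:00:01');
INSERT INTO users (id, created_date) VALUES (8, '2014-01-02 05:00:02');
INSERT INTO users (id, created_date) VALUES (9, '2014-01-02 05:00:03');
INSERT INTO users (id, created_date) VALUES (10, '2014-01-02 05:00:03');
INSERT INTO users (id, created_date) VALUES (11, '2014-01-02 06:00:03');
INSERT INTO users (id, created_date) VALUES (12, '2014-01-02 07:00:03');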

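How can I query for users that were created within a small window of each other (say, one second), across the whole table? In the scenario above, they could be grouped as follows: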

[1,2], [3,4], [7,8,9]

3 Answers:

Answer 0: (score: 2)

I would use this query with variables:

SELECT
  GROUP_CONCAT(id)
FROM (
  SELECT
    id,
    created_date,
    @grp:= CASE WHEN created_date>@last_dt + INTERVAL 1 SECOND
                THEN @grp+1
                ELSE @grp END grp,
    @last_dt := created_date
  FROM
    users, (SELECT @grp := 1, @last_dt := NULL) r
  ORDER BY
    created_date
  ) s
GROUP BY
  grp
HAVING
  COUNT(*)>1

See the fiddle here. It returns:

[1,2], [3,4], [7,8,9,10]

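I think this is correct and is what you are looking for. The subquery sorts the table by created_date and assigns a group number to each row, incrementing that number whenever the difference from the previous row's value exceeds one second: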

| ID |                   CREATED_DATE | GRP | @LAST_DT := CREATED_DATE |
|----|--------------------------------|-----|--------------------------|
|  1 | January, 01 2014 05:00:00+0000 |   1 |      2014-01-01 05:00:00 |
|  2 | January, 01 2014 05:00:01+0000 |   1 |      2014-01-01 05:00:01 |
|  3 | January, 01 2014 05:00:10+0000 |   2 |      2014-01-01 05:00:10 |
|  4 | January, 01 2014 05:00:11+0000 |   2 |      2014-01-01 05:00:11 |
|  5 | January, 01 2014 05:00:20+0000 |   3 |      2014-01-01 05:00:20 |
...

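I then group this result by GRP, apply GROUP_CONCAT, and return only the groups that contain more than one row.

Answer 1: (score: 1)

You could also consider a self-join like this: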

SELECT
    u1.id id1,
    u1.created_date created_date1,
    u2.id id2,
    u2.created_date created_date2
FROM
    users u1
    JOIN users u2
        ON u1.id < u2.id
        AND u1.created_date
            BETWEEN u2.created_date - INTERVAL 1 SECOND
            AND u2.created_date

[SQL Fiddle]

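It won't give you the result in exactly the format you asked for. Instead, it gives you what is essentially a set of graph edges, from which you can then find the connected components in client code.

For example, running the above query on your test data gives...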

ID1 CREATED_DATE1                   ID2 CREATED_DATE2
1   January, 01 2014 05:00:00+0000  2   January, 01 2014 05:00:01+0000
3   January, 01 2014 05:00:10+0000  4   January, 01 2014 05:00:11+0000
7   January, 02 2014 05:00:01+0000  8   January, 02 2014 05:00:02+0000
8   January, 02 2014 05:00:02+0000  9   January, 02 2014 05:00:03+0000
8   January, 02 2014 05:00:02+0000  10  January, 02 2014 05:00:03+0000
9   January, 02 2014 05:00:03+0000  10  January, 02 2014 05:00:03+0000

...which contains 3 connected components:

{
    1   January, 01 2014 05:00:00+0000
    2   January, 01 2014 05:00:01+0000
}
{
    3   January, 01 2014 05:00:10+0000
    4   January, 01 2014 05:00:11+0000
}
{
    7   January, 02 2014 05:00:01+0000
    8   January, 02 2014 05:00:02+0000
    9   January, 02 2014 05:00:03+0000
    10  January, 02 2014 05:00:03+0000
}
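
As a minimal sketch of the client-side step, assuming the client is written in Python and receives only the (id1, id2) pairs from the self-join, a small union-find can merge the edges into components. The function name connected_components is hypothetical and not part of the original answer:

```python
from collections import defaultdict


def connected_components(edges):
    """Group node ids into connected components using union-find."""
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Attach the larger root under the smaller one.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    groups = defaultdict(list)
    for node in list(parent):
        groups[find(node)].append(node)
    return sorted(sorted(members) for members in groups.values())


# (id1, id2) pairs taken from the self-join output on the test data
edges = [(1, 2), (3, 4), (7, 8), (8, 9), (8, 10), (9, 10)]
print(connected_components(edges))  # [[1, 2], [3, 4], [7, 8, 9, 10]]
```

Rows that have no neighbour within one second never appear in the join output, so they are absent from the components as well.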

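Answer 2: (score: 0)

Your example suggests you are trying to perform some kind of cluster analysis, using "less than one second after the previous record" as the decision function. I suggest you invert the problem and look for gaps of more than one second in the data. This has been well covered on SO. The lower and upper bounds of a gap are, respectively, the upper and lower bounds of the adjacent clusters. Clusters with a single member, as well as the first and last clusters, are special cases that you will have to decide how to handle.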
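
The gap-based idea can be illustrated outside the database. The following is only a sketch, assuming the rows have already been fetched as (id, created_date) tuples; the helper names find_gaps and clusters_from_gaps are hypothetical, and single-member clusters are simply filtered out at the end as one possible way to handle that special case:

```python
from datetime import datetime, timedelta

ONE_SECOND = timedelta(seconds=1)


def find_gaps(rows, max_delta=ONE_SECOND):
    """Return (prev_id, next_id) pairs whose timestamps are more than max_delta apart."""
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))
    return [(a[0], b[0]) for a, b in zip(ordered, ordered[1:])
            if b[1] - a[1] > max_delta]


def clusters_from_gaps(rows, max_delta=ONE_SECOND):
    """Split the time-ordered ids at every gap; each gap closes one cluster."""
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))
    gap_starts = {prev_id for prev_id, _ in find_gaps(rows, max_delta)}
    clusters, current = [], []
    for row_id, _ in ordered:
        current.append(row_id)
        if row_id in gap_starts:  # lower bound of a gap = upper bound of a cluster
            clusters.append(current)
            current = []
    if current:  # the last cluster has no gap after it
        clusters.append(current)
    return clusters


def ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


# The test data from the question
rows = [
    (1, ts("2014-01-01 05:00:00")), (2, ts("2014-01-01 05:00:01")),
    (3, ts("2014-01-01 05:00:10")), (4, ts("2014-01-01 05:00:11")),
    (5, ts("2014-01-01 05:00:20")), (6, ts("2014-01-01 05:00:30")),
    (7, ts("2014-01-02 05:00:01")), (8, ts("2014-01-02 05:00:02")),
    (9, ts("2014-01-02 05:00:03")), (10, ts("2014-01-02 05:00:03")),
    (11, ts("2014-01-02 06:00:03")), (12, ts("2014-01-02 07:00:03")),
]

multi = [c for c in clusters_from_gaps(rows) if len(c) > 1]
print(multi)  # [[1, 2], [3, 4], [7, 8, 9, 10]]
```

On this data the sketch produces the same groups as the variable-based query in Answer 0, because both split the ordered rows wherever consecutive timestamps differ by more than one second.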