create table users (id int primary key, created_date datetime /* , ... other columns */);
insert into users values (1, '2014-01-01 05:00:00');
insert into users values (2, '2014-01-01 05:00:01');
insert into users values (3, '2014-01-01 05:00:10');
insert into users values (4, '2014-01-01 05:00:11');
insert into users values (5, '2014-01-01 05:00:20');
insert into users values (6, '2014-01-01 05:00:30');
insert into users values (7, '2014-01-02 05:00:01');
insert into users values (8, '2014-01-02 05:00:02');
insert into users values (9, '2014-01-02 05:00:03');
insert into users values (10, '2014-01-02 05:00:03');
insert into users values (11, '2014-01-02 06:00:03');
insert into users values (12, '2014-01-02 07:00:03');
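How can I query for users that were created within a small window of each other (say, one second) across the whole table? In the scenario above, we could group them as follows: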
[1,2], [3,4], [7,8,9]
Answer 0 (score: 2)
I would use a query with user-defined variables:
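SELECT
  GROUP_CONCAT(id ORDER BY id) AS ids
FROM (
  SELECT
    id,
    created_date,
    -- start a new group whenever this row is more than 1 second after the previous one
    @grp := CASE WHEN created_date > @last_dt + INTERVAL 1 SECOND
                 THEN @grp + 1
                 ELSE @grp END AS grp,
    -- remember this row's timestamp for the comparison on the next row
    @last_dt := created_date
  FROM
    users, (SELECT @grp := 1, @last_dt := NULL) r
  ORDER BY
    created_date
) s
GROUP BY
  grp
HAVING
  COUNT(*) > 1;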
See the fiddle here. It returns:
[1,2], [3,4], [7,8,9,10]
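I believe this is correct (id 10 has the same timestamp as id 9, so it belongs in the group with 7, 8 and 9), and it is exactly what you are looking for. The subquery orders the table by created_date and assigns each row to a group, incrementing the group number whenever the difference from the previous row's value exceeds one second: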
| ID | CREATED_DATE | GRP | @LAST_DT := CREATED_DATE |
|----|--------------------------------|-----|--------------------------|
| 1 | January, 01 2014 05:00:00+0000 | 1 | 2014-01-01 05:00:00 |
| 2 | January, 01 2014 05:00:01+0000 | 1 | 2014-01-01 05:00:01 |
| 3 | January, 01 2014 05:00:10+0000 | 2 | 2014-01-01 05:00:10 |
| 4 | January, 01 2014 05:00:11+0000 | 2 | 2014-01-01 05:00:11 |
| 5 | January, 01 2014 05:00:20+0000 | 3 | 2014-01-01 05:00:20 |
...
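I then group this result by GRP, concatenate the ids with GROUP_CONCAT, and return every group that contains more than one row. Note that MySQL does not guarantee the evaluation order of user-variable assignments inside a SELECT, and assigning variables within expressions is deprecated in MySQL 8.0; on 8.0 the same idea can be written with the LAG() and SUM() window functions instead.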
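To check the grouping logic outside MySQL, here is a minimal Python sketch of the same single pass: sort by created_date, start a new group whenever the gap to the previous row exceeds one second, and keep only groups with more than one row. The names `ROWS`, `FMT` and `group_by_window` are hypothetical, and the sketch assumes timestamps in the `YYYY-MM-DD HH:MM:SS` form used in the question's data.

```python
from datetime import datetime, timedelta

# Sample data from the question as (id, created_date) pairs.
ROWS = [
    (1, "2014-01-01 05:00:00"), (2, "2014-01-01 05:00:01"),
    (3, "2014-01-01 05:00:10"), (4, "2014-01-01 05:00:11"),
    (5, "2014-01-01 05:00:20"), (6, "2014-01-01 05:00:30"),
    (7, "2014-01-02 05:00:01"), (8, "2014-01-02 05:00:02"),
    (9, "2014-01-02 05:00:03"), (10, "2014-01-02 05:00:03"),
    (11, "2014-01-02 06:00:03"), (12, "2014-01-02 07:00:03"),
]
FMT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format


def group_by_window(rows, window=timedelta(seconds=1)):
    """Hypothetical helper mirroring the @grp/@last_dt query."""
    # ORDER BY created_date (ties broken by id)
    ordered = sorted((datetime.strptime(ts, FMT), row_id) for row_id, ts in rows)
    groups = []
    last_dt = None  # plays the role of @last_dt
    for dt, row_id in ordered:
        # created_date > @last_dt + INTERVAL 1 SECOND  ->  start a new group
        if last_dt is None or dt > last_dt + window:
            groups.append([])
        groups[-1].append(row_id)
        last_dt = dt
    # HAVING COUNT(*) > 1
    return [g for g in groups if len(g) > 1]


print(group_by_window(ROWS))  # [[1, 2], [3, 4], [7, 8, 9, 10]]
```

Because the window is a parameter, the same sketch answers "within N seconds" by passing a different `timedelta`.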
Answer 1 (score: 1)
You could also consider a self-join like this:
SELECT
  u1.id id1,
  u1.created_date created_date1,
  u2.id id2,
  u2.created_date created_date2
FROM
  users u1
  JOIN users u2
    ON u1.id < u2.id
   AND u1.created_date
       BETWEEN u2.created_date - INTERVAL 1 SECOND
           AND u2.created_date;
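It won't give you the results in exactly the format you asked for. Instead, it gives you what is essentially a set of graph edges, from which you can then find the connected components in client-side code.
For example, running the query above on your test data gives...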
| ID1 | CREATED_DATE1 | ID2 | CREATED_DATE2 |
|-----|--------------------------------|-----|--------------------------------|
| 1 | January, 01 2014 05:00:00+0000 | 2 | January, 01 2014 05:00:01+0000 |
| 3 | January, 01 2014 05:00:10+0000 | 4 | January, 01 2014 05:00:11+0000 |
| 7 | January, 02 2014 05:00:01+0000 | 8 | January, 02 2014 05:00:02+0000 |
| 8 | January, 02 2014 05:00:02+0000 | 9 | January, 02 2014 05:00:03+0000 |
| 8 | January, 02 2014 05:00:02+0000 | 10 | January, 02 2014 05:00:03+0000 |
| 9 | January, 02 2014 05:00:03+0000 | 10 | January, 02 2014 05:00:03+0000 |
...which contains 3 connected components:
{
1 January, 01 2014 05:00:00+0000
2 January, 01 2014 05:00:01+0000
}
{
3 January, 01 2014 05:00:10+0000
4 January, 01 2014 05:00:11+0000
}
{
7 January, 02 2014 05:00:01+0000
8 January, 02 2014 05:00:02+0000
9 January, 02 2014 05:00:03+0000
10 January, 02 2014 05:00:03+0000
}
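The client-side step could look something like the following Python sketch. It is an illustration under assumptions, not part of the original answer: `self_join_edges` reproduces the join predicate with a nested loop (O(n²), much like an unindexed self-join), and `connected_components` is a hypothetical union-find helper. Like the query, it relies on ids increasing with created_date, because `u1.id < u2.id` only matches pairs where the earlier row also has the smaller id.

```python
from datetime import datetime, timedelta

# Sample data from the question as (id, created_date) pairs.
ROWS = [
    (1, "2014-01-01 05:00:00"), (2, "2014-01-01 05:00:01"),
    (3, "2014-01-01 05:00:10"), (4, "2014-01-01 05:00:11"),
    (5, "2014-01-01 05:00:20"), (6, "2014-01-01 05:00:30"),
    (7, "2014-01-02 05:00:01"), (8, "2014-01-02 05:00:02"),
    (9, "2014-01-02 05:00:03"), (10, "2014-01-02 05:00:03"),
    (11, "2014-01-02 06:00:03"), (12, "2014-01-02 07:00:03"),
]
FMT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format


def self_join_edges(rows, window=timedelta(seconds=1)):
    """Hypothetical helper: the ON clause of the self-join as a nested loop."""
    parsed = [(row_id, datetime.strptime(ts, FMT)) for row_id, ts in rows]
    return [
        (id1, id2)
        for id1, dt1 in parsed
        for id2, dt2 in parsed
        # u1.id < u2.id AND u1.created_date BETWEEN u2.created_date - 1s AND u2.created_date
        if id1 < id2 and dt2 - window <= dt1 <= dt2
    ]


def connected_components(edges):
    """Union-find over an edge list; returns each component as a sorted id list."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    components = {}
    for node in list(parent):
        components.setdefault(find(node), []).append(node)
    return sorted(sorted(c) for c in components.values())


edges = self_join_edges(ROWS)
print(edges)                        # [(1, 2), (3, 4), (7, 8), (8, 9), (8, 10), (9, 10)]
print(connected_components(edges))  # [[1, 2], [3, 4], [7, 8, 9, 10]]
```

Rows with no neighbour within one second (ids 5, 6, 11 and 12) never appear in an edge, so they drop out on their own, much like the `HAVING COUNT(*) > 1` filter in the first answer.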
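Answer 2 (score: 0)
Your example suggests that you are trying to perform some kind of cluster analysis, using "less than one second after the previous record" as the deciding function. I suggest you invert the problem and look for gaps of more than one second in the data instead. That is well covered on SO. The lower and upper bounds of each gap are, respectively, the upper bound of the cluster before it and the lower bound of the cluster after it. Clusters with only one member, as well as the first and last clusters, are special cases you will have to decide how to handle.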
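As a rough illustration of the gap-based approach, the Python sketch below finds gaps of more than one second between consecutive records and turns them into cluster bounds. `clusters_from_gaps` and the other names are hypothetical; in practice you would more likely find the gaps in SQL with one of the approaches already covered on SO, and this only shows the logic.

```python
from datetime import datetime, timedelta

# Sample data from the question as (id, created_date) pairs.
ROWS = [
    (1, "2014-01-01 05:00:00"), (2, "2014-01-01 05:00:01"),
    (3, "2014-01-01 05:00:10"), (4, "2014-01-01 05:00:11"),
    (5, "2014-01-01 05:00:20"), (6, "2014-01-01 05:00:30"),
    (7, "2014-01-02 05:00:01"), (8, "2014-01-02 05:00:02"),
    (9, "2014-01-02 05:00:03"), (10, "2014-01-02 05:00:03"),
    (11, "2014-01-02 06:00:03"), (12, "2014-01-02 07:00:03"),
]
FMT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format


def clusters_from_gaps(rows, window=timedelta(seconds=1)):
    """Hypothetical helper: split the time-ordered records at every gap
    longer than `window`; return (first_ts, last_ts, ids) per cluster."""
    ordered = sorted((datetime.strptime(ts, FMT), row_id) for row_id, ts in rows)
    if not ordered:
        return []
    # A gap lies between positions k and k + 1 when they are more than `window`
    # apart: record k is one cluster's upper bound, record k + 1 the next lower bound.
    cuts = [k + 1 for k in range(len(ordered) - 1)
            if ordered[k + 1][0] - ordered[k][0] > window]
    # The first and last clusters are bounded by a gap on one side only.
    bounds = [0] + cuts + [len(ordered)]
    clusters = []
    for start, end in zip(bounds, bounds[1:]):
        members = ordered[start:end]
        clusters.append((members[0][0], members[-1][0], [row_id for _, row_id in members]))
    return clusters


clusters = clusters_from_gaps(ROWS)
# Single-member clusters are one of the special cases; here they are simply dropped.
print([ids for _, _, ids in clusters if len(ids) > 1])  # [[1, 2], [3, 4], [7, 8, 9, 10]]
```

Keeping the single-member clusters in the return value, rather than filtering them inside the helper, leaves the decision about those special cases to the caller.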