GROUP BY adjacent records

Time: 2017-11-08 03:05:53

Tags: sql sqlite group-by gaps-and-islands

How can I group a column by adjacent records in SQLite?

Situation

This is an MCVE of a SELECT query that JOINs 12 tables (and groups by multiple columns).

entity_log stores value over time (timest is a Unix timestamp):

CREATE TABLE entity_log (
   id     INTEGER PRIMARY KEY,
   timest INTEGER,
   entity INTEGER /*REFERENCES entity_table(id)*/,
   value  INTEGER
);

INSERT INTO entity_log (timest, entity, value) VALUES (1510160703, 0, 0);
INSERT INTO entity_log (timest, entity, value) VALUES (1510160704, 0, 0);
INSERT INTO entity_log (timest, entity, value) VALUES (1510160705, 0, 1);
INSERT INTO entity_log (timest, entity, value) VALUES (1510160706, 0, 1);
INSERT INTO entity_log (timest, entity, value) VALUES (1510160707, 0, 1);
INSERT INTO entity_log (timest, entity, value) VALUES (1510160708, 0, 1);
INSERT INTO entity_log (timest, entity, value) VALUES (1510160709, 0, 1);
INSERT INTO entity_log (timest, entity, value) VALUES (1510160710, 0, 1);
INSERT INTO entity_log (timest, entity, value) VALUES (1510160711, 0, 0);
INSERT INTO entity_log (timest, entity, value) VALUES (1510160712, 0, 0);
INSERT INTO entity_log (timest, entity, value) VALUES (1510160713, 0, 0);
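To experiment outside the sqlite3 shell, here is a minimal sketch that loads the same eleven rows into an in-memory database with Python's standard-library sqlite3 module. The helper name load_sample is hypothetical; the schema and data are the question's.

```python
import sqlite3


def load_sample() -> sqlite3.Connection:
    """Hypothetical helper: recreate the question's entity_log table in memory."""
    con = sqlite3.connect(":memory:")
    con.execute(
        """CREATE TABLE entity_log (
               id     INTEGER PRIMARY KEY,
               timest INTEGER,
               entity INTEGER,
               value  INTEGER
           )"""
    )
    # Same data as the INSERT statements: entity 0, one row per second from 1510160703.
    values = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
    con.executemany(
        "INSERT INTO entity_log (timest, entity, value) VALUES (?, 0, ?)",
        [(1510160703 + i, v) for i, v in enumerate(values)],
    )
    con.commit()
    return con


con = load_sample()
rows = con.execute("SELECT timest, value FROM entity_log ORDER BY timest").fetchall()
print(rows)
```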

Query

Chronological occurrences of value, aggregated to min(timest) and max(timest):

SELECT
   min(timest) AS timest_first,
   max(timest) AS timest_last,
   value
FROM
   entity_log
WHERE
   entity = 0
GROUP BY
   value
ORDER BY
   timest_last DESC
;

Result

If a value reoccurs, but not adjacently (0,1,0 rather than 0,0,1), the aggregated timest ranges overlap:

timest_first  timest_last  value
........03    ........13   0
........05    ........10   1
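A minimal sketch reproducing this overlap with Python's sqlite3 module and the question's sample data (in-memory database; variable names are illustrative). Because GROUP BY value puts both runs of 0 into one group, that group's min/max range encloses the whole run of 1s.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE entity_log (id INTEGER PRIMARY KEY, timest INTEGER, entity INTEGER, value INTEGER)"
)
values = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0]  # the question's data, entity 0
con.executemany(
    "INSERT INTO entity_log (timest, entity, value) VALUES (?, 0, ?)",
    [(1510160703 + i, v) for i, v in enumerate(values)],
)

# The question's query: grouping only by value merges the two separate runs of 0.
naive = con.execute(
    """SELECT min(timest) AS timest_first, max(timest) AS timest_last, value
       FROM entity_log
       WHERE entity = 0
       GROUP BY value
       ORDER BY timest_last DESC"""
).fetchall()

for first, last, value in naive:
    print(first, last, value)
```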

Goal

Additionally group value by chronologically adjacent records:

timest_first  timest_last  value
........11    ........13   0
........05    ........10   1
........03    ........04   0
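Not part of either answer below: from SQLite 3.25.0 on (released after this question was asked), window functions allow the common gaps-and-islands trick of subtracting two row numbers. Within one run of equal values, row_number() over all rows and row_number() within the value partition increase together, so their difference (grp) is constant for the run and changes once the run is interrupted. A minimal sketch, assuming the SQLite library behind Python's sqlite3 module is 3.25.0 or newer:

```python
import sqlite3

if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("window functions need SQLite 3.25.0+, found " + sqlite3.sqlite_version)

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE entity_log (id INTEGER PRIMARY KEY, timest INTEGER, entity INTEGER, value INTEGER)"
)
values = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0]  # the question's data, entity 0
con.executemany(
    "INSERT INTO entity_log (timest, entity, value) VALUES (?, 0, ?)",
    [(1510160703 + i, v) for i, v in enumerate(values)],
)

# grp is constant inside each run of equal values, so (value, grp) identifies one island.
islands = con.execute(
    """SELECT min(timest) AS timest_first, max(timest) AS timest_last, value
       FROM (
           SELECT timest, value,
                  row_number() OVER (ORDER BY timest)
                - row_number() OVER (PARTITION BY value ORDER BY timest) AS grp
           FROM entity_log
           WHERE entity = 0
       )
       GROUP BY value, grp
       ORDER BY timest_last DESC"""
).fetchall()

for first, last, value in islands:
    print(first, last, value)
```

For more grouping columns, one would presumably put all of them in the PARTITION BY and GROUP BY lists.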

2 answers:

Answer 0 (score: 2)

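If I understand correctly, you want a result that depends on the value in the next record: whenever value changes, a new subgroup starts. So for each row we can find the first row of the next group, using a self-join on an inequality. Of course, the last rows of your data will be lost, because there is no later row with a different value. (Maybe you can work around this by using UNION to add a fake row with a future date and a value that does not exist.)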

Then, from this list, in which each row carries the start date of the next group (Nextdate), we can group by value and Nextdate to find the first and last date of each group:

SELECT Min(Somedate) AS timest_first, Max(Somedate) AS timest_last, value FROM
(SELECT  t2.Value, t2.timest AS Somedate, Min(t1.timest) AS Nextdate, t1.value as n
 FROM entity_log t1 JOIN entity_log t2
 ON t1.timest > t2.timest
 WHERE t1.value <> t2.value
 GROUP BY t2.timest) s1
GROUP BY value, Nextdate
ORDER BY 2 desc 
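A minimal sketch running this query unchanged on the question's sample data (Python sqlite3, in-memory database). As the answer notes, the final run of 0s (…11 to …13) has no later row with a different value, so the inner join drops it and only two groups come back.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE entity_log (id INTEGER PRIMARY KEY, timest INTEGER, entity INTEGER, value INTEGER)"
)
values = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0]  # the question's data, entity 0
con.executemany(
    "INSERT INTO entity_log (timest, entity, value) VALUES (?, 0, ?)",
    [(1510160703 + i, v) for i, v in enumerate(values)],
)

# Answer 0's query as posted: no dummy row, so the trailing run has no Nextdate.
answer0 = con.execute(
    """SELECT Min(Somedate) AS timest_first, Max(Somedate) AS timest_last, value FROM
       (SELECT t2.Value, t2.timest AS Somedate, Min(t1.timest) AS Nextdate, t1.value AS n
        FROM entity_log t1 JOIN entity_log t2
        ON t1.timest > t2.timest
        WHERE t1.value <> t2.value
        GROUP BY t2.timest) s1
       GROUP BY value, Nextdate
       ORDER BY 2 DESC"""
).fetchall()

for first, last, value in answer0:
    print(first, last, value)
```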

Answer 1 (score: 0)

The full query, including a UNION with a unique dummy record (so that the last group is returned too) and placeholders for additional columns:

SELECT
    min(timest_curr) AS timest_first,
    max(timest_curr) AS timest_last,
    value/*,
    value2,
    value3                                additional columns */
FROM (
    SELECT
        t2.value, /*
        t2.value2,
        t2.value3,                        additional columns */
        t2.timest      AS timest_curr,
        min(t1.timest) AS timest_next,
        t1.value       AS n/*,
        t1.value2      AS n2,
        t1.value3      AS n3              additional columns */
    FROM (
        SELECT
            timest,
            value/*,
            value2,
            value3                        additional columns */
        FROM
            entity_log
        WHERE
            entity = 0
        UNION
        SELECT
            strftime('%s', 'now') + 1, /* or maximum value for Unix time timestamp: 2147483647 */
            'unique_value'/*,             TEXT never equals INTEGER, so this value differs from every real value (NULL would not work: NULL <> x is never true)
            'unique_value2',
            'unique_value3'               additional columns */
    ) AS t1
    JOIN
        entity_log AS t2
    ON
        t1.timest > t2.timest
    WHERE
        t2.entity = 0                     /* t1 is already limited to entity 0 */
        AND (
            t1.value <> t2.value/*
            OR
            t1.value2 <> t2.value2
            OR
            t1.value3 <> t2.value3        additional columns */
        )
    GROUP BY
        t2.timest
) AS s1
GROUP BY
    value, /*
    value2,
    value3,                               additional columns */
    timest_next
ORDER BY
    timest_last DESC
;
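A minimal sketch of the same technique, trimmed to the value column, running in Python's sqlite3 module. Assumptions: the fixed dummy timestamp 2147483647 (the alternative mentioned in the comment) instead of strftime('%s', 'now') + 1, plus a filter t2.entity = 0, because restricting only t1 to entity 0 would let other entities' rows into t2. One hypothetical entity-1 row is added to show that the filter keeps it out of the groups.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE entity_log (id INTEGER PRIMARY KEY, timest INTEGER, entity INTEGER, value INTEGER)"
)
values = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0]  # the question's data, entity 0
rows = [(1510160703 + i, 0, v) for i, v in enumerate(values)]
rows.append((1510160706, 1, 5))  # hypothetical row for another entity
con.executemany("INSERT INTO entity_log (timest, entity, value) VALUES (?, ?, ?)", rows)

answer1 = con.execute(
    """SELECT min(timest_curr) AS timest_first, max(timest_curr) AS timest_last, value
       FROM (
           SELECT t2.value, t2.timest AS timest_curr, min(t1.timest) AS timest_next
           FROM (
               SELECT timest, value FROM entity_log WHERE entity = 0
               UNION
               SELECT 2147483647, 'unique_value'  -- dummy row later than every real row
           ) AS t1
           JOIN entity_log AS t2 ON t1.timest > t2.timest
           WHERE t2.entity = 0 AND t1.value <> t2.value
           GROUP BY t2.timest
       ) AS s1
       GROUP BY value, timest_next
       ORDER BY timest_last DESC"""
).fetchall()

for first, last, value in answer1:
    print(first, last, value)
```

Thanks to the dummy row, the trailing run of 0s now has a timest_next and appears as its own group.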