PostgreSQL GROUP BY without jumping rows?

Date: 2015-02-17 10:58:07

Tags: sql postgresql group-by

Suppose I have this data in a table:

 id | thing | operation | timestamp
----+-------+-----------+-----------
  0 | foo   |       add |         0
  0 | bar   |       add |         1
  1 | baz   |    remove |         2
  1 | dim   |       add |         3
  0 | foo   |    remove |         4
  0 | dim   |       add |         5

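Is there a way to build a Postgres SQL query that groups by id and operation, but without merging rows that have a higher timestamp into groups formed by rows with a lower timestamp? I would like to get this from the query: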

 id |  things  | operation
----+----------+-----------
  0 | foo, bar |       add
  1 |      baz |    remove
  1 |      dim |       add
  0 |      foo |    remove
  0 |      dim |       add

Basically a GROUP BY, but only over rows that are adjacent when sorted by timestamp.
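For anyone who wants to reproduce this, here is a minimal setup sketch. The table name T is borrowed from the first answer below, and the column types are assumptions, since the question only shows the values.

```sql
-- Assumed schema: column names follow the question; the types are a guess.
CREATE TABLE T (
    id          integer NOT NULL,
    thing       text    NOT NULL,
    operation   text    NOT NULL,
    "timestamp" integer NOT NULL   -- quoted because it is also a type name
);

-- The six sample rows from the question, in timestamp order.
INSERT INTO T (id, thing, operation, "timestamp") VALUES
    (0, 'foo', 'add',    0),
    (0, 'bar', 'add',    1),
    (1, 'baz', 'remove', 2),
    (1, 'dim', 'add',    3),
    (0, 'foo', 'remove', 4),
    (0, 'dim', 'add',    5);
```

Unquoted references such as timestamp or Timestamp in later queries fold to lowercase, so they resolve to this column.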

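3 answers:

Answer 0 (score: 7)

This is a gaps and islands problem (although the article is aimed at SQL Server, it describes the problem well, so it still applies to PostgreSQL), and it can be solved using ranking functions: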

SELECT  id,
        thing,
        operation,
        timestamp,
        ROW_NUMBER() OVER(ORDER BY timestamp) - 
                ROW_NUMBER() OVER(PARTITION BY id, operation ORDER BY Timestamp) AS groupingSet,
        ROW_NUMBER() OVER(ORDER BY timestamp) AS PositionInSet,
        ROW_NUMBER() OVER(PARTITION BY id, operation ORDER BY Timestamp) AS PositionInGroup
FROM    T
ORDER BY timestamp;

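As you can see, by taking each row's overall position in the set and subtracting its position within its (id, operation) group, you can identify the islands: each unique combination of (id, operation, groupingset) represents one island: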

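 id | thing | operation | timestamp | groupingSet | PositionInSet | PositionInGroup
----+-------+-----------+-----------+-------------+---------------+-----------------
  0 | foo   | add       |         0 |           0 |             1 |               1
  0 | bar   | add       |         1 |           0 |             2 |               2
  1 | baz   | remove    |         2 |           2 |             3 |               1
  1 | dim   | add       |         3 |           3 |             4 |               1
  0 | foo   | remove    |         4 |           4 |             5 |               1
  0 | dim   | add       |         5 |           3 |             6 |               3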

Then you just need to put this in a subquery, group by the relevant fields, and use string_agg to concatenate your things:

SELECT  id, STRING_AGG(thing, ', ' ORDER BY timestamp) AS things, operation
FROM    (   SELECT  id,
                    thing,
                    operation,
                    timestamp,
                    ROW_NUMBER() OVER(ORDER BY timestamp) - 
                            ROW_NUMBER() OVER(PARTITION BY id, operation ORDER BY Timestamp) AS groupingSet
            FROM    T
        ) AS t
GROUP BY id, operation, groupingset
ORDER BY MIN(timestamp);
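A different but common way to find the same islands is to flag each row where (id, operation) changes from the previous row and take a running sum of those flags. This is only a sketch of that alternative, not the method used above. It assumes the table T from this answer and unique timestamps; the names w, is_start, island, flagged and numbered are made up for illustration. On the question's sample data it should yield the same five rows.

```sql
SELECT  id,
        string_agg(thing, ', ' ORDER BY "timestamp") AS things,
        operation
FROM (
    -- Running total of the start flags: every row of an island gets the same number.
    SELECT  *,
            SUM(is_start) OVER (ORDER BY "timestamp") AS island
    FROM (
        -- is_start = 1 when id or operation differs from the previous row
        -- (IS DISTINCT FROM also treats the first row, whose LAG is NULL, as a start).
        SELECT  *,
                CASE WHEN (id IS DISTINCT FROM LAG(id) OVER w)
                       OR (operation IS DISTINCT FROM LAG(operation) OVER w)
                     THEN 1 ELSE 0
                END AS is_start
        FROM    T
        WINDOW  w AS (ORDER BY "timestamp")
    ) AS flagged
) AS numbered
GROUP BY id, operation, island
ORDER BY MIN("timestamp");
```

Here the island number alone identifies a run, so it does not need to be combined with (id, operation) to be unique; those two columns are in the GROUP BY only so they can be selected.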

Answer 1 (score: 0)

If your sample data is representative, maybe this will work:

select id, string_agg(thing,',') as things, operation
from tablename
group by id, operation

I.e., use id and operation to find the things to concatenate.

Edit: now uses string_agg instead of group_concat.
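As a quick check, here is a sketch of that query run against the question's sample data, assuming it is stored in a table named T with the columns shown in the question. The two ORDER BY clauses are additions, included only to make the output deterministic.

```sql
SELECT  id,
        string_agg(thing, ',' ORDER BY "timestamp") AS things,
        operation
FROM    T
GROUP BY id, operation
ORDER BY id, operation;

--  id |   things    | operation
-- ----+-------------+-----------
--   0 | foo,bar,dim | add
--   0 | foo         | remove
--   1 | dim         | add
--   1 | baz         | remove
```

Because (0, add) occurs at timestamps 0 and 1 and again at timestamp 5, all three rows land in one group. The query therefore matches the desired output only when a given (id, operation) pair never recurs after a different pair.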

Answer 2 (score: 0)

You can count the distinct operations per id, and use that count to UNION two SELECTs on the table:

WITH cnt AS (
  SELECT id, operations_cnt FROM (
    SELECT id, array_length(array_agg(DISTINCT operation),1) AS operations_cnt
    FROM test GROUP BY id
  ) AS t
  WHERE operations_cnt=1
)
SELECT id, string_agg(things, ','), operation, MAX(timestamp) AS timestamp
FROM test
WHERE id IN (SELECT id FROM cnt) GROUP BY id, operation
UNION ALL
SELECT id, things, operation, timestamp
FROM test
WHERE id NOT IN (SELECT id FROM cnt)
ORDER BY timestamp;

Result:

 id | string_agg | operation | timestamp 
----+------------+-----------+-----------
  0 | foo,bar    | add       |         1
  1 | baz        | remove    |         2
  1 | dim        | add       |         3
  2 | foo        | remove    |         4
  2 | dim        | add       |         5
(5 rows)