Grouping rows by the difference between column values

Time: 2018-02-23 12:11:01

Tags: sql postgresql

I have a table user_work_details with two columns: USER_ID and START_TIME.

START_TIME is in epoch milliseconds, so the corresponding wall-clock time is written next to each value:

  USER_ID     START_TIME 
-----------------------------
    1         1518210035904        Feb 9,  2018 9:00:35 PM
    1         1518307236904        Feb 9,  2018 9:00:35 PM
    1         1519048475905        Feb 19, 2018 1:54:35 PM
    2         1518400835906        Feb 12, 2018 2:00:35 AM
    2         1518400837906        Feb 12, 2018 2:00:37 AM
    3         1518494435907        Feb 13, 2018 4:00:35 AM
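
For checking the human-readable column, an epoch-millisecond value can be converted with PostgreSQL's to_timestamp(), which expects seconds. This is only a sketch: the inline VALUES list stands in for the real table, and the alias names t and start_time_utc are made up for illustration.

```sql
-- Convert epoch milliseconds to a timestamp, shown in UTC.
-- to_timestamp() takes seconds, hence the division by 1000.0.
SELECT t.start_time,
       to_timestamp(t.start_time / 1000.0) AT TIME ZONE 'UTC' AS start_time_utc
FROM (VALUES (1518210035904::bigint),
             (1518400835906::bigint)) AS t(start_time);
```

Dividing by 1000.0 rather than 1000 keeps the millisecond part instead of truncating it through integer division.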

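I need to group the records based on the difference between their START_TIME values: records that are within 5 minutes of each other should fall into the same group. So the output should be: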

  USER_ID     START_TIME         DIFF
--------------------------------------
    1         1518210035904      0
    1         1518307236904      0
    1         1519048475905      1
    2         1518400835906      2
    2         1518400837906      2
    3         1518494435907      3

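If the USER_ID is the same and the difference between the two times is less than 5 minutes, DIFF has the same value. In addition, DIFF must be incremented on every change.

I tried this using LAG(), like so: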

SELECT
"USER_ID",
"START_TIME",
CASE WHEN "START_TIME" - LAG("START_TIME", 1, "START_TIME") OVER 
(PARTITION BY "USER_ID" ORDER BY "START_TIME") > 60000 
THEN 1  
ELSE 0 
END AS DIFF
FROM "user_work_details"
order by "USER_ID", "START_TIME"

This query returns the following output:

  USER_ID     START_TIME         DIFF
--------------------------------------
    1         1518210035904      0
    1         1518307236904      1
    1         1519048475905      1
    2         1518400835906      0
    2         1518400837906      1
    3         1518494435907      1

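I only need DIFF to be incremented when a change occurs, like a manually incremented counter. How can I do that?

Edit: fixed the output values; the earlier values were wrong.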

1 answer:

Answer 0: (score: 0)

You can use lag() to mark where each group starts, then use a cumulative sum to assign the diff:

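SELECT uwd.*,
       SUM(flag) OVER (PARTITION BY user_id ORDER BY start_time) AS diff
FROM (SELECT uwd.*,
             -- flag is 1 when the gap to the same user's previous row exceeds 60000 ms
             -- (threshold copied from the question's query; 5 minutes would be 300000)
             ((start_time - LAG(start_time, 1, start_time) OVER (PARTITION BY user_id ORDER BY start_time) > 60000)::int) AS flag
      FROM user_work_details uwd
     ) uwd;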
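
Because the running sum above is partitioned by user_id, the counter restarts at 0 for every user. If a single counter across all users is wanted, as in the question's expected output, one possible variant is sketched below. It drops the partition, also flags a change of user_id, and assumes a 5-minute threshold of 300000 ms. The CTE named user_work_details only inlines the sample rows so the statement runs on its own (it shadows the real table), and the names flagged and w are illustrative.

```sql
WITH user_work_details(user_id, start_time) AS (
    -- Sample rows from the question, inlined for a self-contained run.
    VALUES (1, 1518210035904::bigint),
           (1, 1518307236904),
           (1, 1519048475905),
           (2, 1518400835906),
           (2, 1518400837906),
           (3, 1518494435907)
),
flagged AS (
    SELECT user_id,
           start_time,
           CASE
               WHEN LAG(user_id) OVER w IS NULL THEN 0              -- first row overall
               WHEN LAG(user_id) OVER w <> user_id THEN 1           -- user changed
               WHEN start_time - LAG(start_time) OVER w > 300000 THEN 1  -- gap over 5 minutes
               ELSE 0
           END AS flag
    FROM user_work_details
    WINDOW w AS (ORDER BY user_id, start_time)
)
SELECT user_id,
       start_time,
       SUM(flag) OVER (ORDER BY user_id, start_time) AS diff  -- running count of group starts
FROM flagged
ORDER BY user_id, start_time;
```

With the timestamps exactly as listed in the question, the second row for user 1 is about 27 hours after the first, so this sketch starts a new group there rather than keeping both rows in group 0.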

I suggest defining the columns without double quotes. Having to quote the names only makes queries harder to write and read.
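
As a rough illustration of that advice: PostgreSQL folds unquoted identifiers to lower case, so a table defined without quotes can be queried with any capitalization. The table name user_work_details_demo and the column types below are assumptions, not taken from the question.

```sql
-- Hypothetical definition; the column types are assumptions.
CREATE TEMP TABLE user_work_details_demo (
    user_id    integer NOT NULL,
    start_time bigint  NOT NULL   -- epoch milliseconds
);

-- Unquoted identifiers are folded to lower case, so any spelling resolves:
SELECT USER_ID, Start_Time FROM user_work_details_demo;

-- For an existing table created with quoted upper-case names,
-- the columns could be renamed once:
-- ALTER TABLE user_work_details RENAME COLUMN "USER_ID" TO user_id;
-- ALTER TABLE user_work_details RENAME COLUMN "START_TIME" TO start_time;
```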