ROW_NUMBER partitioned by ID and ordered by timestamp, but grouped by consecutive rows

Time: 2021-07-25 13:04:47

Tags: sql row-number

Sample input

Name | ID    | Timestamp
-----|-------|-----------------
ABI  | 1     | 2016-01-01 02:00
ABI  | 1     | 2016-01-01 03:00
ABI  | 2     | 2016-01-01 04:00
ABI  | 1     | 2016-01-01 05:00
ABI  | 1     | 2016-01-01 06:00
ABI  | 3     | 2016-01-01 07:00
ABI  | 3     | 2016-01-01 08:00
ABI  | 3     | 2016-01-01 09:00

Expected output

Name | ID    | Timestamp       |Rank
-----|-------|-----------------|-----
ABI  | 1     | 2016-01-01 02:00|1
ABI  | 1     | 2016-01-01 03:00|2
ABI  | 2     | 2016-01-01 04:00|1
ABI  | 1     | 2016-01-01 05:00|1
ABI  | 1     | 2016-01-01 06:00|2
ABI  | 3     | 2016-01-01 07:00|1
ABI  | 3     | 2016-01-01 08:00|2
ABI  | 3     | 2016-01-01 09:00|3

Attempted query

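I am trying to use ROW_NUMBER() with PARTITION BY to rank rows by Name and ID, but with the numbering restarting for each group of consecutive rows (ordered by timestamp).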

I tried this:

SELECT Name,
       ID,
       TIMESTAMP(_timestamp) AS TimeStamp,
       ROW_NUMBER() OVER(PARTITION BY Name,ID ORDER BY _timestamp DESC) RANK               
FROM Table_ID

But it ranks by Name and ID over all rows, without splitting them into groups of consecutive rows.

Thanks in advance for your attention and help.
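To make the target numbering concrete, here is a minimal Python sketch (not part of the question) that reproduces the expected output from the sample input. It assumes the 06:00 row has ID 1, as the expected output shows, and `rank_consecutive` is a hypothetical helper name. The counter restarts whenever Name or ID changes between rows ordered by Name and timestamp, which is exactly what `PARTITION BY Name, ID` on its own does not do.

```python
from itertools import groupby

# Sample rows from the question as (name, id, timestamp) tuples.
# Assumption: the 06:00 row has id 1, matching the expected output.
rows = [
    ("ABI", 1, "2016-01-01 02:00"),
    ("ABI", 1, "2016-01-01 03:00"),
    ("ABI", 2, "2016-01-01 04:00"),
    ("ABI", 1, "2016-01-01 05:00"),
    ("ABI", 1, "2016-01-01 06:00"),
    ("ABI", 3, "2016-01-01 07:00"),
    ("ABI", 3, "2016-01-01 08:00"),
    ("ABI", 3, "2016-01-01 09:00"),
]


def rank_consecutive(rows):
    """Hypothetical helper: number rows 1, 2, 3, ... within each run of
    consecutive rows (ordered by name, then timestamp) sharing (name, id)."""
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    result = []
    # groupby only merges *adjacent* rows with equal keys, so a later
    # run of the same id starts a new group and the counter restarts.
    for _key, run in groupby(ordered, key=lambda r: (r[0], r[1])):
        for rank, row in enumerate(run, start=1):
            result.append(row + (rank,))
    return result


for row in rank_consecutive(rows):
    print(" | ".join(map(str, row)))
```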

3 answers:

Answer 0 (score: 0)

I assume you are using MySQL.

You can use the LAG() and SUM() window functions to build groups of consecutive rows, and then compute ROW_NUMBER() within each group:

SELECT Name, ID, Timestamp,
       ROW_NUMBER() OVER (PARTITION BY Name, grp ORDER BY Timestamp) `Rank`
FROM (
  SELECT *, SUM(flag) OVER (PARTITION BY Name ORDER BY Timestamp) grp
  FROM (
    SELECT *, COALESCE(ID <> LAG(ID) OVER (PARTITION BY Name ORDER BY Timestamp), 1) flag
    FROM Table_ID
  ) t  
) t 

See the demo.
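One way to try this query without a MySQL server is to run it against SQLite through Python's standard `sqlite3` module. That substitution is an assumption, not part of the answer: it needs SQLite 3.25 or later for window functions (SQLite also accepts the MySQL-style backtick quoting of `Rank`). The table layout, the helper name `run_query`, and giving the 06:00 row ID 1 (to match the expected output) are likewise assumptions, and a final `ORDER BY Name, Timestamp` is added so the row order is deterministic.

```python
import sqlite3

# Sample data from the question as (Name, ID, Timestamp).
# Assumption: the 06:00 row has ID 1, matching the expected output.
ROWS = [
    ("ABI", 1, "2016-01-01 02:00"),
    ("ABI", 1, "2016-01-01 03:00"),
    ("ABI", 2, "2016-01-01 04:00"),
    ("ABI", 1, "2016-01-01 05:00"),
    ("ABI", 1, "2016-01-01 06:00"),
    ("ABI", 3, "2016-01-01 07:00"),
    ("ABI", 3, "2016-01-01 08:00"),
    ("ABI", 3, "2016-01-01 09:00"),
]

# Answer 0's query, unchanged except for the final ORDER BY.
# flag = 1 when ID differs from the previous row (or there is no previous row),
# grp  = running sum of flag, so each run of equal IDs gets its own number.
QUERY = """
SELECT Name, ID, Timestamp,
       ROW_NUMBER() OVER (PARTITION BY Name, grp ORDER BY Timestamp) `Rank`
FROM (
  SELECT *, SUM(flag) OVER (PARTITION BY Name ORDER BY Timestamp) grp
  FROM (
    SELECT *, COALESCE(ID <> LAG(ID) OVER (PARTITION BY Name ORDER BY Timestamp), 1) flag
    FROM Table_ID
  ) t
) t
ORDER BY Name, Timestamp
"""


def run_query():
    """Hypothetical helper: load ROWS into an in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Table_ID (Name TEXT, ID INTEGER, Timestamp TEXT)")
        conn.executemany("INSERT INTO Table_ID VALUES (?, ?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


result = run_query()
for row in result:
    print(row)
```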

Answer 1 (score: 0)

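This is a type of gaps-and-islands problem. The simplest solution is probably to identify the groups by the difference between two row numbers, and then use row_number() again for the final output: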

select t.*,
       row_number() over (partition by name, id, seqnum - seqnum_2 order by timestamp) as rank
from (select t.*,
             row_number() over (partition by name order by timestamp) as seqnum,
             row_number() over (partition by name, id order by timestamp) as seqnum_2
      from t
     ) t;
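To see why the difference of row numbers identifies each run, this sketch runs the query on SQLite via Python's `sqlite3` module (a stand-in database, which is an assumption; SQLite 3.25+ is needed for window functions) and also selects `seqnum - seqnum_2` as an extra, hypothetical `grp` column. Within a run of consecutive rows with the same `id`, both row numbers grow by 1 per row, so their difference stays constant; it changes as soon as a row with a different `id` interrupts the run. The table name `t`, its column types, and the 06:00 row having id 1 are assumptions.

```python
import sqlite3

# Sample data from the question as (name, id, timestamp).
# Assumption: the 06:00 row has id 1, matching the expected output.
ROWS = [
    ("ABI", 1, "2016-01-01 02:00"),
    ("ABI", 1, "2016-01-01 03:00"),
    ("ABI", 2, "2016-01-01 04:00"),
    ("ABI", 1, "2016-01-01 05:00"),
    ("ABI", 1, "2016-01-01 06:00"),
    ("ABI", 3, "2016-01-01 07:00"),
    ("ABI", 3, "2016-01-01 08:00"),
    ("ABI", 3, "2016-01-01 09:00"),
]

# Answer 1's query plus a grp column exposing seqnum - seqnum_2,
# and a final ORDER BY for a deterministic row order.
QUERY = """
select t.*,
       seqnum - seqnum_2 as grp,
       row_number() over (partition by name, id, seqnum - seqnum_2 order by timestamp) as rank
from (select t.*,
             row_number() over (partition by name order by timestamp) as seqnum,
             row_number() over (partition by name, id order by timestamp) as seqnum_2
      from t
     ) t
order by name, timestamp
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, id INTEGER, timestamp TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
result = conn.execute(QUERY).fetchall()
conn.close()

# Each row: (name, id, timestamp, seqnum, seqnum_2, grp, rank)
for row in result:
    print(row)
```

With this data, `grp` comes out as 0, 0, 2, 1, 1, 5, 5, 5: the two runs of id 1 get different values (0 and 1), so they are numbered separately even though they share the same id.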

Answer 2 (score: 0)

Using MariaDB 10.5.0:

WITH edges AS (
        SELECT t.*
             , COALESCE(name <> LAG(name) OVER (ORDER BY ts) OR id <> LAG(id) OVER (ORDER BY ts), 1) AS edge
          FROM runs AS t
     )
   , grps AS (
        SELECT t.*
             , SUM(edge) OVER (ORDER BY ts) AS grp
          FROM edges AS t
     )
SELECT grps.*
     , ROW_NUMBER() OVER (PARTITION BY grp ORDER BY ts) AS actual
  FROM grps
;

+------+------+---------------------+----------+------+------+--------+
| name | id   | ts                  | expected | edge | grp  | actual |
+------+------+---------------------+----------+------+------+--------+
| ABI  |    1 | 2016-01-01 02:00:00 |        1 |    1 |    1 |      1 |
| ABI  |    1 | 2016-01-01 03:00:00 |        2 |    0 |    1 |      2 |
| ABI  |    2 | 2016-01-01 04:00:00 |        1 |    1 |    2 |      1 |
| ABI  |    1 | 2016-01-01 05:00:00 |        1 |    1 |    3 |      1 |
| ABI  |    1 | 2016-01-01 06:00:00 |        2 |    0 |    3 |      2 |
| ABI  |    3 | 2016-01-01 07:00:00 |        1 |    1 |    4 |      1 |
| ABI  |    3 | 2016-01-01 08:00:00 |        2 |    0 |    4 |      2 |
| ABI  |    3 | 2016-01-01 09:00:00 |        3 |    0 |    4 |      3 |
+------+------+---------------------+----------+------+------+--------+