I have this view, which shows each user's connection status to the system, in a table like this:
--------------------------------------
| id | date       | User | Connexion |
| 1  | 01/01/2018 | A    | 1         |
| 2  | 02/01/2018 | A    | 0         |
| 3  | 03/01/2018 | A    | 1         |
| 4  | 04/01/2018 | A    | 1         |
| 5  | 05/01/2018 | A    | 0         |
| 6  | 06/01/2018 | A    | 0         |
| 7  | 07/01/2018 | A    | 0         |
| 8  | 08/01/2018 | A    | 1         |
| 9  | 09/01/2018 | A    | 1         |
| 10 | 10/01/2018 | A    | 1         |
| 11 | 11/01/2018 | A    | 1         |
--------------------------------------
The goal is to count the consecutive runs of successful and failed connections, in date order, so the output would look like this:
-------------------------------------------------------
| StartDate  | EndDate    | User | Connexion | Length |
| 01/01/2018 | 01/01/2018 | A    | 1         | 1      |
| 02/01/2018 | 02/01/2018 | A    | 0         | 1      |
| 03/01/2018 | 04/01/2018 | A    | 1         | 2      |
| 05/01/2018 | 07/01/2018 | A    | 0         | 3      |
| 08/01/2018 | 11/01/2018 | A    | 1         | 4      |
-------------------------------------------------------
Answer:
This is a so-called "gaps and islands" problem. For your version, the best approach is the difference of row numbers:
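select "user", min("date") as StartDate, max("date") as EndDate, connexion, count(*) as length
from (select t.*,
             -- position of the row among all of this user's rows
             row_number() over (partition by "user" order by "date") as seqnum,
             -- position of the row among this user's rows with the same connexion value
             row_number() over (partition by "user", connexion order by "date") as seqnum_uc
      from t
     ) t
group by "user", connexion, (seqnum - seqnum_uc)
order by "user", StartDate;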
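Why this works is rather hard to explain. Usually I find that if you stare at the results of the subquery, you can see how the difference stays constant within each group of interest. Within a run of identical connexion values, both row numbers go up by 1 on every row, so seqnum - seqnum_uc does not change. When the other value interrupts the run, seqnum keeps counting but seqnum_uc does not, so the difference jumps.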
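The intermediate subquery columns can be reproduced for user A with a short Python sketch. This is a re-implementation of the same arithmetic, not the SQL itself. It assumes the rows are already sorted by date, and the names CONNEXIONS and subquery_rows are hypothetical. The output shows something the query relies on: the same difference can occur for both connexion values (1 for id 2 and also for ids 3 and 4). That is why connexion must also appear in the group by.

```python
from collections import defaultdict

# (id, connexion) pairs for user A, already ordered by date (from the question).
CONNEXIONS = [(1, 1), (2, 0), (3, 1), (4, 1), (5, 0), (6, 0),
              (7, 0), (8, 1), (9, 1), (10, 1), (11, 1)]


def subquery_rows(rows):
    """Mimic the subquery for one user: seqnum numbers every row,
    seqnum_uc numbers rows separately per connexion value."""
    per_value = defaultdict(int)  # running row_number for each connexion value
    out = []
    for seqnum, (row_id, conn) in enumerate(rows, start=1):
        per_value[conn] += 1
        seqnum_uc = per_value[conn]
        out.append((row_id, conn, seqnum, seqnum_uc, seqnum - seqnum_uc))
    return out


# Columns: id, connexion, seqnum, seqnum_uc, seqnum - seqnum_uc
for row in subquery_rows(CONNEXIONS):
    print(row)
```

The last column reads 0, 1, 1, 1, 3, 3, 3, 4, 4, 4, 4. Grouping on (connexion, difference) yields exactly the five runs in the desired output.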
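Note: don't use user or date as column names. These are keywords in SQL (of one type or another). If you do use them, you have to clutter your SQL with escape characters, which only makes the code harder to write, read, and debug.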