Let's see if you can come up with a solution, because I haven't been able to find one so far:
I have this table:
Hour w1 w2 w3 w4 w5
8:05 1 0 0 0 0
8:10 1 0 0 0 0
8:15 0 1 0 0 1
8:20 0 1 0 1 1
8:25 0 0 1 1 1
8:30 0 0 1 1 1
8:35 0 1 1 1 0
8:40 1 1 1 1 0
8:45 0 0 1 0 1
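What I want is to identify the clusters as events. A new event starts whenever a 1 appears that is not connected to an existing event. An event propagates to a cell if the previous value in the same column (the row above) is 1, or if the column to the right has a value in the previous row.
Applied to the table above, the events would be located like this: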
Hour  w1       w2       w3       w4       w5
8:05  EVENT 1  0        0        0        0
8:10  EVENT 1  0        0        0        0
8:15  0        EVENT 2  0        0        EVENT 3
8:20  0        EVENT 2  0        EVENT 3  EVENT 3
8:25  0        0        EVENT 3  EVENT 3  EVENT 3
8:30  0        0        EVENT 3  EVENT 3  EVENT 3
8:35  0        EVENT 3  EVENT 3  EVENT 3  0
8:40  EVENT 3  EVENT 3  EVENT 3  EVENT 3  0
8:45  0        0        EVENT 3  0        EVENT 4
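As a cross-check on the rules, here is a minimal Python sketch (not SQL, and not the approach used in the answer below) that treats every 1 as a node, links it to the cell above and to the cell above-right, and finds the connected components with a union-find. It assumes events are numbered in the order their first cell is met when scanning rows top to bottom and columns left to right, which is consistent with the expected table above. `label_events` and `find` are hypothetical names.

```python
# Sample data from the question: one row per hour, columns w1..w5.
HOURS = ["8:05", "8:10", "8:15", "8:20", "8:25", "8:30", "8:35", "8:40", "8:45"]
GRID = [
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 1, 0, 1],
]


def find(parent, cell):
    """Return the representative of cell's set, compressing the path as we go."""
    while parent[cell] != cell:
        parent[cell] = parent[parent[cell]]
        cell = parent[cell]
    return cell


def label_events(grid):
    """Replace every 1 with its event number; 0 stays 0."""
    n_rows, n_cols = len(grid), len(grid[0])
    # All cells holding a 1, in row-major order (top to bottom, left to right).
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols) if grid[r][c] == 1]
    parent = {cell: cell for cell in cells}
    for r, c in cells:
        # Rule 1: the same column in the previous row.
        # Rule 2: the column to the right in the previous row.
        for above in ((r - 1, c), (r - 1, c + 1)):
            if above in parent:
                a, b = find(parent, (r, c)), find(parent, above)
                if a != b:
                    parent[max(a, b)] = min(a, b)
    # Number the events in the order their first cell is met in the scan.
    numbers = {}
    result = [[0] * n_cols for _ in range(n_rows)]
    for r, c in cells:
        root = find(parent, (r, c))
        numbers.setdefault(root, len(numbers) + 1)
        result[r][c] = numbers[root]
    return result


if __name__ == "__main__":
    for hour, row in zip(HOURS, label_events(GRID)):
        print(hour, *(f"EVENT {n}" if n else "0" for n in row))
```

Because union-find merges links in both directions, a cell that connects two previously separate groups joins them into one event, which is what "cluster" suggests here.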
Thanks! Let me know if anything is unclear.
PS: The first table can also be viewed "vertically" instead of horizontally, like this:
Hour Position Value
8:05 w1 1
8:05 w2 0
8:05 w3 0
8:05 w4 0
8:05 w5 0
8:10 w1 1
8:10 w2 0
... ... ...
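For illustration only, a small Python sketch of how the wide layout maps to this long layout and back (`to_long` and `to_wide` are hypothetical helpers; filling missing positions with 0 is an assumption):

```python
# The column names used in the question's wide table.
COLUMNS = ["w1", "w2", "w3", "w4", "w5"]

# First two rows of the wide table (hour, then one value per column).
WIDE = [
    ("8:05", [1, 0, 0, 0, 0]),
    ("8:10", [1, 0, 0, 0, 0]),
]


def to_long(wide, columns=COLUMNS):
    """Unpivot (hour, [values]) rows into (hour, position, value) rows."""
    return [
        (hour, position, value)
        for hour, values in wide
        for position, value in zip(columns, values)
    ]


def to_wide(long_rows, columns=COLUMNS):
    """Pivot (hour, position, value) rows back into (hour, [values]) rows.
    Positions that are missing for an hour are assumed to be 0."""
    by_hour = {}
    for hour, position, value in long_rows:
        by_hour.setdefault(hour, dict.fromkeys(columns, 0))[position] = value
    return [(hour, [cells[c] for c in columns]) for hour, cells in by_hour.items()]


if __name__ == "__main__":
    for row in to_long(WIDE):
        print(*row)
```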
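Answer 0 (score: 1)
I would describe this as a tricky problem, and an interesting one. I'm pretty sure the most general solution requires a recursive CTE. However, that is really cumbersome and expensive: fundamentally, this is an iterative process.
Under some assumptions, it can be solved with a single (complicated) query instead of a recursive CTE. The main assumption is that a vertical run of adjacent "1"s is not "interrupted" by a new group on its right.
The following code encodes the clusters as strings rather than numbers. It doesn't produce exactly the output you want, but it does identify the clusters: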
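To show why the general case is iterative, here is a rough Python sketch of the kind of loop a recursive CTE would have to express (an illustration under my own assumptions, not the query from this answer; `linked` and `propagate_labels` are hypothetical names). Every 1 starts with its own (row, col) label, and each pass copies the smallest label across the links defined by the question's rule until a pass changes nothing. How many passes that takes depends on the shape of the data, which is what makes a fixed, non-recursive query hard in general.

```python
# Sample data from the question: one row per hour, columns w1..w5.
GRID = [
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 1, 0, 1],
]


def linked(grid, r, c):
    """Yield the 1-cells linked to (r, c) by the question's rule, in either direction."""
    n_rows, n_cols = len(grid), len(grid[0])
    # (r-1, c) and (r-1, c+1) can propagate into (r, c);
    # (r+1, c) and (r+1, c-1) can receive from it.
    for nr, nc in ((r - 1, c), (r - 1, c + 1), (r + 1, c), (r + 1, c - 1)):
        if 0 <= nr < n_rows and 0 <= nc < n_cols and grid[nr][nc] == 1:
            yield nr, nc


def propagate_labels(grid):
    """Give every 1-cell the smallest (row, col) of its cluster by repeating
    one propagation pass until nothing changes (a fixed point).
    Returns the labels and the number of passes that were needed."""
    labels = {
        (r, c): (r, c)
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value == 1
    }
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for cell in labels:
            best = min([labels[cell]] + [labels[n] for n in linked(grid, *cell)])
            if best < labels[cell]:
                labels[cell] = best
                changed = True
    return labels, passes
```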
```sql
-- Postgres syntax: || concatenates strings; greatest() ignores NULLs.
with t as (
      -- Sample data from the question.
      -- hour is text, so "order by hour" is only safe here because
      -- every value has the same width ('8:05' ... '8:45').
      select *
      from (values ('8:05', 1, 0, 0, 0, 0),
                   ('8:10', 1, 0, 0, 0, 0),
                   ('8:15', 0, 1, 0, 0, 1),
                   ('8:20', 0, 1, 0, 1, 1),
                   ('8:25', 0, 0, 1, 1, 1),
                   ('8:30', 0, 0, 1, 1, 1),
                   ('8:35', 0, 1, 1, 1, 0),
                   ('8:40', 1, 1, 1, 1, 0),
                   ('8:45', 0, 0, 1, 0, 1)
           ) v(hour, w1, w2, w3, w4, w5)
     ),
     -- t5: give each vertical run of 1s a label such as 'w4_3'. The running
     -- count of 0s does not change inside a run, so it acts as a run id.
     -- Column w5 has nothing to its right, so its labels are already final.
     t5 as (
      select t.*,
             (case when w1 = 1 then 'w1_' || sum(case when w1 = 0 then 1 else 0 end) over (order by hour) end) as w1_grp,
             (case when w2 = 1 then 'w2_' || sum(case when w2 = 0 then 1 else 0 end) over (order by hour) end) as w2_grp,
             (case when w3 = 1 then 'w3_' || sum(case when w3 = 0 then 1 else 0 end) over (order by hour) end) as w3_grp,
             (case when w4 = 1 then 'w4_' || sum(case when w4 = 0 then 1 else 0 end) over (order by hour) end) as w4_grp,
             (case when w5 = 1 then 'w5_' || sum(case when w5 = 0 then 1 else 0 end) over (order by hour) end) as w5_grp_final
      from t
     ),
     -- t4 .. t1: move one column to the left at a time. A run in column wN
     -- takes over the final label of column w(N+1) in the previous row (lag),
     -- if there is one. greatest() compares strings, so a label that came
     -- from a column further right ('w5_...') beats 'w4_...'.
     t4 as (
      select t5.*,
             (case when w4 = 1 then greatest(w4_grp, max(prev_w5_grp_final) over (partition by w4_grp)) end) as w4_grp_final
      from (select t5.*, lag(w5_grp_final) over (order by hour) as prev_w5_grp_final
            from t5
           ) t5
     ),
     t3 as (
      select t4.*,
             (case when w3 = 1 then greatest(w3_grp, max(prev_w4_grp_final) over (partition by w3_grp)) end) as w3_grp_final
      from (select t4.*, lag(w4_grp_final) over (order by hour) as prev_w4_grp_final
            from t4
           ) t4
     ),
     t2 as (
      select t3.*,
             (case when w2 = 1 then greatest(w2_grp, max(prev_w3_grp_final) over (partition by w2_grp)) end) as w2_grp_final
      from (select t3.*, lag(w3_grp_final) over (order by hour) as prev_w3_grp_final
            from t3
           ) t3
     ),
     t1 as (
      select t2.*,
             (case when w1 = 1 then greatest(w1_grp, max(prev_w2_grp_final) over (partition by w1_grp)) end) as w1_grp_final
      from (select t2.*, lag(w2_grp_final) over (order by hour) as prev_w2_grp_final
            from t2
           ) t2
     )
select hour, w1_grp_final, w2_grp_final, w3_grp_final, w4_grp_final, w5_grp_final
from t1
order by hour asc;
```
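The `sum(case when wN = 0 then 1 else 0 end) over (order by hour)` expressions in `t5` are the usual gaps-and-islands trick: the running count of 0s stays constant across a run of consecutive 1s, so it can serve as a run id. A small Python sketch of the same idea, for illustration only (`run_ids` is a hypothetical helper, not part of the answer):

```python
from itertools import accumulate


def run_ids(values, name):
    """Label each vertical run of 1s the way the query's t5 CTE does:
    name + '_' + (number of 0s seen so far, including this row).
    Rows holding 0 get None (the SQL case expression yields NULL)."""
    zeros_so_far = accumulate(1 if v == 0 else 0 for v in values)
    return [f"{name}_{z}" if v == 1 else None for v, z in zip(values, zeros_so_far)]


if __name__ == "__main__":
    w5 = [0, 0, 1, 1, 1, 1, 0, 0, 1]  # column w5 from the question
    print(run_ids(w5, "w5"))
```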
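The basic idea is simple: it labels the clusters in the rightmost column and then uses the rules to propagate those labels to the left, one column at a time.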
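To make the right-to-left step concrete, here is a rough Python transliteration of the `t5` and `t4` ... `t1` CTEs (a sketch for illustration, not a replacement for the query; `run_ids` and `propagate_right_to_left` are hypothetical names). For each column it takes the previous-row label of the column to its right (the `lag(...)`), keeps the largest one per run (the `max(...) over (partition by ...)`), and then applies `greatest()` as a plain string comparison, just as the SQL does.

```python
from itertools import accumulate

COLUMNS = ["w1", "w2", "w3", "w4", "w5"]
# Sample data from the question: one row per hour (8:05 ... 8:45), columns w1..w5.
GRID = [
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 1, 0, 1],
]


def run_ids(values, name):
    """t5: label each vertical run of 1s by the running count of 0s."""
    zeros_so_far = accumulate(1 if v == 0 else 0 for v in values)
    return [f"{name}_{z}" if v == 1 else None for v, z in zip(values, zeros_so_far)]


def propagate_right_to_left(grid, columns=COLUMNS):
    """t4 .. t1: return {column: [final label or None per row]}."""
    last = len(columns) - 1
    # The rightmost column's run ids are already final (w5_grp_final).
    final = {columns[last]: run_ids([row[last] for row in grid], columns[last])}
    for idx in range(last - 1, -1, -1):
        name = columns[idx]
        own = run_ids([row[idx] for row in grid], name)
        right = final[columns[idx + 1]]
        # lag(...) over (order by hour): the right column's label one row up.
        prev_right = [None] + right[:-1]
        # max(...) over (partition by wN_grp), ignoring NULLs like SQL does.
        best = {}
        for grp, lagged in zip(own, prev_right):
            if grp is not None and lagged is not None:
                best[grp] = max(best.get(grp, lagged), lagged)
        # greatest(): lexical comparison, so 'w5_...' wins over 'w4_...'.
        # (Like the SQL, this breaks down with 10+ columns: 'w10' < 'w2'.)
        final[name] = [None if grp is None else max(grp, best.get(grp, grp)) for grp in own]
    return final


if __name__ == "__main__":
    for name, labels in sorted(propagate_right_to_left(GRID).items()):
        print(name, labels)
```

On the sample data this yields the labels `w1_0`, `w2_2`, `w5_2` and `w5_4`, which correspond to EVENT 1 through EVENT 4 in the question.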
Here is a db<>fiddle.
Here is a db<>fiddle for SQL Server.