How can I cluster events that spread like a domino effect in SQL?

Posted: 2018-12-05 21:46:21

Tags: sql sql-server

Let's see if you can come up with a solution, because so far I haven't been able to find one:

I have this table:

Hour    w1  w2  w3  w4  w5
8:05    1   0   0   0   0
8:10    1   0   0   0   0
8:15    0   1   0   0   1
8:20    0   1   0   1   1
8:25    0   0   1   1   1
8:30    0   0   1   1   1
8:35    0   1   1   1   0
8:40    1   1   1   1   0
8:45    0   0   1   0   1

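What I want is to identify the clusters as events. A new event is created whenever a 1 appears that is not reached by an existing event. An event propagates to a cell if the previous value in the same column was 1, or if the column to the right had a value in the previous row.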

For the table above, the events would be located as follows:

Hour      w1          w2          w3        w4       w5
8:05    EVENT 1       0           0         0        0
8:10    EVENT 1       0           0         0        0
8:15      0         EVENT 2       0         0       EVENT 3
8:20      0         EVENT 2       0       EVENT 3   EVENT 3
8:25      0           0         EVENT 3   EVENT 3   EVENT 3
8:30      0           0         EVENT 3   EVENT 3   EVENT 3
8:35      0         EVENT 3     EVENT 3   EVENT 3    0
8:40    EVENT 3     EVENT 3     EVENT 3   EVENT 3    0
8:45      0           0         EVENT 3     0       EVENT 4
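To pin the rule down before writing any SQL, here is a minimal Python sketch that reads it as connected components: a 1 is linked to a 1 directly above it and to a 1 one row up in the column to its right, and each linked group is one event. Two points are assumptions, not stated in the question: a cell reached by two different events merges them, and events are numbered in order of first appearance (row by row, left to right). `label_events` is a hypothetical helper name.

```python
# The question's sample table: one row per time step (8:05 ... 8:45), columns w1..w5.
GRID = [
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 1, 0, 1],
]


def label_events(grid):
    """Return a same-shaped grid of event numbers (0 = no event)."""
    rows, cols = len(grid), len(grid[0])
    parent = {}  # union-find forest over (row, col) cells that hold a 1

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1:
                continue
            parent[(r, c)] = (r, c)
            # Assumed rule: link to the previous row, same column or one column right.
            for pc in (c, c + 1):
                if r > 0 and pc < cols and grid[r - 1][pc] == 1:
                    union((r, c), (r - 1, pc))

    # Number events in order of first appearance, row by row, left to right.
    numbers, result = {}, []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            if grid[r][c] == 1:
                root = find((r, c))
                numbers.setdefault(root, len(numbers) + 1)
                out_row.append(numbers[root])
            else:
                out_row.append(0)
        result.append(out_row)
    return result


for row in label_events(GRID):
    print(row)
```

For the sample this prints the EVENT 1 to EVENT 4 layout shown above, as numbers: the first row is `[1, 0, 0, 0, 0]` and the last is `[0, 0, 3, 0, 4]`.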

Thanks! Let me know if you have any questions.

PS: The first table can also be viewed "vertically" instead of horizontally, like this:

Hour    Position    Value
8:05      w1          1
8:05      w2          0
8:05      w3          0
8:05      w4          0
8:05      w5          0
8:10      w1          1
8:10      w2          0
...       ...         ...
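Converting the wide table into this vertical form is a plain wide-to-long unpivot. A minimal portable sketch stacks one SELECT per column with UNION ALL; it is run here through Python's built-in sqlite3 module, and the table name `t` is an assumption. SQL Server also offers `UNPIVOT` and `CROSS APPLY (VALUES ...)` for the same job, which this sketch does not use.

```python
import sqlite3

ROWS = [
    ("8:05", 1, 0, 0, 0, 0), ("8:10", 1, 0, 0, 0, 0), ("8:15", 0, 1, 0, 0, 1),
    ("8:20", 0, 1, 0, 1, 1), ("8:25", 0, 0, 1, 1, 1), ("8:30", 0, 0, 1, 1, 1),
    ("8:35", 0, 1, 1, 1, 0), ("8:40", 1, 1, 1, 1, 0), ("8:45", 0, 0, 1, 0, 1),
]

# One SELECT per column, stacked with UNION ALL: (hour, position, value) rows.
LONG_SQL = """
SELECT hour, 'w1' AS position, w1 AS value FROM t
UNION ALL SELECT hour, 'w2', w2 FROM t
UNION ALL SELECT hour, 'w3', w3 FROM t
UNION ALL SELECT hour, 'w4', w4 FROM t
UNION ALL SELECT hour, 'w5', w5 FROM t
ORDER BY hour, position
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (hour TEXT, w1 INT, w2 INT, w3 INT, w4 INT, w5 INT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)", ROWS)
long_rows = conn.execute(LONG_SQL).fetchall()
for row in long_rows[:7]:
    print(row)
```

The first seven rows match the excerpt above, from `('8:05', 'w1', 1)` to `('8:10', 'w2', 0)`.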

1 answer:

Answer (score: 1)

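I would describe this as a tricky problem, and an interesting one. I'm fairly sure the most general solution requires a recursive CTE. However, that is really cumbersome and expensive: fundamentally, this is an iterative process.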
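For comparison, here is a hedged sketch of what the recursive-CTE route can look like, executed through Python's built-in sqlite3 module. It assumes the connected-components reading of the rule (a cell links to a 1 one row up in the same column or one column to the right), a row number `r` in place of the hour, and SQLite 3.25 or later for `DENSE_RANK`. It relies on SQLite accepting `UNION` in a recursive CTE so that already-found pairs stop the recursion; SQL Server requires `UNION ALL` between the anchor and recursive members, so a T-SQL port would need its own cycle guard, which is part of what makes this route cumbersome.

```python
import sqlite3

# Question's sample, with a row number r (1 = 8:05 ... 9 = 8:45) instead of the hour.
DATA = [
    (1, 1, 0, 0, 0, 0), (2, 1, 0, 0, 0, 0), (3, 0, 1, 0, 0, 1),
    (4, 0, 1, 0, 1, 1), (5, 0, 0, 1, 1, 1), (6, 0, 0, 1, 1, 1),
    (7, 0, 1, 1, 1, 0), (8, 1, 1, 1, 1, 0), (9, 0, 0, 1, 0, 1),
]

EVENTS_SQL = """
WITH RECURSIVE
  -- Every cell holding a 1; node id r * 10 + c sorts cells row by row.
  cell(r, c) AS (
    SELECT r, 1 FROM t WHERE w1 = 1 UNION ALL
    SELECT r, 2 FROM t WHERE w2 = 1 UNION ALL
    SELECT r, 3 FROM t WHERE w3 = 1 UNION ALL
    SELECT r, 4 FROM t WHERE w4 = 1 UNION ALL
    SELECT r, 5 FROM t WHERE w5 = 1
  ),
  -- Propagation: q is linked to p one row up, same column or one column right.
  link(a, b) AS (
    SELECT p.r * 10 + p.c, q.r * 10 + q.c
    FROM cell AS p JOIN cell AS q
      ON q.r = p.r + 1 AND q.c IN (p.c, p.c - 1)
  ),
  -- Undirected edges so a cluster is reachable from any of its cells.
  edge(a, b) AS (SELECT a, b FROM link UNION SELECT b, a FROM link),
  -- Reachability; UNION drops pairs already found, so the recursion ends.
  reach(src, node) AS (
    SELECT r * 10 + c, r * 10 + c FROM cell
    UNION
    SELECT reach.src, edge.b FROM reach JOIN edge ON edge.a = reach.node
  ),
  -- The smallest reachable id (first cell in row order) identifies the cluster.
  comp(src, root) AS (SELECT src, MIN(node) FROM reach GROUP BY src)
SELECT src / 10 AS r, src % 10 AS c,
       DENSE_RANK() OVER (ORDER BY root) AS event_no
FROM comp
ORDER BY r, c
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (r INTEGER, w1 INT, w2 INT, w3 INT, w4 INT, w5 INT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)", DATA)
events = conn.execute(EVENTS_SQL).fetchall()  # [(r, c, event_no), ...]
for r, c, event_no in events:
    print(f"row {r} w{c}: EVENT {event_no}")
```

The reachability step materializes every pair of cells within a cluster, so its cost grows roughly with the square of the cluster size, which fits the "expensive" remark above.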

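Under some assumptions, it can be solved with a complicated query alone instead of a recursive CTE. The main assumption is that a vertical string of adjacent 1s is not "interrupted" by a new group coming from the right.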
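The vertical strings of 1s can be labeled with a gaps-and-islands trick: every 0 bumps a running counter, so the counter stays constant along a run of 1s and can serve as that run's id. Below is a minimal sketch for column w4 alone, using the same expression as the `t5` CTE in the query that follows, executed through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). Storing the hour as text, as the sample does, only sorts correctly while every hour has the same number of digits.

```python
import sqlite3

# Column w4 from the question's table, in time order.
W4 = [("8:05", 0), ("8:10", 0), ("8:15", 0), ("8:20", 1), ("8:25", 1),
      ("8:30", 1), ("8:35", 1), ("8:40", 1), ("8:45", 0)]

# Running count of 0s up to each row; only rows holding a 1 get a label.
RUNS_SQL = """
SELECT hour, w4,
       CASE WHEN w4 = 1 THEN
         'w4_' || SUM(CASE WHEN w4 = 0 THEN 1 ELSE 0 END) OVER (ORDER BY hour)
       END AS w4_grp
FROM t
ORDER BY hour
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (hour TEXT, w4 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", W4)
for hour, w4, grp in conn.execute(RUNS_SQL):
    print(hour, w4, grp)
```

The single run of 1s from 8:20 to 8:40 comes out as `w4_3` (three 0s precede it), and the 0 rows get `None`.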

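The following code encodes the clusters as strings rather than numbers. It does not produce exactly the output you want, but it does identify the clusters: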

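-- Runs as written in PostgreSQL (|| for string concatenation, GREATEST).
-- For SQL Server, use + or CONCAT() instead of ||; GREATEST only exists from SQL Server 2022 on.
with t as (
      select *
      from (values ('8:05', 1,   0,   0,   0,   0),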
                   ('8:10', 1,   0,   0,   0,   0),
                   ('8:15', 0,   1,   0,   0,   1),
                   ('8:20', 0,   1,   0,   1,   1),
                   ('8:25', 0,   0,   1,   1,   1),
                   ('8:30', 0,   0,   1,   1,   1),
                   ('8:35', 0,   1,   1,   1,   0),
                   ('8:40', 1,   1,   1,   1,   0),
                   ('8:45', 0,   0,   1,   0,   1)
           ) v(hour, w1, w2, w3, w4, w5) 
     ),
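     -- t5: label each vertical run of 1s with its column name plus the number of 0s seen so far.
     -- w5 is the rightmost column, so its labels are already final.
     t5 as (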
      select t.*,
             (case when w1 = 1 then 'w1_' || sum(case when w1 = 0 then 1 else 0 end) over (order by hour) end) as w1_grp,
             (case when w2 = 1 then 'w2_' || sum(case when w2 = 0 then 1 else 0 end) over (order by hour) end) as w2_grp,
             (case when w3 = 1 then 'w3_' || sum(case when w3 = 0 then 1 else 0 end) over (order by hour) end) as w3_grp,
             (case when w4 = 1 then 'w4_' || sum(case when w4 = 0 then 1 else 0 end) over (order by hour) end) as w4_grp,
             (case when w5 = 1 then 'w5_' || sum(case when w5 = 0 then 1 else 0 end) over (order by hour) end) as w5_grp_final
      from t
     ),
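     -- t4 .. t1: moving left, each run takes the greatest label that the column to its right
     -- had in the previous row (lag) anywhere along the run (partition by the run's label).
     t4 as (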
      select t5.*,
             (case when w4 = 1 then greatest(w4_grp, max(prev_w5_grp_final) over (partition by w4_grp)) end) as w4_grp_final
      from (select t5.*, lag(w5_grp_final) over (order by hour) as prev_w5_grp_final
            from t5
           ) t5
     ),
     t3 as (
      select t4.*,
             (case when w3 = 1 then greatest(w3_grp, max(prev_w4_grp_final) over (partition by w3_grp)) end) as w3_grp_final
      from (select t4.*, lag(w4_grp_final) over (order by hour) as prev_w4_grp_final
            from t4
           ) t4
     ),
     t2 as (
      select t3.*,
             (case when w2 = 1 then greatest(w2_grp, max(prev_w3_grp_final) over (partition by w2_grp)) end) as w2_grp_final
      from (select t3.*, lag(w3_grp_final) over (order by hour) as prev_w3_grp_final
            from t3
           ) t3
     ),
     t1 as (
      select t2.*,
             (case when w1 = 1 then greatest(w1_grp, max(prev_w2_grp_final) over (partition by w1_grp)) end) as w1_grp_final
      from (select t2.*, lag(w2_grp_final) over (order by hour) as prev_w2_grp_final
            from t2
           ) t2
     )
select hour, w1_grp_final, w2_grp_final, w3_grp_final, w4_grp_final, w5_grp_final
from t1
order by hour asc;

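The basic idea is simple. It first labels the vertical runs of 1s in every column; the labels in the rightmost column are final. It then uses the propagation rule to carry those clusters to the left, one column at a time: each run takes the largest label that the column to its right had in the previous row anywhere along the run.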
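The same two steps can be replayed outside the database. The following Python sketch uses a hypothetical `propagate_runs` helper (not part of the answer) that mirrors the query: labels are `(column index, zeros seen)` tuples instead of strings like `'w4_3'`, so a tie between two right-hand labels is broken numerically rather than by string comparison. Like the query, it keeps a single label per vertical run; if two different clusters from the right touched the same run, only the larger label would survive.

```python
# The question's sample table: one row per time step, columns w1..w5.
GRID = [
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 1, 0, 1],
]


def propagate_runs(grid):
    """Label clusters the way the query does, with (column, zeros) tuples."""
    rows, cols = len(grid), len(grid[0])
    label = [[None] * cols for _ in range(rows)]
    # Step 1 (like t5): a run's id is its column plus the number of 0s seen so far.
    for c in range(cols):
        zeros = 0
        for r in range(rows):
            if grid[r][c] == 0:
                zeros += 1
            else:
                label[r][c] = (c, zeros)
    # Step 2 (like t4 .. t1): right to left, a run adopts the largest final label
    # the column to its right had in the previous row anywhere along the run.
    for c in range(cols - 2, -1, -1):
        best = {}  # original run label -> final label
        for r in range(rows):
            run = label[r][c]
            if run is None:
                continue
            best.setdefault(run, run)
            if r > 0 and label[r - 1][c + 1] is not None:
                best[run] = max(best[run], label[r - 1][c + 1])
        for r in range(rows):
            if label[r][c] is not None:
                label[r][c] = best[label[r][c]]
    return label


for row in propagate_runs(GRID):
    print(row)
```

On the sample this yields four distinct labels, `(0, 0)`, `(1, 2)`, `(4, 2)` and `(4, 4)`, which line up with EVENT 1 to EVENT 4 in the question.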

Here is a db<>fiddle.

Here is a db<>fiddle for SQL Server.