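SQL: Check whether n consecutive records are greater than some value

Time: 2017-06-29 11:06:28

Tags: sql apache-spark-sql gaps-and-islands

I have a table containing numbers. I need to find out whether there are n consecutive records whose value is greater than some threshold m. For example: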

id      delta         
---------------
1        10  
4        15 
11       22 
23       23  
46       21
57       9

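Here, if I ask whether there are 3 consecutive records with a value above 20, I should get True. If I check for 4 consecutive records, the answer should be False. Is that possible? This is on Apache Spark SQL. Thanks.

1 Answer:

Answer 0: (score: 1)

You can do this with lag():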

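-- t is the question's table; delta is its value column.
-- Keep each row whose own delta and the two preceding deltas (ordered by id) all exceed 20.
select t.*
from (select t.*,
             lag(delta, 1) over (order by id) as delta_1,
             lag(delta, 2) over (order by id) as delta_2
      from t
     ) t
where delta > 20 and delta_1 > 20 and delta_2 > 20;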

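This returns every row that ends a sequence of three consecutive rows above 20, that is, the row itself and the two rows before it. For the sample data that is the row with id 46. If you only want true/false: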

-- Returns 'true' if at least one row ends a sequence of three consecutive deltas above 20.
select (case when count(*) > 0 then 'true' else 'false' end)
from (select t.*,
             lag(delta, 1) over (order by id) as delta_1,
             lag(delta, 2) over (order by id) as delta_2
      from t
     ) t
where delta > 20 and delta_1 > 20 and delta_2 > 20;
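
The lag() query hard-codes n = 3, since each extra record in the sequence needs one more lag() column. As a way to sanity-check the same "at least n in a row" logic for any n outside Spark, here is a minimal plain-Python sketch. The function name `has_run_at_least` is hypothetical, and the list is assumed to be already sorted by id.

```python
def has_run_at_least(values, n, m):
    """Return True if at least n consecutive values are strictly greater than m."""
    streak = 0  # length of the current run of values > m
    for v in values:
        # Extend the run on a qualifying value, otherwise reset it.
        streak = streak + 1 if v > m else 0
        if streak >= n:
            return True
    return False


# delta values from the question, in id order
deltas = [10, 15, 22, 23, 21, 9]
print(has_run_at_least(deltas, 3, 20))  # True
print(has_run_at_least(deltas, 4, 20))  # False
```

The two calls reproduce the True/False answers the question expects for 3 and 4 consecutive records.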

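Edit:

I missed the part about not wanting more than 3. You can tighten the condition so that the sequence is exactly three rows long: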

-- Exactly three in a row: the row before the sequence (delta_3) and the row after it
-- (delta_next_1) must each be at or below 20, or not exist at all (NULL).
select (case when count(*) > 0 then 'true' else 'false' end)
from (select t.*,
             lag(delta, 1) over (order by id) as delta_1,
             lag(delta, 2) over (order by id) as delta_2,
             lag(delta, 3) over (order by id) as delta_3,
             lead(delta, 1) over (order by id) as delta_next_1
      from t
     ) t
where (delta_3 <= 20 or delta_3 is null) and
      (delta_2 > 20 and delta_1 > 20 and delta > 20) and
      (delta_next_1 <= 20 or delta_next_1 is null);

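This is a bit tricky because the sequence can sit at the very beginning or end of the rows. There, lag() or lead() has no neighboring row and returns NULL, which is why the conditions also accept NULL.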
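
The "exactly n" variant can also be cross-checked outside Spark by measuring each maximal run of values above the threshold, which handles runs at the start or end of the data without special cases. This is only an illustrative plain-Python sketch, not Spark code. The names `run_lengths` and `has_run_exactly` are hypothetical, and the values are assumed to be sorted by id.

```python
from itertools import groupby


def run_lengths(values, m):
    """Lengths of each maximal run of consecutive values strictly greater than m."""
    return [
        len(list(group))
        for above, group in groupby(values, key=lambda v: v > m)
        if above
    ]


def has_run_exactly(values, n, m):
    """Return True if some maximal run of values > m has length exactly n."""
    return n in run_lengths(values, m)


# delta values from the question, in id order
deltas = [10, 15, 22, 23, 21, 9]
print(run_lengths(deltas, 20))         # [3]
print(has_run_exactly(deltas, 3, 20))  # True
print(has_run_exactly(deltas, 4, 20))  # False
```

Because groupby splits the list into maximal runs, a run of four does not count as a run of three, matching the intent of the tightened query.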