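I'm very new to Impala/Hive queries, and I'm not sure how to write this one.
The goal of the query is to fetch the data that falls inside defined ranges, where each range is bounded by two points that satisfy a condition.
To make this clearer: we have a table with 3 columns: Date, A and B.
We sort the table by date, and we want all rows from every interval between two consecutive A = 1 rows that contains no B = 1. (So each range lies between two A = 1 rows, on the condition that there is no B = 1 between them.)
I drew a picture of the concept I'm looking for, to make it clearer.
Link: https://drive.google.com/open?id=0B_zAJFzI2slWQnRwN2gwWk9NSG8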
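To make the requirement concrete, here is a minimal sketch of the rule in plain Python (not Hive/Impala SQL). The function name `rows_in_clean_intervals`, the tuple layout `(date, A, B)` and the sample rows are hypothetical, invented only for illustration. It assumes dates are ISO-formatted strings, so that string order equals date order, and that a B = 1 sitting on a boundary row (a row where A = 1) does not disqualify the neighbouring intervals.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, Optional[int], Optional[int]]  # (date, A, B), hypothetical layout


def rows_in_clean_intervals(rows: List[Row]) -> List[Row]:
    """Return rows strictly between two consecutive A=1 rows,
    but only for intervals in which no row has B=1."""
    ordered = sorted(rows, key=lambda r: r[0])
    result: List[Row] = []
    bucket: List[Row] = []  # rows collected since the last A=1 row
    inside = False          # becomes True once the first A=1 row is seen
    clean = True            # False once a B=1 row is seen in the current interval
    for row in ordered:
        _, a, b = row
        if a == 1:
            # An A=1 row closes the current interval and opens the next one.
            if inside and clean:
                result.extend(bucket)
            bucket = []
            inside = True
            clean = True
        elif inside:
            if b == 1:
                clean = False
            bucket.append(row)
    # Rows after the last A=1 row are never closed, so they are not returned.
    return result


sample = [
    ("2017-01-01", 0, 0),  # before the first A=1: not in any interval
    ("2017-01-02", 1, 0),  # boundary
    ("2017-01-03", 0, 0),  # expected in the result
    ("2017-01-04", 0, 0),  # expected in the result
    ("2017-01-05", 1, 0),  # boundary
    ("2017-01-06", 0, 0),  # interval contains a B=1 row: dropped
    ("2017-01-07", 0, 1),  # the B=1 row itself: dropped
    ("2017-01-08", 1, 0),  # boundary
    ("2017-01-09", 0, 0),  # after the last A=1: not in any interval
]

for row in rows_in_clean_intervals(sample):
    print(row)
```

On this sample, only the two rows between 2017-01-02 and 2017-01-05 come back.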
Answer 0 (score: 0)
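select  dt, A, B
from   (select  dt, A, B
               -- latest A=1 / B=1 date strictly before the current row
               ,max(case when A = 1 then dt end) over p as p_A1_dt
               ,max(case when B = 1 then dt end) over p as p_B1_dt
               -- earliest A=1 / B=1 date strictly after the current row
               ,min(case when A = 1 then dt end) over f as f_A1_dt
               ,min(case when B = 1 then dt end) over f as f_B1_dt
        from    mytable
        window  p as (order by dt rows between unbounded preceding and 1 preceding)
               ,f as (order by dt rows between 1 following and unbounded following)
       ) t
where   -- an A=1 precedes this row, with no B=1 after it
        (   p_A1_dt >= p_B1_dt
         or (p_A1_dt is not null and p_B1_dt is null)
        )
        -- an A=1 follows this row, with no B=1 before it
    and (   f_A1_dt <= f_B1_dt
         or (f_A1_dt is not null and f_B1_dt is null)
        )
        -- drop the A=1 boundary rows themselves
    and coalesce(A, -1) <> 1
        -- drop rows that are themselves B=1 (the windows exclude the current row)
    and coalesce(B, -1) <> 1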
The same, but without the window declaration:
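select  dt, A, B
from   (select  dt, A, B
               -- latest A=1 / B=1 date strictly before the current row
               ,max(case when A = 1 then dt end) over (order by dt rows between unbounded preceding and 1 preceding) as p_A1_dt
               ,max(case when B = 1 then dt end) over (order by dt rows between unbounded preceding and 1 preceding) as p_B1_dt
               -- earliest A=1 / B=1 date strictly after the current row
               ,min(case when A = 1 then dt end) over (order by dt rows between 1 following and unbounded following) as f_A1_dt
               ,min(case when B = 1 then dt end) over (order by dt rows between 1 following and unbounded following) as f_B1_dt
        from    mytable
       ) t
where   -- an A=1 precedes this row, with no B=1 after it
        (   p_A1_dt >= p_B1_dt
         or (p_A1_dt is not null and p_B1_dt is null)
        )
        -- an A=1 follows this row, with no B=1 before it
    and (   f_A1_dt <= f_B1_dt
         or (f_A1_dt is not null and f_B1_dt is null)
        )
        -- drop the A=1 boundary rows themselves
    and coalesce(A, -1) <> 1
        -- drop rows that are themselves B=1 (the windows exclude the current row)
    and coalesce(B, -1) <> 1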
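
For anyone who wants to sanity-check the logic outside the cluster, the following is a rough Python mirror of what the window functions compute. It is a sketch only, not Hive/Impala code. The function name `filter_like_query` and the sample rows are hypothetical, and it assumes `dt` values are unique ISO-formatted strings. It also skips rows whose own B is 1, because the window frames exclude the current row and so cannot catch that case by themselves.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, Optional[int], Optional[int]]  # (dt, A, B), hypothetical layout


def filter_like_query(rows: List[Row]) -> List[Row]:
    ordered = sorted(rows, key=lambda r: r[0])
    n = len(ordered)

    # p_*: latest dt strictly before each row with A=1 / B=1
    # (mirrors max(...) over rows between unbounded preceding and 1 preceding)
    p_a1: List[Optional[str]] = [None] * n
    p_b1: List[Optional[str]] = [None] * n
    last_a = last_b = None
    for i, (dt, a, b) in enumerate(ordered):
        p_a1[i], p_b1[i] = last_a, last_b
        if a == 1:
            last_a = dt
        if b == 1:
            last_b = dt

    # f_*: earliest dt strictly after each row with A=1 / B=1
    # (mirrors min(...) over rows between 1 following and unbounded following)
    f_a1: List[Optional[str]] = [None] * n
    f_b1: List[Optional[str]] = [None] * n
    next_a = next_b = None
    for i in range(n - 1, -1, -1):
        dt, a, b = ordered[i]
        f_a1[i], f_b1[i] = next_a, next_b
        if a == 1:
            next_a = dt
        if b == 1:
            next_b = dt

    kept: List[Row] = []
    for i, row in enumerate(ordered):
        _, a, b = row
        # An A=1 exists before the row, and no B=1 lies after that A=1.
        prev_ok = p_a1[i] is not None and (p_b1[i] is None or p_a1[i] >= p_b1[i])
        # An A=1 exists after the row, and no B=1 lies before that A=1.
        next_ok = f_a1[i] is not None and (f_b1[i] is None or f_a1[i] <= f_b1[i])
        if prev_ok and next_ok and a != 1 and b != 1:
            kept.append(row)
    return kept


toy = [
    ("2017-01-01", 0, 0),
    ("2017-01-02", 1, 0),
    ("2017-01-03", 0, 0),
    ("2017-01-04", 0, 0),
    ("2017-01-05", 1, 0),
    ("2017-01-06", 0, 0),
    ("2017-01-07", 0, 1),
    ("2017-01-08", 1, 0),
    ("2017-01-09", 0, 0),
]

for row in filter_like_query(toy):
    print(row)
```

With this toy data, only the 2017-01-03 and 2017-01-04 rows survive. The interval between 2017-01-05 and 2017-01-08 is rejected because of the B = 1 on 2017-01-07.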