In SparkSQL, you can define a window query that orders by more than two columns, but it does not seem possible to define more than one range clause over those columns.
For example:
select
  row_id,
  count(*) over (
    partition by group_id
    order by filter_key1, filter_key2
    range between 12 preceding and 12 following
    range between 5 preceding and 1 preceding
  ) as the_count
from table
The above fails (though my syntax may be off? fingers crossed...).
Can this be done in a single statement like the one above?
Answer 0 (score: 0)
No, only one range is allowed. But don't despair: count(*) is additive:
select row_id,
       (count(*) over (partition by group_id
                       order by filter_key1, filter_key2
                       range between 12 preceding and 12 following
                      ) +
        count(*) over (partition by group_id
                       order by filter_key1, filter_key2
                       range between 5 preceding and 1 preceding
                      )
       ) as the_count
from table
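The additive trick can be sketched with a runnable example using SQLite's window functions (Python's built-in sqlite3 module, SQLite 3.28 or newer; the table and column names mirror the question, with made-up data). Note that RANGE frames with numeric offsets require exactly one ORDER BY column, so this sketch orders by filter_key1 alone:

```python
import sqlite3

# Hypothetical in-memory table: 30 rows in one group, filter_key1 = 1..30.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (row_id INTEGER, group_id INTEGER, filter_key1 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [(i, 1, i) for i in range(1, 31)])

# Two window counts over different RANGE frames, summed per row,
# as in the answer above.
query = """
SELECT row_id,
       (COUNT(*) OVER (PARTITION BY group_id
                       ORDER BY filter_key1
                       RANGE BETWEEN 12 PRECEDING AND 12 FOLLOWING)
        +
        COUNT(*) OVER (PARTITION BY group_id
                       ORDER BY filter_key1
                       RANGE BETWEEN 5 PRECEDING AND 1 PRECEDING)
       ) AS the_count
FROM t
"""
the_count = dict(conn.execute(query))

# Row 1: wide frame covers values 1..13 (13 rows), narrow frame is empty.
# Row 15: wide frame covers 3..27 (25 rows), narrow frame covers 10..14 (5 rows).
print(the_count[1], the_count[15])  # prints: 13 30
```

Each row's result is simply the sum of the two frame counts, which is why a single statement with two window expressions answers the original question even though a single window cannot carry two range clauses.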
This particular example seems odd, because the two ranges overlap. Perhaps that is your intent.
Based on your question, I wonder whether you instead want:
select row_id,
       (count(*) over (partition by group_id
                       order by filter_key1
                       range between 12 preceding and 12 following
                      ) +
        count(*) over (partition by group_id
                       order by filter_key2
                       range between 5 preceding and 1 preceding
                      )
       ) as the_count
from table