CREATE TABLE big_hive_table(
`partner` string,
start_date date,
end_date date,
`category` string,
`category2` string);
insert into big_hive_table values ('S1','2018-01-01','2018-03-31','c1','M');
insert into big_hive_table values ('S1','2017-12-01','2018-01-31','c1','M');
insert into big_hive_table values ('S1','2017-01-01','2017-11-30','c1','M');
insert into big_hive_table values ('S1','2018-02-01','2018-04-30','c1','M');
insert into big_hive_table values ('S1','2018-02-01','2018-04-30','c1','L');
insert into big_hive_table values ('S2','2018-02-01','2018-04-30','c1','S');
insert into big_hive_table values ('S3','2018-02-01','2018-04-30','c2','S');
insert into big_hive_table values ('S3','2018-01-01','2018-03-31','c2','S');
insert into big_hive_table values ('S3','2017-12-01','2018-01-31','c2','S');
Question: when there are overlapping time periods, get the oldest start date and the newest end date for each group (partner, category, category2).
expected result:
S1 01/12/2017 30/04/2018 c1 M
S1 01/01/2017 30/11/2017 c1 M
S1 01/02/2018 30/04/2018 c1 L
S2 01/02/2018 30/04/2018 c1 S
S3 01/12/2017 30/04/2018 c2 S
My query:
SELECT DISTINCT partner,
category,
category2,
First_value(start_date) OVER (partition BY partner, category, category2 ORDER BY start_date) period_start,
last_value(end_date) OVER (partition BY partner, category, category2 ORDER BY start_date rows BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED following) period_end
from (select pps.*, sum(start_new_period) over (partition BY partner, category, category2)
FROM ( select partner,
start_date,
end_date,
category,
category2,
lag(end_date) over (partition by partner, category, category2 order by start_date) previous_period_end
, case
when start_date > lag(end_date) over (partition by partner, category, category2 order by start_date)
then 1
else 0
end start_new_period
from big_hive_table
where start_date is not null and end_date is not null) pps
)
When I run the two inner queries (from select pps.* inward) or the whole query, I currently get the following error:
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.parse.SemanticException:Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies.
Underlying error: Primitve type DATE not supported in Value Boundary expression
Can anyone suggest what I am missing? Thanks for your help.
Answer 0 (score: 0)
Just add rows BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED following to the first_value window function as well, then try running it again. With only an ORDER BY and no explicit frame, Hive defaults to a RANGE-based frame, and building that value-boundary expression over a DATE ordering column is what older Hive versions reject.
Change the query from
First_value(start_date) OVER (partition BY partner, category, category2
ORDER BY start_date) period_start
to
First_value(start_date) OVER (partition BY partner, category, category2 ORDER BY
start_date rows BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED following) period_start
There is a JIRA about supporting this primitive type; it was resolved in Hive 2.1.0.
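For reference, a minimal sketch of the full query with the explicit frame applied (untested; the column alias new_period_count and the derived-table alias grp are additions here, since Hive requires FROM subqueries to be named, and the rest mirrors the query from the question):

-- Sketch: the original query with an explicit ROWS frame, so Hive never has to
-- build a RANGE (value-boundary) frame over the DATE ordering column.
SELECT DISTINCT partner,
       category,
       category2,
       first_value(start_date) OVER (PARTITION BY partner, category, category2 ORDER BY start_date
                                     ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) period_start,
       last_value(end_date)    OVER (PARTITION BY partner, category, category2 ORDER BY start_date
                                     ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) period_end
FROM (SELECT pps.*,
             -- partition-wide total of the new-period flags, as in the original query
             sum(start_new_period) OVER (PARTITION BY partner, category, category2) new_period_count
      FROM (SELECT partner,
                   start_date,
                   end_date,
                   category,
                   category2,
                   lag(end_date) OVER (PARTITION BY partner, category, category2 ORDER BY start_date) previous_period_end,
                   CASE
                     WHEN start_date > lag(end_date) OVER (PARTITION BY partner, category, category2 ORDER BY start_date)
                       THEN 1
                     ELSE 0
                   END start_new_period
            FROM big_hive_table
            WHERE start_date IS NOT NULL AND end_date IS NOT NULL) pps
     ) grp;

Note that this only removes the SemanticException; the grouping logic itself is unchanged, so whether the output matches the expected result still depends on how the overlapping periods are partitioned.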