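I have a table with the columns cust_id, year_, month_, monthly_txn, monthly_bal.
For each month, I need to compute avg(monthly_txn) and variance(monthly_bal) over the previous three months and the previous six months.
I have a query that returns the three-month and six-month average and variance for the latest month only, not for every month. I am not good with analytic functions in Hive.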
SELECT cust_id, avg(monthly_txn) y, variance(monthly_bal) x FROM (
SELECT cust_id, monthly_txn, monthly_bal,
row_number() over (partition by cust_id order by year_ desc, month_ desc) r
from mytable) b WHERE r <= 3 GROUP BY cust_id
But I want the following.
Input:
cust_id year_ month_ monthly_txn monthly_bal
1 2018 1 456 8979289
1 2018 2 675 4567
1 2018 3 645 4890
1 2017 1 342 44522
1 2017 2 378 9898900
1 2017 3 4536 234492358
1 2017 4 3535 789
1 2017 5 4561 345
1 2017 6 598 334
Expected output:
Shown here for txn only (quarterly and half-yearly averages); the same applies to the variance.
cust_id year_ month_ monthly_txn monthly_bal q_avg_txn h_avg_txn
1 2018 1 456 8979289 avg(456,598,4561) avg(456,598,4561,3535,4536,378)
1 2018 2 675 4567 avg(675,456,598) avg(675,456,3535,4561,598,4536)
1 2018 3 645 4890 avg(645,675,456) avg(645,675,456,598,4561,3535)
1 2017 1 342 44522 avg(342) avg(342)
1 2017 2 378 9898900 avg(378,342) avg(378,342)
1 2017 3 4536 234492358 avg(4536,378,342) avg(4536,378,342)
1 2017 4 3535 789 avg(3535,4536,378) avg(3535,4536,378,342)
1 2017 5 4561 345 avg(4561,3535,4536) avg(4561,3535,4536,342,378)
1 2017 6 598 334 avg(598,4561,3535) avg(598,4561,3535,4536,342,378)
Answer 0: (score: 1)
Use analytic functions with unbounded preceding to get the quarterly and half-yearly values, then use a subquery to get the result.
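The answer gives no query, so here is a rough sketch of the "analytic function inside a subquery" shape it describes. It swaps in bounded frames (2 and 5 preceding rows) instead of unbounded preceding, because an unbounded frame would give a running value since the customer's first month rather than a 3- or 6-month one. The column aliases q_avg_txn and h_avg_txn are borrowed from the question's expected output, and the filter on 2018 is only an illustration. The filter sits in the outer query because a WHERE in the same query as the window function would remove the earlier rows before the window is computed.

```sql
-- Sketch only: frame sizes and the year filter are assumptions, not part of the answer.
SELECT cust_id, year_, month_, q_avg_txn, h_avg_txn
FROM (
    SELECT cust_id, year_, month_,
           -- current row plus the 2 rows before it (3 months if there are no gaps)
           avg(monthly_txn) OVER (PARTITION BY cust_id
                                  ORDER BY year_, month_
                                  ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS q_avg_txn,
           -- current row plus the 5 rows before it (6 months if there are no gaps)
           avg(monthly_txn) OVER (PARTITION BY cust_id
                                  ORDER BY year_, month_
                                  ROWS BETWEEN 5 PRECEDING AND CURRENT ROW) AS h_avg_txn
    FROM mytable
) w
WHERE year_ = 2018;  -- filter after the windows have seen the 2017 rows
```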
Answer 1: (score: 0)
If you have data for every month of interest (i.e. no gaps), then this should work:
-- each frame covers the current row plus the 2 (or 5) rows before it
select t.*,
       avg(monthly_txn) over (partition by cust_id
                              order by year_, month_
                              rows between 2 preceding and current row
                             ) as avg_3,
       avg(monthly_txn) over (partition by cust_id
                              order by year_, month_
                              rows between 5 preceding and current row
                             ) as avg_6,
       variance(monthly_bal) over (partition by cust_id
                                   order by year_, month_
                                   rows between 2 preceding and current row
                                  ) as variance_3,
       variance(monthly_bal) over (partition by cust_id
                                   order by year_, month_
                                   rows between 5 preceding and current row
                                  ) as variance_6
from mytable t;
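If some months are missing, a ROWS frame counts rows rather than calendar months, so it can reach further back than intended. One possible workaround, sketched below, is to order by a single numeric month index and use a RANGE frame, which compares values of the sort key instead of counting rows. This assumes year_ and month_ are integer columns and that your Hive version accepts a numeric RANGE offset on a single ORDER BY column; month_idx is a name made up for this example.

```sql
-- Sketch only: month_idx is a hypothetical helper column, not part of the original table.
SELECT cust_id, year_, month_, monthly_txn, monthly_bal,
       -- months whose index lies in [current - 2, current]
       avg(monthly_txn) OVER (PARTITION BY cust_id
                              ORDER BY month_idx
                              RANGE BETWEEN 2 PRECEDING AND CURRENT ROW) AS avg_3,
       -- months whose index lies in [current - 5, current]
       avg(monthly_txn) OVER (PARTITION BY cust_id
                              ORDER BY month_idx
                              RANGE BETWEEN 5 PRECEDING AND CURRENT ROW) AS avg_6,
       variance(monthly_bal) OVER (PARTITION BY cust_id
                                   ORDER BY month_idx
                                   RANGE BETWEEN 2 PRECEDING AND CURRENT ROW) AS variance_3,
       variance(monthly_bal) OVER (PARTITION BY cust_id
                                   ORDER BY month_idx
                                   RANGE BETWEEN 5 PRECEDING AND CURRENT ROW) AS variance_6
FROM (
    -- consecutive calendar months get consecutive integers, across year boundaries
    SELECT t.*, year_ * 12 + month_ AS month_idx
    FROM mytable t
) m;
```

With this ordering, 2017-12 and 2018-01 differ by exactly 1, so a window anchored at 2018-01 only picks up 2017-11 and 2017-12 if those rows exist.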