I am looking for help with the following problem statement.
Input dataset:
customer id  invoice date  item id  invoice amount  Comment
1            10-Jan-2014   1        10              Start of 12 month window - 10th Jan 2014 to 10th Jan 2015
1            20-Jan-2014   2        20              Falls within 12 month window
1            21-Aug-2014   1        10              Falls within 12 month window
1            31-Dec-2014   1        10              Falls within 12 month window
1            20-Feb-2015   1        10              Start of new 12 month window as this is post 10th Jan 2015
1            30-Mar-2016   1        10              Start of new 12 month window as this is post 20th Feb 2016
Required output:
customer id  invoice date  item id  invoice amount  window  sum (amount where item id = 1)
1            10-Jan-2014   1        10              1       10
1            20-Jan-2014   2        20              1       0
1            21-Aug-2014   1        10              1       20
1            31-Dec-2014   1        10              1       30
1            20-Feb-2015   1        10              2       10
1            30-Mar-2016   1        10              3       10
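I tried the query below in Hive to produce the output above, but the challenge is resetting the window once we cross the 12-month mark (see rows 5 and 6 of the input dataset). Those records need to be treated as the start of a new window.
Query used: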
SELECT SUM(if(item_id = 1, invoice_amount, 0)) OVER (
    PARTITION BY customer_id
    ORDER BY invoice_date ASC
    RANGE BETWEEN 31556926 PRECEDING AND CURRENT ROW
) FROM INVOICE_DETAILS;
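
To make the reset rule concrete: a fixed RANGE frame slides along with every row, whereas here each window is anchored at the first invoice that falls after the previous window ends, so the start of a window depends on where the previous one started. The Python sketch below is only an illustration of that sequential logic, not a Hive solution. The helper names `one_year_after` and `assign_windows` are hypothetical, and it assumes that an invoice dated exactly 12 months after the window start still belongs to that window, and that rows for other items show 0 in the sum column, as in the required output.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter


def one_year_after(d):
    """Same calendar day one year later (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


def assign_windows(rows, target_item=1):
    """rows: iterable of (customer_id, invoice_date, item_id, invoice_amount).

    Returns the rows extended with (window_no, running_sum), where the
    running sum only counts target_item and restarts with each new window.
    """
    out = []
    ordered = sorted(rows, key=itemgetter(0, 1))  # by customer, then date
    for customer_id, group in groupby(ordered, key=itemgetter(0)):
        window_no, window_end, running = 0, None, 0
        for _, inv_date, item_id, amount in group:
            # Assumption: a new window opens only when the invoice date is
            # strictly after the end of the current window.
            if window_end is None or inv_date > window_end:
                window_no += 1
                window_end = one_year_after(inv_date)
                running = 0
            if item_id == target_item:
                running += amount
                shown = running
            else:
                # Assumption: mirror the required output, which shows 0
                # for rows of other items.
                shown = 0
            out.append((customer_id, inv_date, item_id, amount, window_no, shown))
    return out


invoices = [
    (1, date(2014, 1, 10), 1, 10),
    (1, date(2014, 1, 20), 2, 20),
    (1, date(2014, 8, 21), 1, 10),
    (1, date(2014, 12, 31), 1, 10),
    (1, date(2015, 2, 20), 1, 10),
    (1, date(2016, 3, 30), 1, 10),
]

for row in assign_windows(invoices):
    print(row)
```

On the sample data this assigns windows 1, 1, 1, 1, 2, 3 with sums 10, 0, 20, 30, 10, 10. In Hive, logic of this kind would probably have to live in a custom UDF/UDAF or a TRANSFORM script run per customer, since the window boundaries cannot be expressed as a fixed frame.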