12-month rolling data starting from the earliest invoice date - Hadoop

Time: 2017-04-25 06:11:08

Tags: hadoop hive

I am looking for help with the following problem statement.

Input dataset:

customer id  invoice date  item id  invoice amount  Comment
1            10-Jan-2014   1        10              Start of 12 month window - 10th Jan 2014 to 10th Jan 2015
1            20-Jan-2014   2        20              Falls within 12 month window
1            21-Aug-2014   1        10              Falls within 12 month window
1            31-Dec-2014   1        10              Falls within 12 month window
1            20-Feb-2015   1        10              Start of new 12 month window as this is post 10th Jan 2015
1            30-Mar-2016   1        10              Start of new 12 month window as this is post 20th Feb 2016

Desired output:

customer id  invoice date  item id  invoice amount  window  sum (amount for item id = 1)
1            10-Jan-2014   1        10              1       10
1            20-Jan-2014   2        20              1       0
1            21-Aug-2014   1        10              1       20
1            31-Dec-2014   1        10              1       30
1            20-Feb-2015   1        10              2       10
1            30-Mar-2016   1        10              3       10

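I tried to produce the above output in Hive with the query below, but the challenge is resetting to the next window once an invoice falls past the 12-month mark (see rows 5 and 6 of the input dataset). Those records need to be treated as the start of a new window.

Query used: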

SELECT SUM(if(item_id = 1, invoice_amount, 0)) OVER (
    PARTITION BY customer_id 
    ORDER BY invoice_date ASC 
    RANGE BETWEEN 31556926 PRECEDING AND CURRENT ROW
) FROM INVOICE_DETAILS;
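
The frame in the query above looks back a fixed interval from every row, so it never anchors on the first invoice of a window and therefore never resets. To make the intended rule concrete, here is a minimal Python sketch of the reset logic outside Hive. It is an illustration only, not a Hive solution. The helper names `add_one_year` and `assign_windows` are hypothetical, and two points are assumptions read off the tables: a new window opens only when an invoice date is strictly later than the current window's end, and rows for other items report 0 (as in the desired output) while item 1 rows report the running total.

```python
from datetime import date


def add_one_year(d):
    """Return the same calendar day one year later (29 Feb maps to 28 Feb)."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


def assign_windows(rows, target_item=1):
    """rows: (customer_id, invoice_date, item_id, amount) tuples.

    Returns each row extended with (window_no, window_sum).
    """
    out = []
    state = {}  # customer_id -> [window_no, window_end, running_sum]
    for cust, day, item, amount in sorted(rows, key=lambda r: (r[0], r[1])):
        st = state.get(cust)
        if st is None or day > st[1]:
            # First invoice, or past the end of the current window:
            # open a new window anchored at this invoice date.
            window_no = 1 if st is None else st[0] + 1
            st = state[cust] = [window_no, add_one_year(day), 0]
        if item == target_item:
            st[2] += amount
            shown = st[2]
        else:
            shown = 0  # assumption: mirrors the 0 in the desired output
        out.append((cust, day, item, amount, st[0], shown))
    return out


invoices = [
    (1, date(2014, 1, 10), 1, 10),
    (1, date(2014, 1, 20), 2, 20),
    (1, date(2014, 8, 21), 1, 10),
    (1, date(2014, 12, 31), 1, 10),
    (1, date(2015, 2, 20), 1, 10),
    (1, date(2016, 3, 30), 1, 10),
]

result = assign_windows(invoices)
for row in result:
    print(row)
```

Because each window's start depends on where the previous window ended, the rule is inherently sequential per customer, which is what a fixed `RANGE` frame cannot express.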

0 answers:

No answers