Hive: How to get the total spend for the last 3 months when using a join

Time: 2019-03-06 01:44:47

Tags: hive hiveql

How can I join source1 and source2 to build the target table, and get the total spend for the last 3 months for each customer?

source1:

+--------+----------+
| cst_id |   date   |
+--------+----------+
| a      | 20180125 |
| b      | 20180627 |
| c      | 20181122 |
| d      | 20180304 |
+--------+----------+

source2 (to be joined with source1):

+--------+--------+-------+
| cst_id | month  | spend |
+--------+--------+-------+
| a      | 201710 |   6.2 |
| a      | 201711 |   0.5 |
| a      | 201712 |   4.3 |
| a      | 201801 |   6.5 |
| a      | 201802 |     7 |
| a      | 201803 |    11 |
| a      | 201804 |    23 |
| a      | 201805 |    67 |
| a      | 201806 |   8.1 |
| a      | 201807 |   0.2 |
| a      | 201808 |   9.1 |
| a      | 201809 |     1 |
| a      | 201810 |     5 |
| a      | 201811 |     6 |
| a      | 201812 |     9 |
| b      | 201710 |   6.2 |
| b      | 201711 |   0.5 |
| b      | 201712 |   4.3 |
| b      | 201801 |   6.5 |
| b      | 201802 |     7 |
| b      | 201803 |    11 |
| b      | 201804 |    23 |
| b      | 201805 |    67 |
| b      | 201806 |   8.1 |
| b      | 201807 |   0.2 |
| b      | 201808 |   9.1 |
| b      | 201809 |     1 |
| b      | 201810 |     5 |
| b      | 201811 |     6 |
| b      | 201812 |     9 |
+--------+--------+-------+

Target table (in the end, each cst_id should get exactly one row):

+--------+----------+-----------------+
| cst_id |   date   | last3monthSpend |
+--------+----------+-----------------+
| a      | 20180125 |              11 |
| b      | 20180627 |             101 |
+--------+----------+-----------------+

1 Answer:

Answer 0: (score: 0)

You can use a join, group by, and a window function to do what you want. The following shows the logic:

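-- Rank each customer's months that precede the month of `date`, newest first,
-- then sum the spend of the three most recent ones.
select s.cst_id, s.date, sum(s.spend) as last3monthSpend
from (select s1.cst_id, s1.date, s2.spend,
             row_number() over (partition by s2.cst_id order by s2.month desc) as seqnum
      from source1 s1 join 
           source2 s2
           on s2.cst_id = s1.cst_id and
              -- compare with the yyyyMM prefix of date, so the date's own month
              -- is excluded (this is what the target table expects)
              s2.month < substr(s1.date, 1, 6)
     ) s
where seqnum <= 3
group by s.cst_id, s.date;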

The only question is how to compare the date and month columns. This version works if the values are strings.
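One way to check the date/month comparison is to run the logic on the sample data from the question. The sketch below is only an illustration and rests on a few assumptions: it uses Python's built-in sqlite3 module as a stand-in for Hive (it needs SQLite 3.25 or later for window functions), it stores date and month as strings, and it compares month with the first six characters (yyyyMM) of date so that the date's own month is left out. The function name last3_month_spend is made up for this example.

```python
import sqlite3

# Sample rows copied from the question. Customers c and d have no spend rows.
SOURCE1 = [("a", "20180125"), ("b", "20180627"), ("c", "20181122"), ("d", "20180304")]
SPEND_BY_MONTH = [
    ("201710", 6.2), ("201711", 0.5), ("201712", 4.3), ("201801", 6.5),
    ("201802", 7), ("201803", 11), ("201804", 23), ("201805", 67),
    ("201806", 8.1), ("201807", 0.2), ("201808", 9.1), ("201809", 1),
    ("201810", 5), ("201811", 6), ("201812", 9),
]

# Assumption: month is compared with the yyyyMM prefix of date, as strings.
QUERY = """
select s.cst_id, s.date, sum(s.spend) as last3monthSpend
from (select s1.cst_id, s1.date, s2.spend,
             row_number() over (partition by s2.cst_id
                                order by s2.month desc) as seqnum
      from source1 s1 join
           source2 s2
           on s2.cst_id = s1.cst_id and
              s2.month < substr(s1.date, 1, 6)
     ) s
where s.seqnum <= 3
group by s.cst_id, s.date
order by s.cst_id
"""


def last3_month_spend():
    """Load the sample tables into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table source1 (cst_id text, date text)")
    conn.execute("create table source2 (cst_id text, month text, spend real)")
    conn.executemany("insert into source1 values (?, ?)", SOURCE1)
    # Customers a and b share the same monthly spend figures in the question.
    conn.executemany(
        "insert into source2 values (?, ?, ?)",
        [(cst, month, spend) for cst in ("a", "b") for month, spend in SPEND_BY_MONTH],
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    # Round to one decimal place to hide floating-point noise in the sums.
    return [(cst, date, round(total, 1)) for cst, date, total in rows]


if __name__ == "__main__":
    for row in last3_month_spend():
        print(row)
```

With this comparison the sample data gives 11 for a (201710 to 201712) and 101 for b (201803 to 201805), which matches the target table. Comparing the raw strings instead (month < date) would also let in the date's own month, because '201801' sorts before '20180125'. If the columns are stored as integers in Hive, they would need to be cast to strings before the same prefix comparison applies.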