How do I compute a running sum while tag <> 0, and reset it to 0 when tag = 0, in Hive?

Time: 2017-02-03 08:41:06

Tags: hive conditional lag running-total

customer    txn_date    tag running_sum
A           1-Jan-17    1   1
A           2-Jan-17    1   2
A           3-Jan-17    1   3
A           4-Jan-17    1   4
A           5-Jan-17    1   5
A           6-Jan-17    1   6
A           7-Jan-17    0   0
A           8-Jan-17    1   1
A           9-Jan-17    1   2
A           10-Jan-17   1   3
A           11-Jan-17   0   0
A           12-Jan-17   0   0
A           13-Jan-17   1   1
A           14-Jan-17   1   2
A           15-Jan-17   0   0

How can I compute running_sum so that it resets to zero whenever tag = 0, as in the example above? TIA

1 answer:

Answer 0 (score: 1)

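What you need to do is create a "group" for each run of 1s and 0s. You can do this by creating a boolean flag that marks the rows where tag is 0, then taking a cumulative sum over that flag column; the cumulative sum increases by one at every reset row, so it serves as a group id. From there, you can take a cumulative sum of the original tag column within each of the groups created in the subquery.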

Query

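SELECT customer
  , txn_date
  , tag
  -- Running sum of tag, restarted within each (customer, flg_sum) group
  , SUM(tag) OVER (PARTITION BY customer, flg_sum ORDER BY txn_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum
FROM (
  SELECT *
    -- Cumulative count of reset rows so far; acts as the group id
    , SUM(tag_flg) OVER (PARTITION BY customer ORDER BY txn_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS flg_sum
  FROM (
    SELECT *
      -- 1 marks a reset row (any tag other than 1, here tag = 0); 0 marks a counted row
      , CASE WHEN tag = 1 THEN 0 ELSE 1 END AS tag_flg
    FROM database.table ) x ) y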

Output

customer        txn_date        tag     running_sum
A               2017-01-01      1       1
A               2017-01-02      1       2
A               2017-01-03      1       3
A               2017-01-04      1       4
A               2017-01-05      1       5
A               2017-01-06      1       6
A               2017-01-07      0       0
A               2017-01-08      1       1
A               2017-01-09      1       2
A               2017-01-10      1       3
A               2017-01-11      0       0
A               2017-01-12      0       0
A               2017-01-13      1       1
A               2017-01-14      1       2
A               2017-01-15      0       0
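
To see why the grouping works, the intermediate columns can be traced outside Hive. The following is only a rough sketch in plain Python, not part of the original answer: the function name running_sum_with_reset is made up for illustration, and it assumes the rows belong to a single customer and are already sorted by txn_date. The names tag_flg and flg_sum mirror the columns in the query above.

```python
from itertools import accumulate


def running_sum_with_reset(tags):
    """Trace the query's steps for one customer's tags, sorted by txn_date.

    Hypothetical helper for illustration only; it is not a Hive API.
    """
    # Step 1: tag_flg is 0 for counted rows (tag == 1) and 1 for reset rows.
    tag_flg = [0 if t == 1 else 1 for t in tags]

    # Step 2: the cumulative sum of tag_flg gives every row a group id.
    # It increases by one at each reset row, so a reset row opens a new group.
    flg_sum = list(accumulate(tag_flg))

    # Step 3: running sum of tag, restarted whenever the group id changes.
    running = []
    total = 0
    prev_group = None
    for t, g in zip(tags, flg_sum):
        if g != prev_group:
            total = 0
            prev_group = g
        total += t
        running.append(total)
    return flg_sum, running


# The tag column from the question, in txn_date order.
tags = [1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0]
flg_sum, running = running_sum_with_reset(tags)
print(flg_sum)  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 3, 3, 3, 4]
print(running)  # [1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 0, 0, 1, 2, 0]
```

Each reset row sits at the start of its own group, so its contribution of 0 is the first value summed in that group, which is what makes running_sum read 0 on those rows.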