在Hive / SQL中使用滞后功能时列中的默认值

时间:2018-09-12 20:55:31

标签: sql hive hiveql

我在Hive中有一个如下表。

我想为seconds相同的列计算id中的时差,并在time_diff列中获取值。

Table

+-----+---------+------------------------+
| id  |  event  |            eventdate   |
+-----+---------+------------------------+
| 1   | sent    | 2017-11-23 03:49:50.0  |
| 1   | sent    | 2017-11-23 03:49:59.0  |
| 2   | sent    | 2017-11-23 04:49:59.0  |
| 1   | click   | 2017-11-24 03:49:50.0  |
+-----+---------+------------------------+

我已经完成了以下操作

SELECT *, coalesce(unix_timestamp(eventdate) - unix_timestamp(LAG(eventdate) OVER(PARTITION BY ID ORDER BY eventdate)),0) time_diff FROM Table;

Result

+-----+---------+------------------------+-----------+
| id  |  event  |            eventdate   |time_diff  |
+-----+---------+------------------------+-----------+
| 1   | sent    | 2017-11-23 03:49:50.0  | 0         |
| 1   | sent    | 2017-11-23 03:49:59.0  | 9         |
| 2   | sent    | 2017-11-23 04:49:59.0  | 0         |
| 1   | click   | 2017-11-24 03:49:50.0  | 86391     |
+-----+---------+------------------------+-----------+

我得到了想要的东西,但有一个例外。在id列中1eventsenttime_diff的结果中,有两个值09 。在应用滞后函数后,我希望所有sent事件在0列中都有time_diff

Expected result

+-----+---------+------------------------+-----------+
| id  |  event  |            eventdate   |time_diff  |
+-----+---------+------------------------+-----------+
| 1   | sent    | 2017-11-23 03:49:50.0  | 0         |
| 1   | sent    | 2017-11-23 03:49:59.0  | 0         |
| 2   | sent    | 2017-11-23 04:49:59.0  | 0         |
| 1   | click   | 2017-11-24 03:49:50.0  | 86391     |
+-----+---------+------------------------+-----------+

如何获得预期的结果?

1 个答案:

答案 0 :(得分:1)

您可以使用case表达式:

SELECT *,
       (case when event = 'sent' then 0
             else coalesce(unix_timestamp(eventdate) - unix_timestamp(LAG(eventdate) OVER(PARTITION BY ID ORDER BY eventdate)), 0)
        end) as time_diff 
FROM Table;