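I have a table in Hive like the one below.
For rows that share the same id, I want to calculate the time difference in seconds and get the value in a time_diff column.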
Table
+-----+---------+------------------------+
| id | event | eventdate |
+-----+---------+------------------------+
| 1 | sent | 2017-11-23 03:49:50.0 |
| 1 | sent | 2017-11-23 03:49:59.0 |
| 2 | sent | 2017-11-23 04:49:59.0 |
| 1 | click | 2017-11-24 03:49:50.0 |
+-----+---------+------------------------+
Here is what I have done so far:
SELECT *, coalesce(unix_timestamp(eventdate) - unix_timestamp(LAG(eventdate) OVER(PARTITION BY ID ORDER BY eventdate)),0) time_diff FROM Table;
Result
+-----+---------+------------------------+-----------+
| id | event | eventdate |time_diff |
+-----+---------+------------------------+-----------+
| 1 | sent | 2017-11-23 03:49:50.0 | 0 |
| 1 | sent | 2017-11-23 03:49:59.0 | 9 |
| 2 | sent | 2017-11-23 04:49:59.0 | 0 |
| 1 | click | 2017-11-24 03:49:50.0 | 86391 |
+-----+---------+------------------------+-----------+
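I got what I wanted, with one exception. In the result, the rows where id is 1 and event is sent have two different time_diff values, 0 and 9. After applying the LAG function, I want every sent event to have 0 in the time_diff column.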
Expected result:
+-----+---------+------------------------+-----------+
| id | event | eventdate |time_diff |
+-----+---------+------------------------+-----------+
| 1 | sent | 2017-11-23 03:49:50.0 | 0 |
| 1 | sent | 2017-11-23 03:49:59.0 | 0 |
| 2 | sent | 2017-11-23 04:49:59.0 | 0 |
| 1 | click | 2017-11-24 03:49:50.0 | 86391 |
+-----+---------+------------------------+-----------+
How can I get the expected result?
Answer 0 (score: 1)
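You can use a case expression that returns 0 for sent events and falls back to the LAG-based difference for everything else: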
SELECT *,
(case when event = 'sent' then 0
else coalesce(unix_timestamp(eventdate) - unix_timestamp(LAG(eventdate) OVER(PARTITION BY ID ORDER BY eventdate)), 0)
end) as time_diff
FROM Table;
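To sanity-check the logic outside Hive, here is a minimal Python sketch that mimics what the query does on the sample rows. It is not Hive code: the helper names `parse` and `time_diffs` are made up for illustration, and the grouping and ordering stand in for PARTITION BY id ORDER BY eventdate.

```python
from datetime import datetime
from itertools import groupby

# Sample rows from the question: (id, event, eventdate)
rows = [
    (1, "sent", "2017-11-23 03:49:50.0"),
    (1, "sent", "2017-11-23 03:49:59.0"),
    (2, "sent", "2017-11-23 04:49:59.0"),
    (1, "click", "2017-11-24 03:49:50.0"),
]


def parse(ts):
    # Hypothetical helper: parse the timestamp format shown in the table.
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")


def time_diffs(rows):
    # Hypothetical helper: emulate LAG over each id, ordered by eventdate,
    # wrapped in the CASE that forces 0 for 'sent' events.
    result = []
    ordered = sorted(rows, key=lambda r: (r[0], parse(r[2])))
    for _, group in groupby(ordered, key=lambda r: r[0]):
        previous = None  # plays the role of LAG(eventdate)
        for row_id, event, eventdate in group:
            current = parse(eventdate)
            if event == "sent" or previous is None:
                diff = 0  # CASE branch, or COALESCE(NULL, 0) for the first row
            else:
                diff = int((current - previous).total_seconds())
            result.append((row_id, event, eventdate, diff))
            previous = current  # the lag still advances on 'sent' rows
    return result


for row in time_diffs(rows):
    print(row)
```

Note that the previous timestamp still advances on sent rows, so the click row is measured from the last sent event for that id, which matches the 86391 in the expected result.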