HiveQL: calculate the time difference between two rows based on a condition

Asked: 2020-07-05 12:53:30

Tags: sql time hive timestamp hiveql

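I want to calculate two time differences for each id: Time_difference1 is the timestamp at status = 4 minus the timestamp at status = 2, and Time_difference2 is the timestamp at status = 3 minus the timestamp at status = 2.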

My table looks like this:

id  status  timestamp
16  1       12.45.12
16  2       12.45.30
16  3       12.45.55
16  4       12.46.15
11  1       12.45.46
11  2       12.45.55
11  3       12.46.11
11  4       12.46.34
27  1       12.48.01
27  2       12.48.18
27  3       12.48.42
27  4       12.48.52

So the result should look like this:

id  timediff1   timediff2
16  0.00.45     0.00.25
11  0.00.25     0.00.16
27  0.00.41     0.00.24

I tried a solution like this:

SELECT id,
   status
   timestamp,
   (to_unix_timestamp(case1) - to_unix_timestamp(timestamp)) AS timediff1
FROM (
  SELECT t.*,
         CASE WHEN status=4 THEN timestamp END OVER (PARTITION BY id ORDER BY timestamp ASC) AS case1
  FROM table t 
)
WHERE status = 2

But it doesn't work. The OVER (PARTITION BY ...) part gives an error: mismatched input 'FROM' expecting; line 5 pos 0

Does anyone know how to proceed?

1 Answer:

Answer 0: (score: 1)

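Quoting the question: "I want to calculate two time differences for each id: Time_difference1 is the timestamp at status = 4 minus the timestamp at status = 2, and Time_difference2 is the timestamp at status = 3 minus the timestamp at status = 2."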

Use conditional aggregation:

SELECT id,
       -- seconds between the status 4 row and the status 2 row
       (max(to_unix_timestamp(case when status = 4 then timestamp end)) -
        max(to_unix_timestamp(case when status = 2 then timestamp end))
       ) AS timediff1,
       -- seconds between the status 3 row and the status 2 row
       (max(to_unix_timestamp(case when status = 3 then timestamp end)) -
        max(to_unix_timestamp(case when status = 2 then timestamp end))
       ) AS timediff2
FROM t
GROUP BY id
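
The attempt in the question most likely fails because OVER can only follow a function call, not a bare CASE ... END expression. If you prefer to keep the window-function shape, a minimal sketch might look like the following. It rests on a few assumptions that are not confirmed by the question: the table is named t as in the query above, the timestamp column is a string such as '12.45.30' (hence the explicit 'HH.mm.ss' pattern passed to unix_timestamp), and the aliases ts3, ts4, and x are made up for illustration.

```sql
-- Sketch only: window-function variant of the query from the question.
-- Assumed: table t, `timestamp` stored as a string like '12.45.30'.
SELECT id,
       unix_timestamp(ts4, 'HH.mm.ss')
         - unix_timestamp(`timestamp`, 'HH.mm.ss') AS timediff1,
       unix_timestamp(ts3, 'HH.mm.ss')
         - unix_timestamp(`timestamp`, 'HH.mm.ss') AS timediff2
FROM (
  SELECT t.*,
         -- OVER must be attached to a function, so each CASE is wrapped in MAX()
         MAX(CASE WHEN status = 4 THEN `timestamp` END) OVER (PARTITION BY id) AS ts4,
         MAX(CASE WHEN status = 3 THEN `timestamp` END) OVER (PARTITION BY id) AS ts3
  FROM t
) x                -- Hive requires an alias on a subquery in FROM
WHERE status = 2;  -- keep one row per id: the status 2 row
```

Both this and the conditional aggregation return the differences as a number of seconds, not in the 0.00.45 layout shown in the question, so a formatting step would still be needed for that. The GROUP BY version is usually the simpler choice when only one row per id is wanted.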