My input file is:
2014-08-23 30000
2014-09-24 20000
2014-10-23 50000
2014-11-24 7000
I want output like this, where each row shows the difference from the previous row's value:
2014-08-23 30000
2014-09-24 -10000
2014-10-23 30000
2014-11-24 -43000
I want to achieve this without a UDF. I tried this code:
SELECT C.ID,
       C.DATE,
       C.VALUE AS CURRENT_DATE_VALUE,
       COALESCE(CAST(O.VALUE AS INT), 0) AS PREV_DATE_VALUE,
       (C.VALUE - COALESCE(CAST(O.VALUE AS INT), 0)) AS DIFF_VALUE
FROM ITEM O
LEFT OUTER JOIN (
    SELECT T.ID,
           C.DATE,
           C.VALUE,
           MAX(UNIX_TIMESTAMP(T.DATE, 'dd-MM-yyyy')) AS PREV_DATE
    FROM ITEM C
    LEFT OUTER JOIN ITEM T ON (C.ID = T.ID)
    WHERE UNIX_TIMESTAMP(C.DATE, 'dd-MM-yyyy') > UNIX_TIMESTAMP(T.DATE, 'dd-MM-yyyy')
    GROUP BY T.ID, C.DATE, C.VALUE
) C
ON (O.ID = C.ID AND UNIX_TIMESTAMP(O.DATE, 'dd-MM-yyyy') = C.PREV_DATE)
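Answer 0 (score: 0)
If you have access to Hive 0.13 (which supports windowing functions), you can do this with very little code. LAG(value, 1, 0) returns the value from the previous row in date order, or 0 when there is no previous row.
Query: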
select date
,(value - prev) as diff
from (
select date
,value
,LAG(value, 1, 0) OVER (ORDER BY date) as prev
from some_database.some_table
) x
Output:
2014-08-23 30000
2014-09-24 -10000
2014-10-23 30000
2014-11-24 -43000
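
To check the query without a Hive cluster, here is a minimal sketch that runs the same LAG logic against the sample data using Python's built-in sqlite3 module. It assumes SQLite 3.25 or newer (the first release with window functions) and uses SQLite only as a stand-in for Hive; the in-memory table name some_table and its column types are assumptions made for this illustration.

```python
import sqlite3

# Sample rows copied from the question's input file.
rows = [
    ("2014-08-23", 30000),
    ("2014-09-24", 20000),
    ("2014-10-23", 50000),
    ("2014-11-24", 7000),
]

# In-memory SQLite database standing in for the Hive table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (date TEXT, value INTEGER)")
conn.executemany("INSERT INTO some_table VALUES (?, ?)", rows)

# Same shape as the Hive query: LAG(value, 1, 0) fetches the previous
# row's value in date order, defaulting to 0 for the first row.
query = """
SELECT date, (value - prev) AS diff
FROM (
    SELECT date, value, LAG(value, 1, 0) OVER (ORDER BY date) AS prev
    FROM some_table
) x
ORDER BY date
"""

result = conn.execute(query).fetchall()
for date, diff in result:
    print(date, diff)
```

The dates sort correctly as plain text here because they are in yyyy-MM-dd form. With several IDs in the table, as in the question's ITEM table, adding PARTITION BY id inside the OVER clause would keep each ID's differences separate.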