I am getting some data from an electricity meter, where each KWH value is a cumulative reading. Example:
Date KWH
2018-12-01 50
2018-12-02 90
2018-12-03 150
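I want to use Pig code to extract the actual KWH consumed each day, i.e. the difference between each reading and the previous day's reading.
Expected: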
Date KWH
2018-12-02 40
2018-12-03 60
Answer:
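Referring to a previous record is very difficult in Hadoop, because the input is split up and distributed across different tasks, so no task is guaranteed to see the preceding row. I think the following approach works, but it is inefficient compared with a single process that reads the data sequentially. The idea is to join the data with a copy of itself in which every date is shifted forward by one day and every reading is negated; adding the two joined values then gives the day's consumption. Because it is an inner join, a day with no reading on the previous day (such as the first day) produces no output row.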
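-- Load tab-separated (date, cumulative KWH) lines; the file is assumed to have no header row
A = LOAD 'test.txt' AS (a1:chararray, a2:int);
-- Parse the date string into a datetime
B = FOREACH A GENERATE ToDate(a1, 'y-M-d', 'UTC') AS date, a2;
-- Shift each reading forward by one day and negate it
C = FOREACH B GENERATE AddDuration(date, 'P1D') AS nextdate, -a2 AS a2;
-- Pair each day's reading with the previous day's negated reading
D = JOIN B BY date, C BY nextdate;
-- today's reading + (-yesterday's reading) = the day's consumption
E = FOREACH D GENERATE ToString(B::date, 'yyyy-MM-dd') AS date, B::a2 + C::a2 AS kwh;
F = ORDER E BY date;
DUMP F;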
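To check the join logic outside Hadoop, here is a small Python sketch that mimics relations B, C, D and E on in-memory data. It is only an illustration, not Pig code; the function name `daily_usage` is hypothetical, and it assumes the input is a list of (YYYY-MM-DD, cumulative KWH) pairs.

```python
from datetime import date, timedelta


def daily_usage(readings):
    """Hypothetical helper mimicking the Pig self-join.

    readings: list of (ISO date string, cumulative KWH) tuples.
    Returns sorted (ISO date string, KWH used that day) tuples.
    """
    # Relation B: parse the date strings
    b = [(date.fromisoformat(d), kwh) for d, kwh in readings]

    # Relation C: shift each date forward one day and negate the reading,
    # keyed by the shifted date so the join becomes a dictionary lookup
    c = {}
    for d, kwh in b:
        c.setdefault(d + timedelta(days=1), []).append(-kwh)

    # Relations D and E: inner join on B.date == C.nextdate, then add the values.
    # Days with no reading on the previous day find no match and are dropped.
    out = []
    for d, kwh in b:
        for neg_prev in c.get(d, []):
            out.append((d.isoformat(), kwh + neg_prev))
    return sorted(out)
```

For the sample data, `daily_usage([("2018-12-01", 50), ("2018-12-02", 90), ("2018-12-03", 150)])` returns `[("2018-12-02", 40), ("2018-12-03", 60)]`, matching the expected output. If a day is missing from the input, the following day also disappears from the result, which is the same inner-join behaviour as the Pig script.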