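In Pig, I need to set avail_until to the avail_since of the next record for the same id, defaulting to 9999-12-31 for the last record of each id. I first sorted the data by ID and then by AVAIL_SINCE, but got stuck after that. I think I may need the Over/Stitch/lead/lag functions, but I'm not sure. Any help would be appreciated!
Input data: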
ID AVAIL_SINCE AVAIL_UNTIL
1 19-Jan-00 31-Dec-99
1 11-Jun-00 31-Dec-99
1 4-Aug-00 31-Dec-99
1 19-May-01 31-Dec-99
2 5-May-02 31-Dec-99
2 8-Apr-03 31-Dec-99
3 10-Jun-00 31-Dec-99
3 31-Oct-00 31-Dec-99
3 29-Dec-00 31-Dec-99
Required result:
ID AVAIL_SINCE AVAIL_UNTIL
1 19-Jan-00 11-Jun-00
1 11-Jun-00 4-Aug-00
1 4-Aug-00 19-May-01
1 19-May-01 31-Dec-99
2 5-May-02 8-Apr-03
2 8-Apr-03 31-Dec-99
3 10-Jun-00 31-Oct-00
3 31-Oct-00 29-Dec-00
3 29-Dec-00 31-Dec-99
Answer 0 (score: 0)
-- Load the file (comma-delimited: id, avail_since, avail_until)
A = load 'pdemo/sample1'
using PigStorage(',')
as (id:int, date1:chararray, date2:chararray);
-- Number the rows; this assumes the input is already sorted by id, then avail_since
B = RANK A;
-- Shift each rank down by one so that a row lines up with its successor
C = foreach B generate rank_A - 1 as prev_rank, id as next_id, date1 as next_date;
-- Left outer join, so rows without a successor are kept
J = join B by rank_A left outer, C by prev_rank;
-- Take the successor's date only when it belongs to the same id;
-- otherwise (last row of an id) keep the default already stored in date2
Final_result = foreach J generate B::id as ID, B::date1 as AVAIL_SINCE,
(C::next_id is null ? B::date2 : (C::next_id == B::id ? C::next_date : B::date2)) as AVAIL_UNTIL;
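Since the question mentions Over/Stitch: the same per-id "lead" logic could probably also be written with the piggybank Over and Stitch UDFs. The following is an untested sketch, not a verified solution. The jar path, the alias names, and the literal default '31-Dec-99' are assumptions, and the argument order (window start, window end, rows to lead, default value) is taken from the lead example in the Over Javadoc as I understand it.

```pig
-- Assumption: piggybank.jar is available at this (hypothetical) path
REGISTER piggybank.jar;
-- The constructor argument declares the return type of the windowed value
DEFINE Over org.apache.pig.piggybank.evaluation.Over('chararray');
DEFINE Stitch org.apache.pig.piggybank.evaluation.Stitch;

A = LOAD 'pdemo/sample1' USING PigStorage(',')
    AS (id:int, date1:chararray, date2:chararray);
-- Keep the load order as a sort key; assumes input is sorted by id, then avail_since
B = RANK A;
G = GROUP B BY id;
S = FOREACH G {
    ordered = ORDER B BY rank_A;
    -- lead by 1 row within the id; use '31-Dec-99' when there is no next row
    GENERATE FLATTEN(Stitch(ordered, Over(ordered.date1, 'lead', 0, 1, 1, '31-Dec-99')));
};
-- After Stitch the columns are: rank_A, id, date1, date2, lead value
Final_result = FOREACH S GENERATE $1 AS ID, $2 AS AVAIL_SINCE, $4 AS AVAIL_UNTIL;
```

Grouping by id is what keeps one id's last row from picking up the first date of the next id, so no separate "last row" step is needed.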
Hope this helps!