Apache Pig - Set the current row's date to the next record's date

Posted: 2016-08-25 19:59:47

Tags: apache-pig

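In Pig, I need to set avail_until to the avail_since of the next record for the same id, defaulting to 9999-12-31 for the last record of each id. I first sorted the data by ID and then by Avail_Since, but I'm stuck after that. I think I might need the Over/Stitch/lead/lag functions, but I'm not sure. Any help would be greatly appreciated!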

Input data:

ID       AVAIL_SINCE    AVAIL_UNTIL
1        19-Jan-00      31-Dec-99
1        11-Jun-00      31-Dec-99
1        4-Aug-00       31-Dec-99
1        19-May-01      31-Dec-99 
2        5-May-02       31-Dec-99 
2        8-Apr-03       31-Dec-99 
3        10-Jun-00      31-Dec-99 
3        31-Oct-00      31-Dec-99 
3        29-Dec-00      31-Dec-99  

Required result:

ID       AVAIL_SINCE    AVAIL_UNTIL
1        19-Jan-00      11-Jun-00
1        11-Jun-00      4-Aug-00
1        4-Aug-00       19-May-01
1        19-May-01      31-Dec-99
2        5-May-02       8-Apr-03 
2        8-Apr-03       31-Dec-99
3        10-Jun-00      31-Oct-00
3        31-Oct-00      29-Dec-00
3        29-Dec-00      31-Dec-99

1 answer:

Answer 0: (score: 0)

    --Load the file (assumes comma-separated fields).
    A = load 'pdemo/sample1'
    using PigStorage(',')
    as (id:int, date1:chararray, date2:chararray);

    --Number the rows in input order (assumes the file is already sorted by id, then date1).
    B = RANK A;

    --Shift each rank down by one so that a row lines up with the row before it.
    C = foreach B generate rank_A - 1 as prev_rank, id as next_id, date1 as next_date1;

    --Left outer join on (rank, id): the last row of each id finds no match in C.
    J = join B by (rank_A, id) left outer, C by (prev_rank, next_id);

    --Take the next row's date1, or keep the original date2 when there is no next row for that id.
    Final_result = foreach J generate B::id as ID, B::date1 as AVAIL_SINCE,
        (C::next_date1 is null ? B::date2 : C::next_date1) as AVAIL_UNTIL;
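
To check the Pig output on a small sample, the required result can be reproduced outside Pig. Below is a minimal Python sketch of the per-id "next record" logic, assuming the rows are already sorted by id and then by AVAIL_SINCE. The function name `fill_avail_until` and the `DEFAULT_UNTIL` constant are made up for illustration and are not part of Pig.

```python
from itertools import groupby
from operator import itemgetter

# Placeholder used in the sample data for the "no end date" value (9999-12-31).
DEFAULT_UNTIL = "31-Dec-99"

def fill_avail_until(rows, default=DEFAULT_UNTIL):
    """rows: list of (id, avail_since) tuples, pre-sorted by id, then by date."""
    out = []
    for row_id, group in groupby(rows, key=itemgetter(0)):
        since = [r[1] for r in group]
        # Pair each date with the next one for the same id;
        # the last record of each id gets the default.
        for cur, nxt in zip(since, since[1:] + [default]):
            out.append((row_id, cur, nxt))
    return out

rows = [
    (1, "19-Jan-00"), (1, "11-Jun-00"), (1, "4-Aug-00"), (1, "19-May-01"),
    (2, "5-May-02"), (2, "8-Apr-03"),
    (3, "10-Jun-00"), (3, "31-Oct-00"), (3, "29-Dec-00"),
]
result = fill_avail_until(rows)
for row in result:
    print(*row, sep="\t")
```

Grouping by id before pairing is what keeps the last record of one id from picking up the first date of the next id.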

Hope this helps!