Pig script to concatenate values in tuples

Date: 2017-10-11 20:40:57

Tags: apache apache-pig

Input:



(11111111,{(A,MARK,APPLE,ABC1,11111111),(B,PAUL,AMAZON,ABC2,11111111),(C,TIM,FIVN,ABC3,11111111),(D,LIN,MULESFT,ABC4,11111111),(E,YEP,UHG,ABC5,11111111),(F,QIN,ATT,ABC6,11111111)})
(22222222,{(A,MARK,APPLE,ABC6,22222222),(B,MARK,AMAZON,ABC7,22222222),(C,MARK,PQE,ABC8,22222222),(D,MARK,AMB,ABC9,22222222),(E,MARK,YZQ,ABC19,22222222),(F,MARK,PQR,,22222222)})




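I have grouped the data by key, as shown above. I need to generate the output by concatenating every value across the tuples in each group, including nulls, as shown below:

Output: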



(11111111,A^B^C^D^E^F,MARK^PAUL^TIM^LIN^YEP^QIN,APPLE^AMAZON^FIVN^MULESFT^UHG^ATT,ABC1^ABC2^ABC3^ABC4^ABC5^ABC6)
(22222222,A^B^C^D^E^F,MARK^MARK^MARK^MARK^MARK^MARK,APPLE^AMAZON^PQE^AMB^YZQ^PQR,ABC6^ABC7^ABC8^ABC9^ABC19^)




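Can someone help me with this?

1 answer:

Answer 0 (score: 0)

Here is a code snippet that may be useful. It handles a single column, and you can build on it to reach the expected output.

Input: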

1,A
1,B
1,C
2,D
2,E
2,F

Output:

(1,C^B^A)
(2,F^E^D)

Pig Snippet:

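-- Load (id, name) pairs from a comma-separated file.
data1 = LOAD '/Users/muralirao/learning/pig/a.csv' USING PigStorage(',') AS (id:int, name:chararray);

-- Group by id, then join the names in each group with '^'.
-- Order inside a bag is not guaranteed, which is why the output above reads C^B^A.
req_data = FOREACH (GROUP data1 BY id) {
    names = data1.name;
    GENERATE group AS id, BagToString(names, '^') AS joined_names;
};

DUMP req_data;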
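
The snippet above joins only one column. To get closer to the five-field input in the question, the same pattern could be repeated per column. The following is a minimal sketch, not a tested solution: the path 'input.csv' and the field names code, name, company and ref are hypothetical, since the question does not give them. It also assumes that nulls should appear as empty slots between delimiters, so it replaces them with empty strings before grouping rather than relying on how BagToString renders a null. The nested ORDER is there so that every column is joined in the same row order.

```pig
-- Hypothetical input path and field names; the question does not give them.
raw = LOAD 'input.csv' USING PigStorage(',')
      AS (code:chararray, name:chararray, company:chararray, ref:chararray, id:chararray);

-- Replace nulls with empty strings so a missing value leaves an empty slot.
clean = FOREACH raw GENERATE
    id,
    (code    IS NULL ? '' : code)    AS code,
    (name    IS NULL ? '' : name)    AS name,
    (company IS NULL ? '' : company) AS company,
    (ref     IS NULL ? '' : ref)     AS ref;

result = FOREACH (GROUP clean BY id) {
    -- Sort once, then project each column from the same sorted bag.
    sorted    = ORDER clean BY code;
    codes     = sorted.code;
    names     = sorted.name;
    companies = sorted.company;
    refs      = sorted.ref;
    GENERATE group AS id,
             BagToString(codes, '^')     AS codes_joined,
             BagToString(names, '^')     AS names_joined,
             BagToString(companies, '^') AS companies_joined,
             BagToString(refs, '^')      AS refs_joined;
};

DUMP result;
```

Sorting by code is only one possible choice. Any key that reproduces the row order you want would work, as long as all four projections come from the same sorted bag.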