How to build a dynamic field in a Pig relation

Date: 2016-10-04 07:16:57

Tags: apache-pig

I have a Pig relation with four fields:

A = LOAD 'record.txt' AS (name:chararray, ID:int, subject:chararray, flag:boolean);
DUMP A;

( RAM,222,JAVA,true)
( RAM,111,DotNet,false)
( RAM,444,HTML,false)
( SAM,777,DotNet,true)
( SAM,333,JAVA,false)

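How can I generate an additional field, reference, that holds the concatenation of name and ID when flag is true, and otherwise repeats the previous value until the next true appears, as shown below: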

( RAM,222,JAVA,true,RAM-222)
( RAM,111,DotNet,false,RAM-222)
( RAM,444,HTML,false,RAM-222)
( SAM,777,DotNet,true,SAM-777)
( SAM,333,JAVA,false,SAM-777)
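
To pin down the rule being asked for, here is a minimal sketch in Python rather than Pig Latin. It assumes the rows are processed in the order they appear in the file, which Pig does not guarantee for a relation in general; the function name `add_reference` is hypothetical and exists only for this illustration.

```python
from typing import Iterable, List, Optional, Tuple

Row = Tuple[str, int, str, bool]
RowWithRef = Tuple[str, int, str, bool, Optional[str]]


def add_reference(rows: Iterable[Row]) -> List[RowWithRef]:
    """Append a reference that is refreshed when flag is true and carried forward otherwise."""
    result: List[RowWithRef] = []
    current: Optional[str] = None  # stays None until the first flagged row is seen
    for name, id_, subject, flag in rows:
        if flag:
            # A flagged row starts a new reference value.
            current = f"{name}-{id_}"
        result.append((name, id_, subject, flag, current))
    return result


# Sample rows from the question, in file order.
rows = [
    ("RAM", 222, "JAVA", True),
    ("RAM", 111, "DotNet", False),
    ("RAM", 444, "HTML", False),
    ("SAM", 777, "DotNet", True),
    ("SAM", 333, "JAVA", False),
]

for row in add_reference(rows):
    print(row)
```

The point of the sketch is that the reference depends on earlier rows, so an expression evaluated on one row at a time cannot produce it by itself.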

I used the script below, but it does not give the expected result.

A = LOAD 'demo.txt' AS (name:chararray, ID:int, subject:chararray, flag:boolean);
B = FOREACH A GENERATE name, ID, subject, flag, CONCAT(CONCAT(name, '-'), (chararray)ID) AS reference;
DUMP B;

( RAM,222,JAVA,true,RAM-222)
( RAM,111,DotNet,false,RAM-111)
( RAM,444,HTML,false,RAM-444)
( SAM,777,DotNet,true,SAM-777)
( SAM,333,JAVA,false,SAM-333)

What should the CONCAT call look like, or is there another way to get the expected result?

1 Answer:

Answer 0: (score: 1)

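A = LOAD 'demo.txt' AS (name:chararray, id:int, sub:chararray, flg:boolean);
-- Build a candidate reference for every row (id is cast so CONCAT receives chararrays).
B = FOREACH A GENERATE name, id, sub, flg, CONCAT(CONCAT(name, '-'), (chararray)id) AS rf;

-- Separate the flagged rows from the unflagged ones.
SPLIT B INTO b1 IF flg == true, b2 IF flg == false;
-- Give each unflagged row the reference of the flagged row that shares its name.
C = JOIN b2 BY name LEFT OUTER, b1 BY name;
C1 = FOREACH C GENERATE b2::name AS name, b2::id AS id, b2::sub AS sub, b2::flg AS flg, b1::rf AS rf;

Result = UNION b1, C1;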
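
As a rough check of what the split, join and union steps compute, the following Python sketch mimics them on in-memory tuples. It is an illustration only, not Pig code; the function name `split_join_union` and the extra `SQL` row in the second data set are made up for this example.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, int, str, bool]
RowWithRef = Tuple[str, int, str, bool, Optional[str]]


def split_join_union(rows: List[Row]) -> List[RowWithRef]:
    """Mimic SPLIT on the flag, a LEFT OUTER JOIN on name, then UNION."""
    tagged = [(n, i, s, f, f"{n}-{i}") for n, i, s, f in rows]
    b1 = [r for r in tagged if r[3]]       # flagged rows keep their own reference
    b2 = [r for r in tagged if not r[3]]   # unflagged rows need one assigned
    c1: List[RowWithRef] = []
    for n, i, s, f, _ in b2:
        matches = [r[4] for r in b1 if r[0] == n]
        if not matches:
            # Left outer join: an unmatched row is kept with a null reference.
            c1.append((n, i, s, f, None))
        for rf in matches:
            # One output row per matching flagged row with the same name.
            c1.append((n, i, s, f, rf))
    return b1 + c1


sample = [
    ("RAM", 222, "JAVA", True),
    ("RAM", 111, "DotNet", False),
    ("RAM", 444, "HTML", False),
    ("SAM", 777, "DotNet", True),
    ("SAM", 333, "JAVA", False),
]
for row in sorted(split_join_union(sample)):
    print(row)

# Hypothetical data with two flagged rows for the same name.
two_flags = [
    ("RAM", 222, "JAVA", True),
    ("RAM", 111, "DotNet", False),
    ("RAM", 555, "SQL", True),
    ("RAM", 444, "HTML", False),
]
print(len(split_join_union(two_flags)))
```

On the sample data this yields the five rows the question asks for, in a different order. Because the join key is name, it matches the "repeat until the next true" rule only when each name has exactly one flagged row; with two flagged rows for a name, every unflagged row of that name is paired with both references.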

Hope this helps!!