Hadoop Pig full join

Time: 2014-10-20 22:28:32

Tags: hadoop apache-pig

I am a beginner with Pig. My question is about the result of these joins:

ALLDATA1 = join dataA1 by subject FULL, dataT1 by subject;
ALLDATA2 = join ALLDATA1 by dataA1::subject FULL, dataR1 by subject;

I end up with 3 columns:

ALLDATA1::dataA1::subject, 
ALLDATA1::dataT1::subject, 
dataR1::subject 

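I need only one, because when one of them is null, the others are not. How can I put all the subjects in a single column? Or what is the condition for merging these columns, so that when ALLDATA1::dataA1::subject is null one of the other columns is used?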

Thanks

1 answer:

Answer 0: (score: 0)

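Once you have the 3-column output, you can use the ternary operator (called bincond in Pig; it works like the ?: operator in other traditional languages) to combine the 3 columns into 1, as shown below: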

ALLDATA1 = join dataA1 by subject FULL, dataT1 by subject;
ALLDATA2 = join ALLDATA1 by dataA1::subject FULL, dataR1 by subject;

-- Take the first non-null subject, checking dataA1, then dataT1, then dataR1.
ALLDATA3 = FOREACH ALLDATA2
               GENERATE
                   (ALLDATA1::dataA1::subject IS NOT NULL ?
                       ALLDATA1::dataA1::subject :
                       (ALLDATA1::dataT1::subject IS NOT NULL ?
                            ALLDATA1::dataT1::subject :
                            dataR1::subject
                       )
                   ) AS subject;
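
One caveat that may be worth checking: as far as I know, Pig does not match null join keys with each other. The second join is keyed on dataA1::subject, so a subject that exists only in dataT1 and dataR1 could end up on two separate rows instead of one. A minimal sketch of a possible workaround is to merge the key right after the first join and use the merged key for the second join. The names MERGED1 and merged_subject are hypothetical, and the relations dataA1, dataT1 and dataR1 are assumed to be loaded already, each with a subject field:

```pig
-- Assumption: dataA1, dataT1 and dataR1 are already loaded, each with a `subject` field.
ALLDATA1 = JOIN dataA1 BY subject FULL OUTER, dataT1 BY subject;

-- Merge the two subject columns before joining again.
-- merged_subject is a hypothetical name; `*` keeps all original columns.
MERGED1 = FOREACH ALLDATA1 GENERATE
              (dataA1::subject IS NOT NULL ? dataA1::subject : dataT1::subject) AS merged_subject,
              *;

-- Join on the merged key, so subjects found only in dataT1 can still match dataR1.
ALLDATA2 = JOIN MERGED1 BY merged_subject FULL OUTER, dataR1 BY subject;

-- Final single subject column.
ALLDATA3 = FOREACH ALLDATA2 GENERATE
               (MERGED1::merged_subject IS NOT NULL ? MERGED1::merged_subject : dataR1::subject) AS subject;
```

If every subject is guaranteed to appear in dataA1, this extra step should not be needed and the nested bincond above is enough.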

Hope this helps.