I have two tables:
1,'hello'
2,'world'
4,'this'
and
1,'john'
3,'king'
I want to produce this table:
1,'hello','john'
2,'world',''
3,'','king'
4,'this',''
I am currently using this Pig command:
JOIN A BY code FULL OUTER,
B BY code;
But this gives me the output:
1,'hello',1,'john'
2,'world',,''
,'' ,3,king
4,'this' ,,''
I need to merge the code columns. How can I do that? Thanks.
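Answer 0 (score: 1)
Yes, JOIN will always produce output like this; that is the expected behavior in Pig. One option is to use the GROUP operator instead of the JOIN operator.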
a.txt
1,'hello'
2,'world'
4,'this'
b.txt
1,'john'
3,'king'
PigScript:
A = LOAD 'a.txt' USING PigStorage(',') AS (code:int,name:chararray);
B = LOAD 'b.txt' USING PigStorage(',') AS (code:int,name:chararray);
C = GROUP A BY code,B BY code;
D = FOREACH C GENERATE group,
    (IsEmpty(A.name) ? TOTUPLE('') : BagToTuple(A.name)) AS aname,
    (IsEmpty(B.name) ? TOTUPLE('') : BagToTuple(B.name)) AS bname;
E = FOREACH D GENERATE group,FLATTEN(aname),FLATTEN(bname);
DUMP E;
Output:
(1,'hello','john')
(2,'world',)
(3,,'king')
(4,'this',)
BagToTuple() is not available in native Pig; you have to download pig-0.11.0.jar and add it to your classpath.
Download the jar from this link:
http://www.java2s.com/Code/Jar/p/Downloadpig0110jar.htm
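To see why the GROUP approach avoids a second code column, here is a minimal Python sketch of the same logic that runs without a Pig installation. It mimics what `GROUP A BY code, B BY code` produces (one record per key, holding the names from each relation) and then substitutes `''` for an empty bag, as the `IsEmpty(...) ? TOTUPLE('') : BagToTuple(...)` expression does. The function names `cogroup` and `merge` are made up for illustration, the sample names drop the literal quotes from the input files, and the sketch assumes each code appears at most once per relation (with duplicates, `BagToTuple` would return all of them, while this sketch keeps only the first).

```python
from collections import defaultdict

# Sample rows from a.txt and b.txt as (code, name) pairs
# (literal quotes around the names dropped for readability).
A = [(1, "hello"), (2, "world"), (4, "this")]
B = [(1, "john"), (3, "king")]


def cogroup(left, right):
    """Mimic GROUP A BY code, B BY code: map each code to a pair of
    lists (the names from A, the names from B)."""
    groups = defaultdict(lambda: ([], []))
    for code, name in left:
        groups[code][0].append(name)
    for code, name in right:
        groups[code][1].append(name)
    return groups


def merge(left, right):
    """Emit (code, aname, bname) per key, using '' for an empty list."""
    rows = []
    for code, (anames, bnames) in sorted(cogroup(left, right).items()):
        # Assumes at most one name per code in each relation.
        aname = anames[0] if anames else ""
        bname = bnames[0] if bnames else ""
        rows.append((code, aname, bname))
    return rows


for row in merge(A, B):
    print(row)
```

Because the grouping key itself becomes the single `code` column, there is nothing left to merge afterwards.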
Answer 1 (score: 1)
A = load 'a' using PigStorage(',') as (code:int,name:chararray);
B = load 'b' using PigStorage(',') as (code:int,name:chararray);
C = join A by code full outer, B by code;
D = foreach C generate
(A::code IS NULL ? B::code : A::code) AS code,
A::name as aname, B::name as bname;
dump D;
The result is:
(1,'hello','john')
(2,'world',)
(3,,'king')
(4,'this',)
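The join-then-coalesce idea can also be checked outside Pig. The Python sketch below (hypothetical helpers `full_outer_join` and `coalesce_code`) builds the four-column result of a full outer join, filling the unmatched side with `None` the way Pig fills it with nulls, and then applies the same rule as `(A::code IS NULL ? B::code : A::code)`. It is a simplified in-memory model, not how Pig actually executes joins.

```python
# Sample rows as (code, name) pairs; None plays the role of Pig's null.
A = [(1, "hello"), (2, "world"), (4, "this")]
B = [(1, "john"), (3, "king")]


def full_outer_join(left, right):
    """Return (a_code, a_name, b_code, b_name) rows, filling the side
    without a match with None."""
    right_by_code = {}
    for code, name in right:
        right_by_code.setdefault(code, []).append((code, name))
    rows = []
    for a_code, a_name in left:
        # An unmatched left row gets one all-None right side.
        for b_code, b_name in right_by_code.get(a_code, [(None, None)]):
            rows.append((a_code, a_name, b_code, b_name))
    left_codes = {code for code, _ in left}
    for b_code, b_name in right:
        # Right rows with no partner get an all-None left side.
        if b_code not in left_codes:
            rows.append((None, None, b_code, b_name))
    return rows


def coalesce_code(rows):
    """Apply (A::code IS NULL ? B::code : A::code) AS code, sorted by code."""
    merged = [
        (b_code if a_code is None else a_code, a_name, b_name)
        for a_code, a_name, b_code, b_name in rows
    ]
    return sorted(merged, key=lambda row: row[0])


for row in coalesce_code(full_outer_join(A, B)):
    print(row)
```

The `None` values in the sketch correspond to the empty fields in the DUMP output above, since DUMP prints nulls as nothing.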
Answer 2 (score: 0)
You can use UNION and then do a GROUP BY.
UNION A, B will give you:
1,'hello'
2,'world'
4,'this'
1,'john'
3,'king'
Now do a GROUP BY on the id. This will give you:
1, {'hello', 'john'}
2, {'world'}
3, {'king'}
4, {'this'}
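Now you need a UDF to parse this bag. In the UDF, iterate over each key to generate output in your format.
I ran into the same problem, and this is how I solved it.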
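A plain `UNION A, B` does not record which relation each row came from, and Pig does not guarantee the order of tuples inside a bag, so a UDF given only `{'hello', 'john'}` cannot reliably tell which name belongs in which column. The minimal Python sketch below therefore assumes a hypothetical extra step before the union that tags each row with its source (a constant `'A'` or `'B'` column). `parse_bag` is the kind of logic such a UDF could contain; `group_by_code` only stands in for Pig's GROUP, and registering the function as a Pig UDF is not shown.

```python
from itertools import groupby

# Rows after the hypothetical tagging step and the UNION: (code, name, src),
# where src marks the relation each row came from.
UNIONED = [
    (1, "hello", "A"), (2, "world", "A"), (4, "this", "A"),
    (1, "john", "B"), (3, "king", "B"),
]


def parse_bag(code, bag):
    """Turn one group's bag into (code, aname, bname); '' fills a missing side."""
    aname = next((name for _, name, src in bag if src == "A"), "")
    bname = next((name for _, name, src in bag if src == "B"), "")
    return (code, aname, bname)


def group_by_code(rows):
    """Stand-in for GROUP ... BY code: yield (code, bag) pairs in code order."""
    ordered = sorted(rows, key=lambda r: r[0])
    for code, members in groupby(ordered, key=lambda r: r[0]):
        yield code, list(members)


result = [parse_bag(code, bag) for code, bag in group_by_code(UNIONED)]
for row in result:
    print(row)
```

Relying on the source tag rather than on position inside the bag keeps the output correct regardless of the order in which Pig delivers the tuples.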
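Answer 3 (score: 0)
After the join, you can use the ternary (bincond) operator to assign a new code, depending on whether it is populated in the A or the B relation. In this example, if A::code is null, B::code is used; otherwise A::code is used. After a JOIN, fields are referenced with the :: disambiguation operator rather than a dot.
C = JOIN A BY code FULL OUTER, B BY code;
D = FOREACH C GENERATE
(A::code IS NULL ? B::code : A::code) AS code,
A::field1,
A::field2,
B::field3,
B::field4;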