How to join Pig bags

Time: 2016-07-25 13:27:26

Tags: hadoop apache-pig

First, I have two data files.

largefile.txt:

1001    {(1,-1),(2,-1),(3,-1),(4,-1)}

smallfile.txt:

1002    {(1,0.04),(2,0.02),(4,0.03)}

I would like smallfile.txt to end up like this:

1002    {(1,0.04),(2,0.02),(3,-1),(4,0.03)}

What type of join can I use?

A = LOAD './largefile.txt' USING PigStorage('\t') AS (id:int, a:bag{tuple(time:int,value:float)});

B = LOAD './smallfile.txt' USING PigStorage('\t') AS (id:int, b:bag{tuple(time:int,value:float)});

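1 Answer:

Answer 0: (score: 0)

Can you clarify your requirement? Do you want to join the rows from largefile.txt and smallfile.txt that have the same value in the first column/field (say, 1002)? If that is the case, you can simply do this: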

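A = LOAD './largefile.txt' USING PigStorage('\t') AS (id:int, a:bag{tuple(time:int,value:float)});

-- flatten the bag so each (time, value) pair becomes its own row
A = FOREACH A GENERATE id, FLATTEN(a) AS (time, value);

B = LOAD './smallfile.txt' USING PigStorage('\t') AS (id:int, b:bag{tuple(time:int,value:float)});

B = FOREACH B GENERATE id, FLATTEN(b) AS (time, value);

-- join keys are plain field names; A.id would be read as a scalar projection
joined = JOIN A BY id, B BY id;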
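Note that in the sample data the two ids differ (1001 vs. 1002), so a join on id alone would match nothing for those rows. If the intent is instead to treat the bag in largefile.txt as a set of default (time, value) pairs and fill in whichever times are missing from each row of smallfile.txt, a sketch along the following lines might work. That reading of the requirement is an assumption, not something the question confirms, and the aliases (A_flat, B_ids, defaults, B_flat, merged, grouped, result) and the field names a_time, a_value, and b_id are made up for illustration.

```pig
A = LOAD './largefile.txt' USING PigStorage('\t') AS (id:int, a:bag{tuple(time:int,value:float)});
B = LOAD './smallfile.txt' USING PigStorage('\t') AS (id:int, b:bag{tuple(time:int,value:float)});

-- default (time, value) pairs; the id from largefile.txt is dropped (assumption)
A_flat = FOREACH A GENERATE FLATTEN(a) AS (a_time:int, a_value:float);

-- one row per id in smallfile.txt
B_ids = FOREACH B GENERATE id AS b_id;

-- pair every smallfile id with every default time
defaults = CROSS B_ids, A_flat;

-- actual values from smallfile.txt, one row per (id, time)
B_flat = FOREACH B GENERATE id, FLATTEN(b) AS (time:int, value:float);

-- keep every default row; attach the real value where one exists
joined = JOIN defaults BY (b_id, a_time) LEFT OUTER, B_flat BY (id, time);

-- prefer the smallfile value, fall back to the default when it is missing
merged = FOREACH joined GENERATE b_id AS id, a_time AS time,
         (value IS NULL ? a_value : value) AS value;

-- rebuild one bag per id
grouped = GROUP merged BY id;
result = FOREACH grouped GENERATE group AS id, merged.(time, value) AS b;

DUMP result;
```

CROSS can be expensive, so this only seems reasonable while largefile.txt holds a small set of defaults. Bags are unordered, so the tuples inside b may not come out sorted by time; a nested ORDER inside the final FOREACH could be added if the order matters.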