I have two data files.
largefile.txt:
1001 {(1,-1),(2,-1),(3,-1),(4,-1)}
smallfile.txt:
1002 {(1,0.04),(2,0.02),(4,0.03)}
I would like smallfile.txt to end up like this:
1002 {(1,0.04),(2,0.02),(3,-1),(4,0.03)}
What kind of join can I use for this?
A = LOAD './largefile.txt' USING PigStorage('\t') AS (id:int, a:bag{tuple(time:int,value:float)});
B = LOAD './smallfile.txt' USING PigStorage('\t') AS (id:int, b:bag{tuple(time:int,value:float)});
Answer 0 (score: 0)
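Could you clarify your requirement? Do you want to join largefile.txt and smallfile.txt on rows whose first column/field has the same value (for example, 1002)? If that is the case, you can simply do this: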
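-- Load each file, then flatten its bag into one row per (id, time, value)
A = LOAD './largefile.txt' USING PigStorage('\t') AS (id:int, a:bag{tuple(time:int,value:float)});
A = FOREACH A GENERATE id, FLATTEN(a) AS (time, value);
B = LOAD './smallfile.txt' USING PigStorage('\t') AS (id:int, b:bag{tuple(time:int,value:float)});
B = FOREACH B GENERATE id, FLATTEN(b) AS (time, value);
-- Inner join on the first field; only ids present in both files survive
joined = JOIN A BY id, B BY id;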
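Note that with the sample data the ids differ (1001 in largefile.txt, 1002 in smallfile.txt), so a join on id alone returns no rows. If the intent is to treat the (time, value) pairs in largefile.txt as defaults and fill in whichever times are missing from smallfile.txt, a left outer join on time is one possible approach. The following is only a sketch under that assumption: the aliases (A_flat, B_flat, B_ids, defaults, filled, and so on) are made up for illustration, and CROSS assumes largefile.txt holds a single template row, since it pairs every id with every default tuple.

```pig
A = LOAD './largefile.txt' USING PigStorage('\t') AS (id:int, a:bag{tuple(time:int,value:float)});
B = LOAD './smallfile.txt' USING PigStorage('\t') AS (id:int, b:bag{tuple(time:int,value:float)});

-- One row per default (time, value) from the large file; its id is dropped
A_flat = FOREACH A GENERATE FLATTEN(a) AS (time:int, value:float);
-- One row per (id, time, value) actually present in the small file
B_flat = FOREACH B GENERATE id, FLATTEN(b) AS (time:int, value:float);

-- Pair every small-file id with every default time (assumption: A is a template)
B_ids    = FOREACH B GENERATE id;
crossed  = CROSS B_ids, A_flat;
defaults = FOREACH crossed GENERATE B_ids::id AS id, A_flat::time AS time, A_flat::value AS dflt;

-- Keep every default row; attach the small-file value where one exists
joined = JOIN defaults BY (id, time) LEFT OUTER, B_flat BY (id, time);
filled = FOREACH joined GENERATE
             defaults::id   AS id,
             defaults::time AS time,
             (B_flat::value IS NULL ? defaults::dflt : B_flat::value) AS value;

-- Rebuild one bag per id, ordered by time
grouped = GROUP filled BY id;
result  = FOREACH grouped {
              ordered = ORDER filled BY time;
              GENERATE group AS id, ordered.(time, value) AS b;
          };
DUMP result;
```

For id 1002 this should keep the values for times 1, 2 and 4 from smallfile.txt and take the value for time 3 from largefile.txt. CROSS is expensive on large inputs, so if both files really share ids, replacing the CROSS step with a flattened A that keeps its id and joining on (id, time) would be the cheaper variant.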