结合联盟和加入apache猪

时间:2016-08-30 07:56:30

标签: hadoop apache-pig

我在hdfs中有两个包含数据的文件,如下所示:File1:

id,name,age
1,x1,15
2,x2,14
3,x3,16

文件2:

id,name,grades
1,x1,A
2,x2,B
4,y1,A
5,y2,C

我想生成以下输出:

id,name,age,grades
1,x1,15,A
2,x2,14,B
3,x3,16,
4,y1,,A
5,y2,,C

我正在使用Apache pig来执行操作,是否有可能在猪中获得上述输出。这是一种联盟和加入。

3 个答案:

答案 0 :(得分:1)

你可以做猪的工会和加入,这当然是可能的。

如果不深入研究确切的语法,我可以告诉你这应该有效(过去使用过类似的解决方案)。

  1. 假设我们有A和B.
  2. 将A和B的前两列设为A2和B2
  3. 联盟A2和B2进入M2
  4. 区别M2
  5. 现在你有了'索引'矩阵,我们只需要添加额外的列。

    1. 左连接M2与A和B
    2. 生成相关列
    3. 多数民众赞成!

答案 1 :(得分:1)

$data = $FajlNev[0]; echo json_encode($data);

根据表现,第二种方法更好

A = load 'pdemo/File1' using PigStorage(',') as(id:int,name:chararray,age:chararray);   
B = load 'pdemo/File2' using PigStorage(',') as(id:int,name:chararray,grades:chararray);

lj = join A by id left outer,B by id;
rj = join A by id right outer,B by id; 

lj1 = foreach lj generate A::id as id,A::name as name,A::age as age,B::grades as grades;
rj1 = foreach rj generate B::id as id,B::name as name,A::age as age,B::grades as grades;

res = union lj1,rj1;  
FinalResult = distinct res; 

希望这会有所帮助!!

答案 2 :(得分:0)

u1 = load 'PigDir/u1' using PigStorage(',') as (id:int,name:chararray,age:int);
u2 = load 'PigDir/u2' using PigStorage(',') as (id:int, name:chararray,grades:chararray);

uj = join u2 by id full outer,u1 by id;

uif = foreach uj generate ($0 is null ?$3:$0) as id,($1 is null ? $4 : $1) as name,$5 as age,$2 as grades;