How to join two relations in Pig on multiple fields

Time: 2017-10-22 14:23:35

Tags: hadoop apache-pig hortonworks-sandbox

I have two CSV files:

1- Fertility.csv:

[image]

2- Life Expectency.csv:

[image]

I want to join them in Pig so that the result looks like this:

[image]

I am new to Pig and could not get the right result, but here is my code:

fertility = LOAD 'fertility' USING org.apache.hcatalog.pig.HCatLoader();
lifeExpectency = LOAD 'lifeExpectency' USING org.apache.hcatalog.pig.HCatLoader();

A = JOIN fertility by country, lifeExpectency by country;
B = JOIN fertility by year, lifeExpectency by year;
C = UNION A,B;
DUMP C;

Here is the result of my code:

[image]

1 answer:

Answer 0: (score: 1)

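Join on both country and year in a single JOIN, then select only the columns needed for the final output. Two separate single-field joins combined with UNION pair up rows that agree on just one of the two fields, which is not the same as matching on both.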

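-- Load both tables through HCatalog.
fertility = LOAD 'fertility' USING org.apache.hcatalog.pig.HCatLoader();
lifeExpectency = LOAD 'lifeExpectency' USING org.apache.hcatalog.pig.HCatLoader();

-- Join on the composite key (country, year) so rows match only when both fields agree.
A = JOIN fertility BY (country, year), lifeExpectency BY (country, year);
-- Keep one copy of the key columns plus the value column from each side.
B = FOREACH A GENERATE fertility::country, fertility::year, fertility::fertility, lifeExpectency::lifeExpectency;
DUMP B;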
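
If the data is read directly from the two CSV files instead of through HCatalog, the same composite-key join might look roughly like the sketch below. The file paths, the comma delimiter, and the column names and types are assumptions, since the actual file layout is only shown in the images; adjust them to match the real data.

```pig
-- Sketch only: the paths and schemas below are assumed, not taken from the question.
fertility = LOAD 'fertility.csv' USING PigStorage(',')
    AS (country:chararray, year:int, fertility:double);
lifeExpectency = LOAD 'lifeExpectency.csv' USING PigStorage(',')
    AS (country:chararray, year:int, lifeExpectency:double);

-- Inner join on the composite key: a row is emitted only when
-- both country and year are equal on the two sides.
joined = JOIN fertility BY (country, year), lifeExpectency BY (country, year);

-- After a JOIN, fields are disambiguated with the relation::field prefix.
-- Project one copy of the keys and rename the columns for readability.
combined = FOREACH joined GENERATE
    fertility::country AS country,
    fertility::year AS year,
    fertility::fertility AS fertility,
    lifeExpectency::lifeExpectency AS lifeExpectency;

DUMP combined;
```

If the CSV files begin with a header row, that row will not parse as the declared numeric types and its values will load as null, so it may need to be filtered out before the join.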