Pig GROUP BY: avoiding a bag in the output

Time: 2017-10-12 01:41:03

Tags: apache-pig

This is a basic Pig question. I have data like this:

10 | Dog
15 | Cow
20 | Dog
15 | Elephant
15 | Dog
25 | Elephant

I want to find the average weight of each animal and output it like this:

Dog | 12.5
Elephant | 20
Cow | 15

I can use GROUP BY and get the averages, but the animal column comes back as a bag, like this:

 {(Dog), (Dog) } | 12.5
 {(Elephant), (Elephant)} | 20
 {(Cow)} | 15

How can I extract the individual animal name instead of the bag?

I am using GROUP:

--animal_weight is derived through other means
animal_by = GROUP animal_weight by (animal);
results = FOREACH animal_by GENERATE animal_weight.animal as animal_name, AVG(animal_weight.weight) as kg;
STORE results INTO '$output_4' USING PigStorage('|');

1 answer:

Answer 0: (score: 0)

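Use group instead of animal_weight.animal. After a GROUP, the grouping key is available as a field named group, whereas animal_weight.animal is a bag holding the animal value of every row in that group, which is why you get {(Dog), (Dog)}. Also note that, based on your sample data, the average weight for Dog is (10 + 20 + 15) / 3 = 15 kg, not 12.5.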

results = FOREACH animal_by GENERATE group as animal_name, AVG(animal_weight.weight) as kg;
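
For context, here is a minimal end-to-end sketch built around that projection. The question does not show how animal_weight is produced, so the LOAD line is an assumption: 'animals.txt' is a hypothetical file with lines such as 10|Dog (no spaces around the delimiter), and the field names weight and animal are taken from the question's script.

```pig
-- Assumption: hypothetical input file with lines like 10|Dog; the original
-- question derives animal_weight through other means.
animal_weight = LOAD 'animals.txt' USING PigStorage('|')
                AS (weight:int, animal:chararray);

-- GROUP produces one tuple per key with two fields:
--   group          the grouping key (here, the animal name)
--   animal_weight  a bag of all input tuples that share that key
animal_by = GROUP animal_weight BY animal;

-- Print the schema to confirm the two fields described above.
DESCRIBE animal_by;

-- Project the key through `group`; use the bag only as input to AVG.
results = FOREACH animal_by GENERATE group AS animal_name,
                                     AVG(animal_weight.weight) AS kg;

DUMP results;
```

With the sample data from the question, the averages should come out to 15 for Dog, 20 for Elephant, and 15 for Cow. The row order is not guaranteed.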
