Getting things out of a bag in Pig

Date: 2012-05-01 22:13:45

Tags: hadoop group-by apache-pig bag

In the Pig example:

A = LOAD 'student.txt' AS (name:chararray, term:chararray, gpa:float);

DUMP A;
(John,fl,3.9F)
(John,wt,3.7F)
(John,sp,4.0F)
(John,sm,3.8F)
(Mary,fl,3.8F)
(Mary,wt,3.9F)
(Mary,sp,4.0F)
(Mary,sm,4.0F)

B = GROUP A BY name;

DUMP B;
(John,{(John,fl,3.9F),(John,wt,3.7F),(John,sp,4.0F),(John,sm,3.8F)})
(Mary,{(Mary,fl,3.8F),(Mary,wt,3.9F),(Mary,sp,4.0F),(Mary,sm,4.0F)})

C = FOREACH B GENERATE A.name, AVG(A.gpa);

DUMP C;
({(John),(John),(John),(John)},3.850000023841858)
({(Mary),(Mary),(Mary),(Mary)},3.925000011920929)

In the last output, A.name is a bag. How can I get the name out of the bag, so that the result looks like this:

(John, 3.850000023841858)
(Mary, 3.925000011920929)

1 answer:

Answer 0 (score: 4)

GROUP creates a magic field named group, which holds the value you grouped by. It exists for exactly this purpose.

B = GROUP A BY name;

C = FOREACH B GENERATE group AS name, AVG(A.gpa);

Check out DESCRIBE B; and you'll see that group is in there. It is a single value that represents whatever was in the BY ... part of the GROUP.
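To see this end to end, here is a rough sketch of the whole script with the DESCRIBE step included. It assumes student.txt is tab-delimited (the default for LOAD), and the avg_gpa alias is just a name I picked for illustration. The schema line in the comment is what I would expect to see, but the exact formatting may differ between Pig versions.

```pig
-- Same relation as in the question; student.txt is assumed to be tab-delimited.
A = LOAD 'student.txt' AS (name:chararray, term:chararray, gpa:float);
B = GROUP A BY name;

-- Inspect the schema of B: the first field is named group, the second is the bag A.
DESCRIBE B;
-- Expect something along these lines (formatting depends on the Pig version):
-- B: {group: chararray,A: {(name: chararray,term: chararray,gpa: float)}}

-- Project the scalar group key instead of the bag A.name.
-- avg_gpa is a hypothetical alias, not part of the original question.
C = FOREACH B GENERATE group AS name, AVG(A.gpa) AS avg_gpa;
DUMP C;
```

Since group is a plain chararray here and not a bag, each output tuple of C carries one name and one average, which is the shape asked for in the question.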