How do I find the maximum value and its associated fields in a grouped relation in Pig?

Time: 2016-02-03 11:22:39

Tags: apache-pig bigdata

Here is my input:

$ cat people.csv
Steve,US,M,football,6.5
Alex,US,M,football,5.5
Ted,UK,M,football,6.0
Mary,UK,F,baseball,5.5
Ellen,UK,F,football,5.0

I need to group the data by country.

people = LOAD 'people.csv' USING PigStorage(',') AS (name:chararray,country:chararray,gender:chararray, sport:chararray,height:float);
grouped = GROUP people BY country;

Now, for each group, I need to find the maximum height together with the details of the person who has it.

So I tried the following:

a = FOREACH grouped GENERATE group AS country, MAX(people.height) as height, people.name as name;

This produces the output:

(UK,6.0,{(Ellen),(Mary),(Ted)})
(US,6.5,{(Alex),(Steve)})

But the output I need is:

(UK,6.0,Ted)
(US,6.5,Steve)

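Can someone help me achieve this?

1 Answer:

Answer 0: (score: 0)

This code should help. It computes the maximum height per country, then joins that result back to the original records on (country, height) to recover the matching rows.

Note that if two players in the same country share the maximum height, you will get the details of both players.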

records = LOAD '/home/user/footbal.txt' USING PigStorage(',') AS(name:chararray,country:chararray,gender:chararray,sport:chararray,height:double);

records_grp  = GROUP records BY (country);

records_each = foreach records_grp generate group as temp_country, MAX(records.height) as max_height;

records_join = join records by (country,height), records_each by (temp_country,max_height);

records_output = foreach records_join generate country, max_height, name;

dump records_output;

Output:

(UK,6.0,Ted)
(US,6.5,Steve)
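
As an alternative to the join, the same result can probably be reached with a nested FOREACH that sorts each group and keeps its first row. The following is a minimal sketch rather than a tested solution: it assumes the question's `people.csv` and schema, and the aliases `sorted`, `top1`, `tallest`, and `result` are names chosen here for illustration.

```pig
-- Load the input from the question (file name and schema taken from the question).
people = LOAD 'people.csv' USING PigStorage(',')
         AS (name:chararray, country:chararray, gender:chararray,
             sport:chararray, height:float);

grouped = GROUP people BY country;

-- Within each country, sort the bag by height (tallest first) and keep one row.
tallest = FOREACH grouped {
    sorted = ORDER people BY height DESC;
    top1   = LIMIT sorted 1;
    GENERATE FLATTEN(top1);
};

-- Keep only the fields requested in the question.
result = FOREACH tallest GENERATE country, height, name;

DUMP result;
```

Unlike the join approach, this keeps exactly one row per country, so when several people tie for the maximum height only one of them is returned, and which one is not guaranteed. Use the join when every tied person should appear.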