Here is my input:
$ cat people.csv
Steve,US,M,football,6.5
Alex,US,M,football,5.5
Ted,UK,M,football,6.0
Mary,UK,F,baseball,5.5
Ellen,UK,F,football,5.0
I need to group the data by country.
people = LOAD 'people.csv' USING PigStorage(',') AS (name:chararray,country:chararray,gender:chararray, sport:chararray,height:float);
grouped = GROUP people BY country;
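Now, for each country in the grouped data, I need the maximum height together with the details of the person who has that height.
So I tried the following: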
a = FOREACH grouped GENERATE group AS country, MAX(people.height) as height, people.name as name;
which produces this output:
(UK,6.0,{(Ellen),(Mary),(Ted)})
(US,6.5,{(Alex),(Steve)})
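The attempt returns a bag because GROUP produces one row per country whose `people` field holds a bag of every matching tuple. MAX collapses that bag to a single value, but `people.name` simply projects the name column of the whole bag. The Python sketch below models that behaviour with plain lists. It illustrates the semantics only and is not Pig; `group_by_country` and `naive_projection` are hypothetical helper names. Pig does not guarantee the order of tuples inside a bag, so the order of names may differ from the Pig output.

```python
from collections import defaultdict

# The five rows from people.csv: (name, country, gender, sport, height).
PEOPLE = [
    ("Steve", "US", "M", "football", 6.5),
    ("Alex", "US", "M", "football", 5.5),
    ("Ted", "UK", "M", "football", 6.0),
    ("Mary", "UK", "F", "baseball", 5.5),
    ("Ellen", "UK", "F", "football", 5.0),
]


def group_by_country(rows):
    """Hypothetical stand-in for GROUP people BY country: country -> bag of rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)
    return dict(groups)


def naive_projection(rows):
    """Mirror the attempted FOREACH: MAX over the bag plus the whole name bag."""
    result = []
    for country, bag in sorted(group_by_country(rows).items()):
        max_height = max(r[4] for r in bag)
        names = [r[0] for r in bag]  # like people.name: EVERY name in the bag
        result.append((country, max_height, names))
    return result


if __name__ == "__main__":
    for row in naive_projection(PEOPLE):
        print(row)
```

The maximum is computed correctly, but nothing ties it back to a specific tuple in the bag. That is why a separate step is needed to pick out the matching person.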
But I need this output:
(UK,6.0,Ted)
(US,6.5,Steve)
Can anyone help me achieve this?
Answer 0 (score: 0)
This code should help.
With this code, if two players in the same country share the maximum height, you will get the details of both players.
records = LOAD '/home/user/footbal.txt' USING PigStorage(',') AS(name:chararray,country:chararray,gender:chararray,sport:chararray,height:double);
records_grp = GROUP records BY (country);
records_each = foreach records_grp generate group as temp_country, MAX(records.height) as max_height;
records_join = join records by (country,height), records_each by (temp_country,max_height);
records_output = foreach records_join generate country, max_height, name;
dump records_output;
Output:
(UK,6.0,Ted)
(US,6.5,Steve)
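The answer works in two steps. First it computes each country's maximum height (records_each). Then it joins the original rows back on (country, height) so that only the tallest people survive (records_join). The Python model below reproduces that logic outside Pig, assuming rows are plain tuples. `max_height_rows` is a hypothetical name, and results are sorted only to make the output deterministic, since Pig's JOIN does not guarantee row order.

```python
# The five rows from people.csv: (name, country, gender, sport, height).
PEOPLE = [
    ("Steve", "US", "M", "football", 6.5),
    ("Alex", "US", "M", "football", 5.5),
    ("Ted", "UK", "M", "football", 6.0),
    ("Mary", "UK", "F", "baseball", 5.5),
    ("Ellen", "UK", "F", "football", 5.0),
]


def max_height_rows(rows):
    """Return (country, max_height, name) for everyone at their country's max height."""
    # Step 1 (like records_each): maximum height per country.
    max_by_country = {}
    for _name, country, _gender, _sport, height in rows:
        if country not in max_by_country or height > max_by_country[country]:
            max_by_country[country] = height

    # Step 2 (like records_join): keep rows whose (country, height) matches a
    # (country, max_height) pair; ties therefore yield one row per tied person.
    return sorted(
        (country, height, name)
        for name, country, _gender, _sport, height in rows
        if height == max_by_country[country]
    )


if __name__ == "__main__":
    for row in max_height_rows(PEOPLE):
        print(row)
```

Adding a hypothetical row `("Bob", "US", "M", "football", 6.5)` to the input makes both Bob and Steve appear for US, which matches the tie behaviour described in the answer.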