I want to use Pig to find, for each team, the player with the highest score in a match.
Input: the records are in the following format
Sachin 100 KXIP Hyderabad 1991
sehwag 150 KXIP Hyderabad 1991
Sehwag 100 MI Mumbai 2011
Kohli 0 CSK Chennai 2014
Dhoni 150 MI Hyderabad 1991
Sachin 32 PW Chennai 2014
Dhoni 150 MI Mumbai 2011
My implementation:
record1 = LOAD 'ipl.txt' using PigStorage(' ') as (name:chararray,runs:int,team:chararray,loc:chararray,year:int);
record2 = GROUP record1 by team;
record3 = FOREACH record2 GENERATE group,MAX(record1.runs) as mx;
record4 = ORDER record3 by mx ASC;
DUMP record4;
Output:
(PW,32)
(KXIP,150)
(MI,150)
But I expect the result in the following form:
Sachin PW 32 Chennai 2014
Answer 0 (score: 0)
record1= LOAD 'ipl.txt' using PigStorage(' ') as (name:chararray,runs:int,team:chararray,loc:chararray,year:int);
record2 = GROUP record1 by team;
record3 = FOREACH record2 GENERATE group,MAX(record1.runs) as mx;
record4 = JOIN record3 by (mx,group) LEFT OUTER, record1 by (runs,team);
record5 = FOREACH record4 GENERATE record1::name as name, record1::team as team, record3::mx as mx, record1::year as year;
record6= ORDER record5 by mx ASC;
DUMP record6;
This produces the following result:
(Kohli,CSK,0,2014)
(Sachin,PW,32,2014)
(sehwag,KXIP,150,1991)
(Dhoni,MI,150,1991)
(Dhoni,MI,150,2011)
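Note that Dhoni has two records because he scored 150 twice. If you want to remove the duplicate, you need to pick either the earliest or the most recent year, depending on your needs.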
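One way to drop the duplicate could look like the sketch below. It assumes record5 from the script above, with fields (name, team, mx, year); the aliases by_player, sorted, oldest, and deduped are made up for illustration. It keeps the earliest year for each (team, name) pair, so two different players who tie within a team would both still appear.

```pig
-- Assumption: record5 is the relation built in the script above,
-- with fields (name, team, mx, year).
by_player = GROUP record5 BY (team, name);

-- Within each (team, name) group, sort by year and keep a single row.
-- Use DESC instead of ASC to keep the most recent year.
deduped = FOREACH by_player {
    sorted = ORDER record5 BY year ASC;
    oldest = LIMIT sorted 1;
    GENERATE FLATTEN(oldest);
};

DUMP deduped;
```

Grouping by team alone instead of (team, name) would leave exactly one row per team, at the cost of discarding other players with the same top score.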
Answer 1 (score: 0)
I would use the TOP function: http://pig.apache.org/docs/r0.11.0/func.html#topx
Here is a script that gets the desired result:
record1= LOAD 'ipl.txt' using PigStorage(' ') as
(name:chararray,runs:int,team:chararray,loc:chararray,year:int);
record2 = GROUP record1 by team;
record3 = FOREACH record2 GENERATE FLATTEN(TOP(1,1,record1));
record4= ORDER record3 by runs ASC;
DUMP record4;
As a result, you get:
(Kohli,0,CSK,Chennai,2014)
(Sachin,32,PW,Chennai,2014)
(sehwag,150,KXIP,Hyderabad,1991)
(Dhoni,150,MI,Hyderabad,1991)
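The rows above keep the load order (name, runs, team, loc, year), while the question asks for name, team, runs, location, year. A minimal sketch of reordering the columns follows, assuming record3 from the script above. It uses positional references because the field names that survive FLATTEN(TOP(...)) may carry a prefix depending on the Pig version; the aliases reordered and sorted_out are made up for illustration.

```pig
-- Assumption: record3 holds the flattened TOP output with columns in
-- load order: $0 name, $1 runs, $2 team, $3 loc, $4 year.
reordered = FOREACH record3 GENERATE
    $0 AS name,
    $2 AS team,
    $1 AS runs,
    $3 AS loc,
    $4 AS year;

sorted_out = ORDER reordered BY runs ASC;
DUMP sorted_out;
```

Note that TOP(1, 1, record1) returns a single tuple per team, so when scores tie (as with Dhoni's two innings of 150 for MI) only one of the tied rows is kept.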