Finding the record with the maximum value using Pig

Time: 2014-05-29 03:38:52

Tags: hadoop apache-pig

Using Pig, I want to find, for each team, the player who scored the most runs in a match.

Input: the records look like this (name, runs, team, location, year):

    Sachin 100 KXIP Hyderabad 1991
    sehwag 150 KXIP Hyderabad 1991
    Sehwag 100 MI Mumbai 2011
    Kohli 0 CSK Chennai 2014
    Dhoni 150 MI Hyderabad 1991
    Sachin 32 PW Chennai 2014
    Dhoni 150 MI Mumbai 2011

My implementation:

    record1 = LOAD 'ipl.txt' using PigStorage(' ') as (name:chararray,runs:int,team:chararray,loc:chararray,year:int);
    record2 = GROUP record1 by team as team;
    record3 = FOREACH record2 GENERATE group,MAX(record1.runs) as mx;
    record4 = ORDER record3 by mx ASC;
    DUMP record4;

 
Output:

    (PW,32)
    (KXIP,150)
    (MI,150)

But I expect the full record in the result, in the following form:

    Sachin PW 32 Chennai 2014
    

2 answers:

Answer 0 (score: 0)

record1 = LOAD 'ipl.txt' using PigStorage(' ') as (name:chararray,runs:int,team:chararray,loc:chararray,year:int);
-- Compute the maximum runs per team.
record2 = GROUP record1 by team;
record3 = FOREACH record2 GENERATE group,MAX(record1.runs) as mx;
-- Join the per-team maximum back to the original rows to recover the other fields.
record4 = JOIN record3 by (mx,group) LEFT OUTER, record1 by (runs,team);
record5 = FOREACH record4 GENERATE record1::name as name, record1::team as team, record3::mx as mx, record1::year as year;
record6 = ORDER record5 by mx ASC;
DUMP record6;

This produces the following result:

(Kohli,CSK,0,2014)
(Sachin,PW,32,2014)
(sehwag,KXIP,150,1991)
(Dhoni,MI,150,1991)
(Dhoni,MI,150,2011)

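Note that Dhoni has two records, because he scored 150 twice for MI. If you want to remove the duplicate, you need to pick either the earliest or the most recent year, depending on your needs.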
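
One possible way to drop the duplicate is sketched below, assuming you want to keep the earliest year. The aliases `by_player` and `deduped` are hypothetical names introduced for this illustration; everything up to `record5` is the script from this answer.

```pig
record1 = LOAD 'ipl.txt' USING PigStorage(' ') AS
    (name:chararray, runs:int, team:chararray, loc:chararray, year:int);
record2 = GROUP record1 BY team;
record3 = FOREACH record2 GENERATE group, MAX(record1.runs) AS mx;
record4 = JOIN record3 BY (mx, group) LEFT OUTER, record1 BY (runs, team);
record5 = FOREACH record4 GENERATE record1::name AS name, record1::team AS team,
    record3::mx AS mx, record1::year AS year;

-- Assumption: rows with the same player, team and score are duplicates.
-- Collapse them and keep the earliest year (use MAX for the most recent).
by_player = GROUP record5 BY (name, team, mx);
deduped = FOREACH by_player GENERATE
    FLATTEN(group) AS (name, team, mx),
    MIN(record5.year) AS year;

record6 = ORDER deduped BY mx ASC;
DUMP record6;
```

With the sample input, the two Dhoni rows should collapse into a single row with year 1991. If two different players tie for a team's maximum, both are still kept, since they differ in `name`.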

Answer 1 (score: 0)

I would use the TOP function: http://pig.apache.org/docs/r0.11.0/func.html#topx

Here is a script that gets the desired result:

record1 = LOAD 'ipl.txt' using PigStorage(' ') as (name:chararray,runs:int,team:chararray,loc:chararray,year:int);
record2 = GROUP record1 by team;
-- TOP(1,1,record1): keep the top 1 tuple per team, ranked by column 1 (runs, zero-based index).
record3 = FOREACH record2 GENERATE FLATTEN(TOP(1,1,record1));
record4 = ORDER record3 by runs ASC;
DUMP record4;

As a result, you get:

(Kohli,0,CSK,Chennai,2014)
(Sachin,32,PW,Chennai,2014)
(sehwag,150,KXIP,Hyderabad,1991)
(Dhoni,150,MI,Hyderabad,1991)
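
The question asks for the fields in the order name, team, runs, location, year. A minimal sketch of that projection follows, assuming the flattened TOP output keeps the column order of the loaded records. It uses positional references (`$0`, `$1`, ...) so it does not depend on how the fields are named after FLATTEN.

```pig
record1 = LOAD 'ipl.txt' USING PigStorage(' ') AS
    (name:chararray, runs:int, team:chararray, loc:chararray, year:int);
record2 = GROUP record1 BY team;
record3 = FOREACH record2 GENERATE FLATTEN(TOP(1, 1, record1));

-- Reorder the columns by position: name, team, runs, loc, year.
record4 = FOREACH record3 GENERATE
    $0 AS name, $2 AS team, $1 AS runs, $3 AS loc, $4 AS year;

record5 = ORDER record4 BY runs ASC;
DUMP record5;
```

Because TOP(1, ...) returns a single tuple per group, a team whose maximum is shared by several rows still yields only one row here, and which of the tied rows is returned should not be relied on.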