How do I find the second-highest rating in Pig?

Time: 2014-12-12 19:23:04

Tags: apache-pig

My data looks like this:

USA,10  
UK,8  
INDIA,8  
PAKISTAN,5  
U.A.E,3  
GERMANY,3  
SWEDEN,2

How can I get the countries with the second-highest rating? With the sample data above, I want this:

UK,8  
INDIA,8 

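1 answer:

Answer 0: (score: 1)

Can you try this?

Update:
If your Pig version does not have the RANK operator, this is hard to solve with native Pig. One option is to download pig-0.11.1.jar, put it on your classpath, and try the approach below.

input.txt: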

USA,10
UK,8
INDIA,8
PAKISTAN,5
U.A.E,3
GERMANY,3
SWEDEN,2

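PigScript:

-- Over and Stitch are piggybank UDFs, so the jar that provides them must be registered or on the classpath.
DEFINE MyOver org.apache.pig.piggybank.evaluation.Over('myrank:int');
DEFINE MyStitch org.apache.pig.piggybank.evaluation.Stitch;

A = LOAD 'input.txt' USING PigStorage(',') AS (country:chararray,rating:int);
-- Put every row into a single group so the rows can be ranked together.
B = GROUP A ALL;
C = FOREACH B {
    -- Sort by rating, then stitch a dense rank onto each row (ties share a rank, no gaps).
    mysort = ORDER A BY rating DESC;
    GENERATE FLATTEN(MyStitch(mysort,MyOver(mysort,'dense_rank',0,1,1)));
}
-- Rank 2 corresponds to the second-highest rating.
D = FILTER C BY stitched::myrank==2;
E = FOREACH D GENERATE stitched::country AS country,stitched::rating AS rating;
DUMP E;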

Output:

(UK,8)
(INDIA,8)

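Pig versions from 0.11 onward support the RANK operator:

A = LOAD 'input.txt' USING PigStorage(',') AS (country:chararray,rating:int);
-- DENSE keeps ranks consecutive, so rank 2 still exists when several countries tie for first place.
B = RANK A BY rating DESC DENSE;
-- RANK prepends a column named rank_<alias>, here rank_A.
C = FILTER B BY rank_A==2;
D = FOREACH C GENERATE country,rating;
DUMP D;

Output: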

(UK,8)
(INDIA,8)
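
For comparison, here is a rough sketch of how the same result might be reached with only built-in operators, for a setup that has neither RANK nor the piggybank UDFs. It is not part of the original answer. It assumes the same input.txt, and all alias names (ratings, top_two, second_rating, and so on) are made up for illustration. The idea is to take the two highest distinct ratings, keep the smaller one, and join it back to the full data.

```pig
A = LOAD 'input.txt' USING PigStorage(',') AS (country:chararray, rating:int);

-- Reduce the data to its distinct rating values.
ratings = FOREACH A GENERATE rating;
distinct_ratings = DISTINCT ratings;

-- Keep the two highest distinct ratings.
sorted_ratings = ORDER distinct_ratings BY rating DESC;
top_two = LIMIT sorted_ratings 2;

-- The smaller of those two values is the second-highest rating.
grouped = GROUP top_two ALL;
second_rating = FOREACH grouped GENERATE MIN(top_two.rating) AS rating;

-- Join back to recover every country that has that rating.
joined = JOIN A BY rating, second_rating BY rating;
result = FOREACH joined GENERATE A::country AS country, A::rating AS rating;
DUMP result;
```

One caveat with this sketch: if the data contains only a single distinct rating, MIN returns that rating, so the script would return the top-rated countries instead of an empty result. It also runs more MapReduce stages than the RANK version, so RANK is probably the better choice where it is available.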