In a word count program in Pig, how do I find the most frequent word and the least frequent word? How can I use the MAX function here?
The output I see:
(naveen,3) (is,5)
The output I need here is "is".
Answer 0 (score: 0)
You can use ORDER BY and LIMIT:
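A = load 'file' using PigStorage() as (name:chararray, count:int);
-- least frequent word
B = order A by count; -- ascending by default
C = limit B 1;
D = foreach C generate name;
dump D;
-- most frequent word
B = order A by count desc;
C = limit B 1;
D = foreach C generate name;
dump D;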
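To check what the two scripts in Answer 0 return without a Pig cluster, here is a plain-Python sketch of the same ORDER BY + LIMIT 1 + FOREACH ... GENERATE steps. It assumes the word counts have already been computed as (name, count) pairs; the sample pairs come from the question's output, and the function names are hypothetical.

```python
# Plain-Python sketch of Answer 0's ORDER BY + LIMIT 1 dataflow.
# Assumption: word counts are already available as (name, count) pairs.
# least_frequent / most_frequent are hypothetical names for illustration.

Row = tuple[str, int]


def least_frequent(rows: list[Row]) -> str:
    """Mirrors: B = order A by count; C = limit B 1; D = foreach C generate name;"""
    ordered = sorted(rows, key=lambda row: row[1])  # ascending, Pig's default
    return ordered[0][0]  # LIMIT 1, then project the name field


def most_frequent(rows: list[Row]) -> str:
    """Mirrors: B = order A by count desc; C = limit B 1; D = foreach C generate name;"""
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)  # descending
    return ordered[0][0]


if __name__ == "__main__":
    counts = [("naveen", 3), ("is", 5)]  # sample pairs from the question
    print(least_frequent(counts))
    print(most_frequent(counts))
```

One caveat: when several words share the lowest or highest count, LIMIT 1 keeps only one of them, and Pig does not promise which. Python's sorted() is stable, so this sketch simply returns the first tied word in input order.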
Answer 1 (score: 0)
The following example will help you get the top 5:
infiles = load '/hdfs/bhavesh/Youtube_POC/Youtube/0222/{0,1,2,3,4}.txt' using PigStorage('\t') as
(videoid:chararray,uploader:chararray,age:int,category:chararray,length:int,views:int,rate:int,rating:int,comments:int,related_id:chararray);
files = FILTER infiles BY category is not null;
grpn_for_catagories = group files by category;
cnt_for_catagories = foreach grpn_for_catagories generate group, COUNT(files.videoid) as counting;
sorted_for_catagories_desc = order cnt_for_catagories by counting desc;
top5_for_catagories = limit sorted_for_catagories_desc 5;
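The same pipeline can be mirrored in plain Python to see what each statement contributes. In this sketch each input row is a dict (a hypothetical stand-in for the loaded tuple), only the videoid and category fields are used, and top_categories is an illustrative name, not part of the original script.

```python
# Plain-Python sketch of the FILTER -> GROUP -> COUNT -> ORDER DESC -> LIMIT pipeline.
# Assumption: rows are dicts with at least "videoid" and "category" keys.


def top_categories(rows: list[dict], n: int = 5) -> list[tuple[str, int]]:
    counts: dict[str, int] = {}
    for row in rows:
        category = row.get("category")
        # FILTER infiles BY category is not null
        if category is None:
            continue
        # GROUP files BY category: every surviving category gets a group
        counts.setdefault(category, 0)
        # COUNT(files.videoid) skips null values (COUNT_STAR would include them)
        if row.get("videoid") is not None:
            counts[category] += 1
    # ORDER cnt_for_catagories BY counting DESC
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    # LIMIT sorted_for_catagories_desc n
    return ranked[:n]


if __name__ == "__main__":
    sample = [  # hypothetical rows
        {"videoid": "a", "category": "Music"},
        {"videoid": "b", "category": "Music"},
        {"videoid": "c", "category": "Comedy"},
        {"videoid": "d", "category": None},
    ]
    print(top_categories(sample))
```

Because the sort is stable, categories with equal counts keep their first-seen order here; in Pig the order among ties is not guaranteed.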
A detailed explanation can be found at http://ybhavesh.blogspot.in/2015/08/proof-of-concept-or-poc-on-youtube-data.html
Hope it helps!
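Answer 2 (score: 0)
A = load 'file' using PigStorage() as (name:chararray, count:int);
B = order A by count; -- ascending, so the first row is the least frequent word
C = limit B 1;
D = foreach C generate name;
dump D;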
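None of the answers shows the MAX approach the question asks about. One common Pig pattern, given here as an assumption rather than as any answerer's method, is to GROUP the relation ALL, take MAX (or MIN) of the count field, and FILTER the original rows by that value; unlike LIMIT 1, it keeps every word tied for the extreme count. A plain-Python sketch of that logic, with a hypothetical function name and sample data:

```python
# Plain-Python sketch of the "GROUP ALL + MAX/MIN + FILTER" idea.
# Assumption: word counts are already available as (name, count) pairs.
# words_with_extreme_count is a hypothetical name for illustration.


def words_with_extreme_count(rows: list[tuple[str, int]], use_max: bool = True) -> list[str]:
    if not rows:
        return []  # nothing to aggregate over
    # GROUP A ALL, then MAX(A.count) or MIN(A.count) over the single group
    all_counts = [count for _, count in rows]
    target = max(all_counts) if use_max else min(all_counts)
    # FILTER A BY count == target, then project the name field
    return [name for name, count in rows if count == target]


if __name__ == "__main__":
    counts = [("naveen", 3), ("is", 5)]  # sample pairs from the question
    print(words_with_extreme_count(counts))
    print(words_with_extreme_count(counts, use_max=False))
```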