Pig throws an error for a simple GROUP BY and COUNT job

Time: 2013-09-15 08:52:08

Tags: hadoop apache-pig

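I am using Pig Latin on Hadoop to find the number of occurrences of each unique search string in a search engine log file. (click here to view the sample log file) Please help me. Thanks in advance.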

Pig script

excitelog = load '/user/hadoop/input/excite-small.log' using PigStorage() AS
(encryptcode:chararray, numericid:int, searchstring:chararray);                                        

GroupBySearchString = GROUP excitelog by searchstring;    

searchStrFrq = foreach GroupBySearchString Generate group as searchstring,count(searchstring);

Error encountered

 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve count using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]

2 answers:

Answer 0: (score: 4)

You need to do this:

searchStrFrq = foreach GroupBySearchString Generate group as searchstring,
                                                COUNT(excitelog) as kount;

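This is because of how grouping works in Pig. GroupBySearchString will be a bag of {group, excitelog} tuples, where excitelog is itself a bag holding all the tuples that match that group. COUNT is a UDF that takes a bag as input and returns the number of tuples in that bag. So COUNT(excitelog) gives you the number of tuples that match group.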
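
To inspect that structure yourself, here is a minimal sketch that reuses the path, schema, and aliases from the question. DESCRIBE prints the schema of the grouped relation; its exact output format differs between Pig versions, so it is not reproduced here.

```pig
-- Load the log with the same schema as in the question.
excitelog = LOAD '/user/hadoop/input/excite-small.log' USING PigStorage()
            AS (encryptcode:chararray, numericid:int, searchstring:chararray);

-- Produces one tuple per distinct searchstring.
GroupBySearchString = GROUP excitelog BY searchstring;

-- Shows two fields: 'group' and a bag named after the input relation (excitelog).
DESCRIBE GroupBySearchString;

-- COUNT is given the bag, not a field inside it.
searchStrFrq = FOREACH GroupBySearchString
               GENERATE group AS searchstring, COUNT(excitelog) AS kount;

DUMP searchStrFrq;
```

Passing the bag alias (excitelog) rather than the field name (searchstring) is the key point: after GROUP, searchstring is only reachable inside the bag, while the grouping key itself is exposed as group.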

Answer 1: (score: 0)

Function names such as PigStorage and COUNT are case-sensitive. So COUNT must be written in uppercase, as shown below:

wordcount = FOREACH grouped GENERATE group , COUNT(words);
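
Applied to the question's data, the case rule can be sketched as follows. The alias grp is a hypothetical name chosen for this illustration; the path and schema are taken from the question. Pig Latin keywords such as load, group, and foreach are case-insensitive, while function names and aliases are not.

```pig
-- Lowercase keywords are accepted.
excitelog = load '/user/hadoop/input/excite-small.log' using PigStorage()
            as (encryptcode:chararray, numericid:int, searchstring:chararray);
grp = group excitelog by searchstring;

-- Uppercase COUNT resolves to the builtin function.
ok = foreach grp generate group, COUNT(excitelog);

-- Lowercase count does not resolve and reproduces ERROR 1070 from the question:
-- bad = foreach grp generate group, count(excitelog);
```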