Question

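Using Pig Latin on Hadoop, I want to find the number of occurrences of each unique search string in a search engine log file. (click here to view the sample log file) Please help me. Thanks in advance.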

Pig script

excitelog = load '/user/hadoop/input/excite-small.log' using PigStorage() AS
(encryptcode:chararray, numericid:int, searchstring:chararray);

GroupBySearchString = GROUP excitelog by searchstring;

searchStrFrq = foreach GroupBySearchString Generate group as searchstring,count(searchstring);

Error encountered

 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve count using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]

Answer 1

You need to do this:

searchStrFrq = foreach GroupBySearchString Generate group as searchstring,
                                                COUNT(excitelog) as kount;

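This is because of how grouping works in Pig: GroupBySearchString is a bag of {group, excitelog} tuples, where excitelog is itself a bag of all the tuples that match that group. COUNT is a UDF that takes a bag as input and returns the number of tuples in it. COUNT(excitelog) therefore gives you the number of tuples that match group.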
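To see this structure for yourself, one option is to DESCRIBE the grouped relation before counting. The following is a minimal sketch that assumes the same input path and schema as in the question; the comments describe the general shape of what Pig reports rather than reproducing exact output.

```pig
excitelog = LOAD '/user/hadoop/input/excite-small.log' USING PigStorage()
    AS (encryptcode:chararray, numericid:int, searchstring:chararray);

GroupBySearchString = GROUP excitelog BY searchstring;

-- Prints the schema of the grouped relation: a 'group' field holding the
-- search string, plus a bag named 'excitelog' holding the matching tuples.
DESCRIBE GroupBySearchString;

-- COUNT takes that bag as its argument, so each output row is
-- (searchstring, kount).
searchStrFrq = FOREACH GroupBySearchString
    GENERATE group AS searchstring, COUNT(excitelog) AS kount;

DUMP searchStrFrq;
```

The bag inside each group is named after the relation that was grouped, which is why the argument is COUNT(excitelog) and not COUNT(searchstring).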

Answer 2

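Function names such as PigStorage and COUNT are case-sensitive, whereas Pig Latin keywords such as GROUP and FOREACH are not. So COUNT must be written in uppercase, as shown below: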

wordcount = FOREACH grouped GENERATE group , COUNT(words);
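As a sketch of the case rule applied to the question's data: the snippet above does not show how words and grouped were defined, so the LOAD and GROUP statements here are assumptions that reuse the path and schema from the question.

```pig
-- Keywords (load, using, as, group, by, foreach, generate) may be
-- written in any case.
words = load '/user/hadoop/input/excite-small.log' using PigStorage()
    as (encryptcode:chararray, numericid:int, searchstring:chararray);

grouped = group words by searchstring;

-- Function names must match exactly: COUNT, not count.
wordcount = foreach grouped generate group, COUNT(words);

-- The lowercase form below is what triggers ERROR 1070 in the question:
-- wordcount = foreach grouped generate group, count(words);

dump wordcount;
```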

Pig throws an error for a simple GROUP BY and COUNT task
