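Could not infer the matching function for COUNT

Date: 2012-03-22 16:19:19

Tags: apache-pig

I'm trying to write a Pig Latin script that returns the count of a data set I have filtered.

Here is the script so far: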

/* scans by title */

scans           = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
productscans    = FILTER scans BY (title MATCHES 'proactiv');
scancount       = FOREACH productscans GENERATE COUNT($0);
DUMP scancount;

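For some reason, I get this error:

Could not infer the matching function for org.apache.pig.builtin.COUNT as multiple or none of them fit. Please use an explicit cast.

What am I doing wrong here? I assume it has something to do with the type of the field I'm passing in, but I can't seem to resolve it.

TIA, Jason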

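3 Answers:

Answer 0 (score: 14)

Is this what you are looking for? GROUP ... ALL puts every tuple into a single bag, and COUNT then counts the items in that bag: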

scans           = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
productscans    = FILTER scans BY (title MATCHES 'proactiv');
grouped         = GROUP productscans ALL;
count           = FOREACH grouped GENERATE COUNT(productscans);
dump count;
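To see why the original script fails, it may help to inspect the schemas. The sketch below reuses the aliases from this thread and adds only DESCRIBE statements. In the question, `$0` inside `FOREACH productscans` refers to `thetime`, a scalar long, whereas COUNT expects a bag. After the GROUP, the second field is a bag that COUNT can consume.

```pig
scans        = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
productscans = FILTER scans BY (title MATCHES 'proactiv');

-- Before grouping: $0 is thetime:long, a scalar, so COUNT($0) has no matching signature.
DESCRIBE productscans;

grouped      = GROUP productscans ALL;

-- After grouping: field 0 is the group key, field 1 is a bag named productscans.
DESCRIBE grouped;
```

DESCRIBE only prints the schema and does not launch a job, so it is a cheap check to run before a DUMP.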

Answer 1 (score: 4)

COUNT requires a preceding GROUP ALL statement for a global count, or a GROUP BY statement for per-group counts.

You can use either of the following:

scans           = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
productscans    = FILTER scans BY (title MATCHES 'proactiv');
grouped         = GROUP productscans ALL;
count           = FOREACH grouped GENERATE COUNT(productscans);
DUMP count;

or

scans           = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
productscans    = FILTER scans BY (title MATCHES 'proactiv');
grouped         = GROUP productscans ALL;
count           = FOREACH grouped GENERATE COUNT($1);
DUMP count;
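Both snippets above produce a global count. For the per-group case, a minimal sketch might look like the following, assuming you want one count per title. The aliases `by_title` and `title_counts` and the field name `n` are made up for illustration and do not come from the original script.

```pig
scans        = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);

-- One group per distinct title; each group carries a bag named scans.
by_title     = GROUP scans BY title;

-- 'group' holds the title; COUNT runs over that group's bag.
title_counts = FOREACH by_title GENERATE group AS title, COUNT(scans) AS n;

DUMP title_counts;
```

Note that COUNT skips tuples whose first field is null; if those rows should be included, COUNT_STAR is the built-in to look at.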

Answer 2 (score: 0)

Maybe

/* scans by title */

scans           = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
productscans    = FILTER scans BY (title MATCHES 'proactiv');
scancount       = FOREACH productscans GENERATE COUNT(productscans);
DUMP scancount;