Generating a count value in Pig Latin

Time: 2017-04-12 20:53:52

Tags: hadoop apache-pig

I am trying to find the number of users whose age is between 19 and 60. Here is a sample query:

loadtable = load '/user/userdetails.txt' using PigStorage(',') AS (name:chararray,age:int);

filteredvalues = filter loadtable  by (age > 19 AND  age < 60);

grouped = GROUP filteredvalues ALL;

count = foreach grouped generate COUNT(grouped);

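I get the following error: "Invalid scalar projection: grouped : A column needs to be projected from a relation for it to be used as a scalar"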

2 answers:

Answer 0: (score: 2)

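You have to count filteredvalues, not grouped. After GROUP ... ALL, each tuple in grouped has two fields: group, and a bag named after the relation that was grouped (filteredvalues). COUNT takes that bag as its argument; passing the relation grouped itself makes Pig try to treat it as a scalar, which is what the error reports.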

total = foreach grouped generate COUNT(filteredvalues);
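To picture why the argument has to be filteredvalues, here is a rough plain-Python model of the shape GROUP ... ALL produces. It is an illustration only, not Pig code: the helper name group_all and the three sample rows are made up for this sketch.

```python
# Plain-Python model of Pig's GROUP ... ALL (illustration only, not Pig code).
def group_all(relation_name, rows):
    """Return a single record: the key 'all' plus a bag named after the input relation."""
    return [{"group": "all", relation_name: list(rows)}]

# Hypothetical filtered rows as (name, age) tuples.
filteredvalues = [("BOB", 55), ("Maya", 23), ("Sara", 45)]
grouped = group_all("filteredvalues", filteredvalues)

# COUNT(filteredvalues) inside the FOREACH corresponds to counting the bag field
# of each grouped record; there is no field called "grouped" to count.
counts = [len(record["filteredvalues"]) for record in grouped]
print(counts)  # [3]
```

In Pig, running describe grouped; should show the same two fields, which is a quick way to find the name COUNT expects.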

Answer 1: (score: 1)

Sample userdetails.txt:

Robin,85

BOB,55

Maya,23

Sara,45

David,23

Maggy,22

Robert,75

Syam,23

Mary,25

Saran,17

Stacy,19

Kelly,22

Code:

grunt> loadtable = load '/user/userdetails.txt' using PigStorage(',') AS (name:chararray,age:int);

grunt> filteredvalues = filter loadtable  by (age > 19 AND  age < 60);

grunt> grouped = GROUP filteredvalues ALL;

grunt> count = foreach grouped generate COUNT(filteredvalues);

grunt> dump count;
  

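Always apply COUNT to the relation that was grouped (the bag inside the grouped relation), not to the grouped relation itself; otherwise Pig throws: "Invalid scalar projection: grouped : A column needs to be projected from a relation for it to be used as a scalar"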
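As a sanity check on the sample data in this answer, the same filter and count can be modelled in plain Python without a cluster. This is a sketch rather than Pig code; the names load and count_between are hypothetical helpers introduced here.

```python
# Plain-Python model of the filter -> GROUP ALL -> COUNT pipeline (not Pig code).
SAMPLE = """Robin,85
BOB,55
Maya,23
Sara,45
David,23
Maggy,22
Robert,75
Syam,23
Mary,25
Saran,17
Stacy,19
Kelly,22"""

def load(text):
    """Parse 'name,age' lines into (name, age) tuples, like PigStorage(',')."""
    rows = []
    for line in text.splitlines():
        name, age = line.split(",")
        rows.append((name, int(age)))
    return rows

def count_between(rows, low, high):
    """Count rows with low < age < high (strict bounds, as in the Pig filter)."""
    return sum(1 for _, age in rows if low < age < high)

loadtable = load(SAMPLE)
count = count_between(loadtable, 19, 60)
print(count)  # 8
```

With this data, dump count; should therefore print a single tuple containing 8. Note that Stacy (19) is excluded because the comparisons are strict; if the bounds are meant to be inclusive, the Pig filter would need age >= 19 AND age <= 60.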