Pig: Don't understand why AVG doesn't work when COUNT does

Time: 2015-03-21 02:14:03

Tags: count apache-pig aggregate average

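I am running the following set of commands in Pig. My dataset has one row for each student in the class, and each student has multiple grades. The student name is separated from that student's grades by a tab, and the grades themselves are comma-delimited. I need to find the average grade for each student. After grouping, I can successfully get a count of grades for each student, but I cannot get the average grade for each student. Pig complains that it is unable to open the iterator when averaging. I am confused, because the aggregate functions COUNT and AVG are applied to the same bag. I am not sure what I am missing. Any help is appreciated.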

Script:

grunt> A = LOAD 'grades.txt' USING PigStorage('\t') AS (f1:chararray,f2:chararray);
grunt> dump A;
(s14,59,94,81)
(s15,60,77)
(s16,77,77)
(s17,76,76)
(s18,19,61,72)
(s20,34,35)

grunt> B = foreach A generate f1 as stu, Flatten(TOKENIZE(f2)) as (grade:int);
grunt> describe B;
B: {stu: chararray,grade: int}
grunt> dump B;
(s14,59)
(s14,94)
(s14,81)
(s15,60)
(s15,77)
(s16,77)
(s16,77)
(s17,76)
(s17,76)
(s18,19)
(s18,61)
(s18,72)
(s20,34)
(s20,35)
grunt> grp = group B by stu;
grunt> cnt = foreach grp generate group, COUNT(B.grade);
grunt> dump cnt;
(s14,3)
(s15,2)
(s16,2)
(s17,2)
(s18,3)
(s20,2)
grunt> avg = foreach grp generate group, AVG(B.grade);
grunt> dump avg;
2015-03-20 21:56:30,900 ERROR org.apache.pig.tools.pigstats.PigStatsUtil: 1 map  reduce job(s) failed!
2015-03-20 21:56:30,907 ERROR org.apache.pig.tools.grunt.Grunt: ERROR 1066: Unable to open iterator for alias avg
Details at logfile: /home/training/pig/pig_1426902869706.log
grunt>

1 Answer:

Answer 0: (score: 0)

As mentioned in the comments, a workaround was found:

Changed

B = foreach A generate f1 as stu, Flatten(TOKENIZE(f2)) as (grade:int);

to

B = foreach A generate f1 as stu, Flatten(TOKENIZE(f2)) as grade;

and then cast the grade to int in a separate step:

C = foreach B generate stu as stu, (int)grade as grade;
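
Putting the workaround together, the whole pipeline might look like the sketch below. It assumes the same grades.txt layout as in the question (name, a tab, then comma-separated grades); the aliases avg_grades, n and avg_grade are illustrative names that do not appear in the original script. A plausible explanation for the failure, not confirmed in this thread, is that TOKENIZE emits chararray tokens and declaring "as (grade:int)" on the flattened result only relabels the schema without converting the values. COUNT only counts entries in the bag, so it never inspects them, whereas AVG needs real numbers and fails at runtime.

```pig
-- Load one row per student: name <TAB> comma-separated grades.
A = LOAD 'grades.txt' USING PigStorage('\t') AS (f1:chararray, f2:chararray);

-- TOKENIZE splits f2 on commas and yields chararray tokens;
-- leave the type alone here instead of declaring it as int.
B = FOREACH A GENERATE f1 AS stu, FLATTEN(TOKENIZE(f2)) AS grade;

-- Explicit cast in its own step, so AVG receives actual ints.
C = FOREACH B GENERATE stu, (int)grade AS grade;

-- Group the casted relation (C, not B) and aggregate per student.
grp = GROUP C BY stu;
avg_grades = FOREACH grp GENERATE group AS stu,
                                  COUNT(C.grade) AS n,
                                  AVG(C.grade) AS avg_grade;
DUMP avg_grades;
```

Note that the grouping and both aggregates must reference C rather than B; otherwise AVG would still see the uncast tokens. If the job fails again, the log file named in the error message usually contains the underlying exception, which ERROR 1066 alone does not show.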