Simple average by group in Pig?

Time: 2013-04-03 00:54:45

Tags: apache-pig

I have very simple two-column data, with one chararray column and one double column:

user1 234.43
user1 432.23
user2 4321.213
etc.

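I want to group by user and then compute the average of the doubles. How do I do that? Do I need "GROUP * ALL"? I am trying to follow Example 2 at http://wiki.apache.org/pig/PigOverview, but it does not work for me.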

selfReportsAndDiscrepancies = FOREACH discrepancies1 GENERATE discrepancy,selfReportedText;
perDiscrepancy = GROUP selfReportsAndDiscrepancies BY selfReportedText;

allDiscrep = group perDiscrepancy all;

means = FOREACH allDiscrep GENERATE AVG(perDiscrepancy.discrepancy);

DUMP means;
DESCRIBE means;

This gives me:

2013-04-02 17:54:06,611 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1128: Cannot find field discrepancy in group:chararray,selfReportsAndDiscrepancies:bag{:tuple(discrepancy:double,selfReportedText:chararray)}

1 answer:

Answer 0 (score: 0)

I hope I understand you correctly: you want the average of the per-group averages:

VISITS = LOAD 'data' USING PigStorage(' ')  AS (user:chararray, number:double);
USER_VISITS = GROUP VISITS BY user;
USER_AVG = FOREACH USER_VISITS GENERATE group AS user, AVG(VISITS.number) AS average;
ALL_AVG = GROUP USER_AVG ALL;
OVERALL_AVG = FOREACH ALL_AVG GENERATE AVG(USER_AVG.average);
DUMP OVERALL_AVG;

Result:

(2327.2715)
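
As a sanity check on that figure, the same average of per-user averages can be reproduced outside Pig. This is only a minimal Python sketch, assuming the input consists of exactly the three sample rows shown in the question (the rows behind "etc." are unknown and left out); the variable names are illustrative and not part of the original script.

```python
from collections import defaultdict
from statistics import mean

# Assumption: only the three sample rows from the question are present.
rows = [("user1", 234.43), ("user1", 432.23), ("user2", 4321.213)]

# Rough equivalent of GROUP VISITS BY user: collect each user's numbers.
by_user = defaultdict(list)
for user, number in rows:
    by_user[user].append(number)

# Rough equivalent of USER_AVG: one average per user.
user_avg = {user: mean(numbers) for user, numbers in by_user.items()}

# Rough equivalent of GROUP USER_AVG ALL plus AVG: average of the per-user averages.
overall_avg = mean(user_avg.values())

# For contrast: the plain average over all rows, where user1 counts twice.
flat_avg = mean(number for _, number in rows)

print(round(overall_avg, 4))
print(round(flat_avg, 4))
```

Under that assumption the first value should agree with the Pig result above, while the second one (the average over all rows) is different. If only the per-user averages are wanted, the USER_AVG relation already holds them and the GROUP ... ALL step is not needed.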