Pig SUM not working

Date: 2017-03-23 03:03:02

Tags: apache-pig

I'm running the following Pig script, but I get ERROR 1066: Unable to open iterator for alias H.

A = LOAD 'hdfs:/home/ubuntu/pigtest/Master.csv' USING PigStorage(',');
B = FOREACH A GENERATE $0 AS id, $13 AS first, $14 AS last, $18 AS bats, $2 AS birthMonth,$7 AS deathYear;
C = FILTER B BY birthMonth==10 and deathYear==2011 and bats=='R';
D = LOAD 'hdfs:/home/ubuntu/pigtest/Batting.csv' USING PigStorage(',');
E = FOREACH D GENERATE $0 AS id, $7 AS hits;
F = JOIN E BY id, C BY id;
G = GROUP F BY E.id;
H = FOREACH G GENERATE $0, SUM($1.hits);
DUMP H;

When I DESCRIBE G, I get:

G: {group: bytearray,F: {(E::id: bytearray,E::hits: int,C::id: bytearray,
    C::first: bytearray,C::last: bytearray,C::bats: bytearray,
    C::birthMonth: bytearray,C::deathYear: bytearray)}}

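I've tried lots of things inside the SUM() function (F:hits, F.hits, FEhits, E.hits, E:hits), but I can't figure out how to reference the fields of the tuples inside the bag.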

Thanks for any ideas.

2 answers:

Answer 0 (score: 1)

I suggest you try this (I haven't tested it myself):

A = LOAD 'hdfs:/home/ubuntu/pigtest/Master.csv' USING PigStorage(',');
B = FOREACH A GENERATE $0 AS id, $13 AS first, $14 AS last, $18 AS bats, $2 AS birthMonth,$7 AS deathYear;
C = FILTER B BY birthMonth==10 and deathYear==2011 and bats=='R';
D = LOAD 'hdfs:/home/ubuntu/pigtest/Batting.csv' USING PigStorage(',');
E = FOREACH D GENERATE $0 AS id, $7 AS hits;
F = JOIN E BY id, C BY id; 
-- Generate only the columns you need, then DUMP F1 to check the output
F1 = FOREACH F GENERATE E::id  as id, E::hits as hits;
G = GROUP F1 BY id;
H = FOREACH G GENERATE FLATTEN(group) as ID , SUM(F1.hits);
DUMP H;
  

Note the line H = FOREACH G GENERATE FLATTEN(group) AS ID, SUM(F1.hits); this is where your original code went wrong.
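For comparison, here is a minimal sketch that skips the intermediate F1 projection and references the joined fields directly by their disambiguated names. It assumes E, C and F are defined exactly as in the question and has not been run against the original data. After GROUP, the bag keeps the name of the grouped relation (F), so its fields are reached as F.<field>, and fields that came from the JOIN keep their E:: / C:: prefixes, as the DESCRIBE output shows. The output names id and total_hits are illustrative choices.

```pig
-- Assumes E, C and F are defined exactly as in the question.
-- Group on the disambiguated join key E::id (the '::' form), not E.id.
G = GROUP F BY E::id;
-- The bag inside each group is named F; project its E::hits field and sum it.
H = FOREACH G GENERATE group AS id, SUM(F.E::hits) AS total_hits;
DUMP H;
```

Since hits only exists on the E side of the join, the shorter F.hits may also resolve, but the prefixed form avoids any ambiguity.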

Answer 1 (score: 0)

This can happen for a few reasons:

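a) The version of Pig you are running may need to be changed. See: ERROR 1066: Unable to open iterator for alias - Pig
b) The values in your test data may contain nulls. In that case, try adapting your script along the lines of the one below, which replaces null or empty values with the string 'null':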

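-- test1 and its fields A, B, C stand in for your own relation and columns
values = FOREACH test1 GENERATE (A is null ? 'null' : (A == '' ? 'null' : A)) AS A, (B is null ? 'null' : (B == '' ? 'null' : B)) AS B, (C is null ? 'null' : (C == '' ? 'null' : C)) AS C;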

That might solve the problem.
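If nulls in the hits column are the concern, note that Pig's SUM already ignores null values, so another option is to cast the column explicitly and drop the rows that did not parse before joining. A sketch, assuming D is loaded as in the question and that column $7 holds an integer hit count; the alias E_clean is hypothetical.

```pig
-- Assumes D is loaded as in the question and $7 holds the hit count.
-- Cast explicitly so SUM works on ints rather than untyped bytearrays;
-- values that cannot be cast to int (e.g. empty fields) become null.
E = FOREACH D GENERATE $0 AS id, (int)$7 AS hits;
-- Drop rows whose hits value is null before joining with C.
E_clean = FILTER E BY hits IS NOT NULL;
```

The rest of the script would then join E_clean instead of E.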