I am running the following Pig script, but I get ERROR 1066: Unable to open iterator for alias H.
A = LOAD 'hdfs:/home/ubuntu/pigtest/Master.csv' USING PigStorage(',');
B = FOREACH A GENERATE $0 AS id, $13 AS first, $14 AS last, $18 AS bats, $2 AS birthMonth,$7 AS deathYear;
C = FILTER B BY birthMonth==10 and deathYear==2011 and bats=='R';
D = LOAD 'hdfs:/home/ubuntu/pigtest/Batting.csv' USING PigStorage(',');
E = FOREACH D GENERATE $0 AS id, $7 AS hits;
F = JOIN E BY id, C BY id;
G = GROUP F BY E.id;
H = FOREACH G GENERATE $0, SUM($1.hits);
DUMP H;
When I DESCRIBE G, I get:
G: {group: bytearray,F: {(E::id: bytearray,E::hits: int,C::id:bytearray,
C::first: bytearray,C::last: bytearray,C::bats:bytearray,
C::birthMonth: bytearray,C::deathYear: bytearray)}}
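I have tried many things inside the SUM() function: F:hits, F.hits, FEhits, E.hits, E:hits, but I can't work out how I am supposed to reference the tuples inside the bag.
Thanks for any ideas.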
Answer 0: (score: 1)
I suggest you try the following (I have not tested it in practice):
A = LOAD 'hdfs:/home/ubuntu/pigtest/Master.csv' USING PigStorage(',');
B = FOREACH A GENERATE $0 AS id, $13 AS first, $14 AS last, $18 AS bats, $2 AS birthMonth,$7 AS deathYear;
C = FILTER B BY birthMonth==10 and deathYear==2011 and bats=='R';
D = LOAD 'hdfs:/home/ubuntu/pigtest/Batting.csv' USING PigStorage(',');
E = FOREACH D GENERATE $0 AS id, $7 AS hits;
F = JOIN E BY id, C BY id;
-- Generate only the columns you need, then DUMP F1 to check the output
F1 = FOREACH F GENERATE E::id as id, E::hits as hits;
G = GROUP F1 BY id;
H = FOREACH G GENERATE FLATTEN(group) as ID , SUM(F1.hits);
DUMP H;
Note the line H = FOREACH G GENERATE FLATTEN(group) as ID, SUM(F1.hits); that is where the error in your code was.
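If you would rather keep the joined relation F as it is, a variation is to group on the disambiguated key and project the field out of the bag with its E:: prefix. The following is a minimal, untested sketch under a few assumptions: the column positions are copied from the question, the (int) cast on hits is added to match the DESCRIBE output shown there, and total_hits is just an illustrative alias. As far as I know, E.id in the original GROUP statement is read by Pig as a scalar projection of the relation E rather than as the joined field, which is why E::id is used here.

```pig
-- Sketch only; not run against the real Master.csv / Batting.csv files.
A = LOAD 'hdfs:/home/ubuntu/pigtest/Master.csv' USING PigStorage(',');
B = FOREACH A GENERATE $0 AS id, $13 AS first, $14 AS last, $18 AS bats, $2 AS birthMonth, $7 AS deathYear;
C = FILTER B BY birthMonth == 10 AND deathYear == 2011 AND bats == 'R';
D = LOAD 'hdfs:/home/ubuntu/pigtest/Batting.csv' USING PigStorage(',');
-- Assumption: hits is cast to int so that SUM works on a numeric type.
E = FOREACH D GENERATE $0 AS id, (int)$7 AS hits;
F = JOIN E BY id, C BY id;
-- After the JOIN, each field is prefixed with its source alias: E::id, not E.id.
G = GROUP F BY E::id;
-- Inside G the bag is named F, so the field is projected as F.E::hits.
H = FOREACH G GENERATE group AS id, SUM(F.E::hits) AS total_hits;
DUMP H;
```

Running DESCRIBE on G before writing H is a quick way to confirm the bag name and the prefixed field names to use inside SUM().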
Answer 1: (score: 0)
This can happen for a couple of reasons:
a) The version of Pig you are running needs to be changed. See:
ERROR 1066: Unable to open iterator for alias - Pig
b) Some values in the test data may be null.
For that case, try adapting your script along the lines of the one below:
values = FOREACH test1 GENERATE (A==''?'null':(A is null?'null':A)) as A,(B==''?'null':(B is null?'null':B)) as B,(C==''?'null':(C is null?'null':C)) as C;
This may resolve the problem.
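As an alternative to rewriting missing values as the string 'null', the rows with missing values can be filtered out before the join. This is a hedged sketch applied to the Batting data from the question, not a confirmed fix: the (int) cast and the alias E_clean are assumptions of mine, and it relies on Pig turning a value that cannot be cast to int (an empty field, for example) into null.

```pig
-- Sketch only; E_clean is a hypothetical alias, not part of the original script.
D = LOAD 'hdfs:/home/ubuntu/pigtest/Batting.csv' USING PigStorage(',');
-- Values that cannot be cast to int become null.
E = FOREACH D GENERATE $0 AS id, (int)$7 AS hits;
-- Keep only rows where both the join key and the summed field are present.
E_clean = FILTER E BY id IS NOT NULL AND hits IS NOT NULL;
DUMP E_clean;
```

The later JOIN would then use E_clean in place of E, and the field prefixes after the join become E_clean::id and E_clean::hits.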