Pig scalar greater than 0

Time: 2018-11-22 14:20:57

Tags: apache-pig cloudera scalar

I have the following code:

Data1 = LOAD '/user/cloudera/Class Ex 2/Data 1' USING PigStorage(',') as (Name:chararray,ID:chararray,text_1:chararray,Grade_1:int,Grade_2:int,Grade_3:int,Grade_4:int);
Data2 = LOAD '/user/cloudera/Class Ex 2/Data 2' USING PigStorage(',') as (Name:chararray,ID:chararray,text_2:chararray,Grade_5:int,Grade_6:int,Grade_7:int,Grade_8:int);

Data_3 = JOIN Data1 BY Data1.ID,Data2 BY Data2.ID;
Data_4 = FOREACH Data_3 GENERATE $0,$1,$2,$3,$4,$5,$6,$9,$10,$11,$12,$13;

Data_5 = FOREACH Data_4 GENERATE
                            Name,
                            ID,
                            text_1,
                            SIZE(text_1),
                            REPLACE(text_1,'or',''),
                            SIZE(REPLACE(text_1,'or','')),
                            SIZE(text_1)-SIZE(REPLACE(text_1,'or','')),
                            text_2,
                            SIZE(text_2),
                            REPLACE(text_2,'or',''),
                            SIZE(REPLACE(text_2,'or','')),
                            SIZE(text_2)-SIZE(REPLACE(text_2,'or','')),
                            ($3+$4+$5+$6+$8+$9+$10+$11)/8;
DESCRIBE Data_5;
STORE Data_5 Into '/user/cloudera/Class Ex 2/Data_output' USING PigStorage(',');

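Basically, I have to load two sets of data and then run some basic text statistics and manipulation on them. Everything works fine until the last statement, STORE. When I add it, I get a scalar error.

What am I doing wrong here? Thanks, everyone!

1 answer:

Answer 0 (score: 2)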

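First of all, Pig only evaluates aliases that ultimately lead to a STORE or DUMP (this is called lazy evaluation). So your error was there all along; it was only caught once you added the STORE statement. Since you haven't pasted the full stack trace, my guess is that the error lies in the third statement, where you try to access the field ID using the dot (.) operator. You need to change it to one of the following: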

1) Refer to the field ID directly, since Data1 and Data2 each have only one field called ID:

Data_3 = JOIN Data1 BY ID, Data2 BY ID;

2) If you really do need to disambiguate, use :: instead of .

Data_3 = JOIN Data1 BY Data1::ID, Data2 BY Data2::ID;
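The same :: prefix is also how the joined relation exposes columns that share a name across both inputs. The following is a minimal sketch, assuming Data1 and Data2 are loaded as in the question; the alias Joined_named is hypothetical and only there for illustration.

```pig
-- Corrected join from option 1 above.
Data_3 = JOIN Data1 BY ID, Data2 BY ID;

-- Inspect the schema; the joined fields should appear prefixed
-- with their source alias (for example Data1::ID and Data2::ID).
DESCRIBE Data_3;

-- Joined_named is a hypothetical alias. Name and ID exist on both
-- sides of the join, so the prefix says which copy is meant.
Joined_named = FOREACH Data_3 GENERATE
                   Data1::Name   AS Name,
                   Data1::ID     AS ID,
                   Data1::text_1 AS text_1,
                   Data2::text_2 AS text_2;
```

Referring to columns by name this way is an alternative to the positional references ($0, $1, ...) used in the question, and it can make it easier to see which side of the join each column comes from.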

If you are wondering why the dot (.) operator causes the error, it may help to look at this question: Getting exception while trying to execute a Pig Latin Script
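As a rough illustration of the difference: in an expression, relation.field is read by Pig as a scalar projection, which is only meaningful when the relation has exactly one row. The sketch below reuses the Data1 load from the question; the aliases Grouped, Row_count and With_count and the field names n and total_rows are assumptions made up for this example.

```pig
Data1 = LOAD '/user/cloudera/Class Ex 2/Data 1' USING PigStorage(',')
        AS (Name:chararray, ID:chararray, text_1:chararray,
            Grade_1:int, Grade_2:int, Grade_3:int, Grade_4:int);

-- GROUP ... ALL puts every row into a single group, so Row_count
-- (a hypothetical alias) has exactly one row.
Grouped   = GROUP Data1 ALL;
Row_count = FOREACH Grouped GENERATE COUNT(Data1) AS n;

-- Row_count.n is a scalar projection. It is valid here because
-- Row_count is a single-row relation.
With_count = FOREACH Data1 GENERATE Name, ID, Row_count.n AS total_rows;

-- By contrast, Data1.ID asks Pig to treat the multi-row relation
-- Data1 as a scalar, which is what fails at run time once the
-- script is actually executed by STORE or DUMP.
```

This is why the JOIN in the question parses without complaint but fails later: the dot form is legal syntax, just not the syntax for naming a join key.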