Apache Pig ClassCastException when loading data with a schema

Time: 2015-01-01 00:01:44

Tags: apache-pig

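I have a simple tab-delimited file that I am trying to load with a Pig schema, and then add two of its columns. When I load it with PigStorage's '-schema' option, the addition fails with a ClassCastException. When I load it with '-noschema', the addition works fine. Why does Pig throw an exception in the former case?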

Here is a sample input file with a single line of tab-separated values:

a       1       1

The schema file ".pig_schema" looks like this:

{"fields":[{"name":"str","type":55,"description":"autogenerated from Pig Field Schema","schema":null},{"name":"score","type":15,"description":"autogenerated from Pig Field Schema","schema":null},{"name":"count","type":15,"description":"autogenerated from Pig Field Schema","schema":null}],"version":0,"sortKeys":[],"sortKeyOrders":[]}

Here are the statements I ran in the grunt shell:

a1 = load '/local/workplace/data' using PigStorage(); --load with schema
describe a1; -- a1: {str: chararray,score: long,count: long}
b1 = foreach a1 generate score + count;
dump b1; -- throws exception
a2 = load '/local/workplace/data' using PigStorage('\t', '--noschema') as (str:chararray, score:long, count: long);
b2 = foreach a2 generate score+count; -- no exception
dump b2; -- works fine

The exception thrown is:

org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [Add (Name: Add[long] - scope-34 Operator Key: scope-34) children: [[POProject (Name: Project[long][0] - scope-32 Operator Key: scope-32) children: null at []], [POProject (Name: Project[long][1] - scope-33 Operator Key: scope-33) children: null at []]] at []]: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Number
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:338)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:378)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:298)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Number
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.genericGetNext(Add.java:100)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.getNextLong(Add.java:123)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:323)

Pig version: 0.12.1

1 answer:

Answer 0 (score: 0)

By default, if you don't provide a schema, every field is treated as a bytearray.
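To illustrate the bytearray default, here is a minimal sketch that reuses the file path and field names from the question. The relation names raw, typed, and sums are hypothetical. It loads the file without any schema, then casts each field explicitly before adding. It only shows how untyped fields can be given types by hand; it is not confirmed as an explanation of, or a fix for, the exception seen when the schema file is used.

```pig
-- Load with no schema at all: fields are untyped and default to bytearray
raw = load '/local/workplace/data' using PigStorage('\t', '-noschema');
describe raw; -- no field names or types are known at this point

-- Cast each positional field explicitly and give it a name
typed = foreach raw generate (chararray)$0 as str, (long)$1 as score, (long)$2 as count;
describe typed;

-- Arithmetic now operates on fields declared as long
sums = foreach typed generate score + count;
dump sums;
```

This is similar in effect to the 'as (str:chararray, score:long, count:long)' clause in the question's second load statement, which also attaches types at load time.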