Cast error in a Pig join

Time: 2014-10-07 17:21:17

Tags: hadoop apache-pig

I have a script that performs a JOIN. It succeeds when I run it on a small dataset, but when I increase the data size I get this error:

14/10/07 19:10:19 ERROR executionengine.Launcher: Backend error message
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POProject (Name: Project[tuple][0] - scope-577 Operator Key: scope-577) children: null at []]: java.lang.ClassCastException: java.lang.Long cannot be cast to org.apache.pig.data.Tuple
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:339)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:304)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNextTuple(POUnion.java:167)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.ClassCastException: java.lang.Long cannot be cast to org.apache.pig.data.Tuple
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNextTuple(POProject.java:475)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
        ... 13 more

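My guess is that the problem is caused by the size of the input rather than by bad input (the medium-sized dataset does not run on the development server but on a larger cluster).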

Can you help me understand the cause of the error?

1 answer:

Answer 0: (score: 1)

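My guess is that one row in the large dataset holds a Long value where the script expects a tuple, which would cause the cast exception. Posting your Pig script and a few sample rows would also help.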
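
One way to check this guess before rerunning the job is to scan a sample of the input for rows where the field that should contain a tuple holds a bare integer instead. The sketch below is only an illustration and rests on assumptions that are not in the question: the input is tab-delimited text in the form PigStorage writes (tuples as `(1,2)`), and the tuple field sits at column index 1. The names `TUPLE_FIELD` and `find_scalar_rows` are hypothetical.

```python
import re

# Assumption: rows are tab-delimited text as written by PigStorage, and the
# field the script projects as a tuple is at index TUPLE_FIELD (hypothetical).
TUPLE_FIELD = 1

# Matches a field that is nothing but an optionally signed integer.
_BARE_INT = re.compile(r"-?\d+")


def find_scalar_rows(lines, field=TUPLE_FIELD, delimiter="\t"):
    """Return (line_number, value) pairs where the field is a bare integer
    instead of a parenthesised tuple such as (1,2)."""
    suspects = []
    for number, line in enumerate(lines, start=1):
        columns = line.rstrip("\n").split(delimiter)
        if field >= len(columns):
            continue  # short row: a different problem, not reported here
        value = columns[field].strip()
        if _BARE_INT.fullmatch(value):
            suspects.append((number, value))
    return suspects


# Small in-memory sample; in practice, iterate over an open file instead.
sample = ["a\t(1,2)\n", "b\t42\n", "c\t(3,4)\n"]
print(find_scalar_rows(sample))  # [(2, '42')]
```

If this finds nothing, the data may be fine and the mismatch may come from the schema declared in the script, which is another reason to post the script.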