Pig进程多文件错误:错误0:在[]执行ForEach时出错

时间:2014-07-22 03:07:09

标签: apache-pig

我在HDFS上的目录/ user / bizlog / cpc下有4个文件A,B,C,D,记录如下所示: 87465422 ^ C376832 ^ C27786 ^ C21161214 ^ CKEY

这是我的猪脚本:

cpc_all = load '/user/bizlog/cpc' using PigStorage('\u0003') as (cpcid, accountid, cpcplanid, cpcgrpid, key);
cpc = foreach cpc_all generate accountid, key;
account_group = group cpc by accountid;
account_sort = order account_group by group;
account_key = foreach account_sort generate group, BagToTuple(cpc.key);
store account_key into 'last' using PigStorage('\u0003');

它会得到如下结果: 376832 ^ Ckey1 ^ Ckey2

上面的脚本假设要处理所有4个文件,但是我收到了这个错误:

Backend error message
---------------------
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: account_key: New For Each(false,false)[bag] - scope-18 Operator Key: scope-18): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while executing ForEach at []
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:289)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:242)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:464)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:432)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:412)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.

Pig Stack Trace
---------------
ERROR 0: Exception while executing (Name: account_key: New For Each(false,false)[bag] - scope-18 Operator Key: scope-18): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while executing ForEach at []

org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: account_key: New For Each(false,false)[bag] - scope-18 Operator Key: scope-18): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while executing ForEach at []
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:289)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:242)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:464)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:432)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:412)
================================================================================

奇怪的是,如果我加载一个单独的文件,例如load'/ user / bizlog / cpc / A',那么脚本将会成功。

如果我首先加载每个文件然后将它们合并,它也可以正常工作。

如果我将排序步骤放在最后并且错误消失

hadoop的版本是0.20.2,猪版本是0.12.1,任何帮助将不胜感激

1 个答案:

答案 0 :(得分:0)

如评论中所述:

  

我把排序步骤放在最后,错误就消失了

虽然我对这个话题没有太多了解,但猪似乎不喜欢重新组织这个群体。

因此,“解决方案”是重新生成为组生成的内容的输出,而不是对组本身进行排序。