Pig script fails with "GC overhead limit exceeded"

Date: 2018-08-09 08:07:34

Tags: out-of-memory apache-pig heap-size

I am running a Pig script on roughly 1 GB of data that involves several GROUP BY and FOREACH statements. Here is a sample of the Pig code:

  

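-- Group y on the six key fields.
ab = GROUP y BY (y1, y2, y3, y4, y5, y6);

-- For each group, project the needed fields (rel1 and rel2 are themselves bags).
xy = FOREACH ab {
    abc = FOREACH y GENERATE x1, x2, x3, x4, x5, x6, rel1, rel2;
    GENERATE group, abc;
};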
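For context, in MapReduce mode a GROUP followed by a nested FOREACH like the one above typically runs in the reduce tasks, and the heap those tasks get is bounded by per-task Hadoop settings that can be set at the top of a Pig script. A minimal sketch: the property names are standard Hadoop 2 MapReduce and Pig properties, but the values are placeholders chosen for illustration, not a tested fix for this job.

```pig
-- Placeholder values (assumptions): size them to the YARN container limits of your cluster.
-- Memory (MB) that YARN allocates to each reduce task's container.
SET mapreduce.reduce.memory.mb 8192;
-- JVM heap for each reduce task; keep -Xmx below the container size above.
SET mapreduce.reduce.java.opts '-Xmx6553m';
-- Fraction of the heap Pig lets in-memory bags use before spilling to disk (Pig's default is 0.2).
SET pig.cachedbag.memusage 0.1;
```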

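Note: rel1 and rel2 are produced by an earlier GROUP, so they are bags themselves. For each [y1, y2, y3, y4, y5, y6] key, the bag holds about 448 records with a size of about 700 MB, and the Pig job fails on the xy relation with "GC overhead limit exceeded".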
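A rough back-of-the-envelope figure from the numbers in the note, assuming the 700 MB refers to a single group's bag (the note reads that way but does not say so explicitly):

```latex
\[
\frac{700\ \text{MB}}{448\ \text{records}} = 1.5625\ \text{MB per record}
\]
```

So each record in abc averages well over a megabyte, which fits the note that every record carries the nested bags rel1 and rel2.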

YARN log:

2018-08-08 15:01:13,299 INFO [Service Thread] org.apache.pig.impl.util.SpillableMemoryManager: first memory handler call - Collection threshold init = 1148190720(1121280K) used = 5726479864(5592265K) committed = 5726797824(5592576K) max = 5726797824(5592576K), toFree = 3046581752
2018-08-08 15:04:22,192 FATAL [ResponseProcessor for block BP-1779694772-10.xxx.xx.17-1533341581987:blk_1074055963_315162] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[ResponseProcessor for block BP-1779694772-10.xxx.xx.17-1533341581987:blk_1074055963_315162,5,main] threw an Error.  Shutting down now...
java.lang.OutOfMemoryError: GC overhead limit exceeded
2018-08-08 15:05:24,112 INFO [ResponseProcessor for block BP-1779694772-10.xxx.xx.17-1533341581987:blk_1074055963_315162] org.apache.hadoop.util.ExitUtil: Halt with status -1 Message: HaltException
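Reading the SpillableMemoryManager line: max is the limit of the JVM heap memory pool Pig is monitoring (in bytes, with KiB in parentheses), and used is already almost equal to it:

```latex
\[
\text{max} = 5726797824\ \text{B} = 5592576\ \text{KiB} = 5461.5\ \text{MiB} \approx 5.33\ \text{GiB}
\]
\[
\frac{\text{used}}{\text{max}} = \frac{5726479864}{5726797824} \approx 0.99994
\]
```

In other words, the pool was effectively full at the first memory handler call (15:01:13), about three minutes before the OutOfMemoryError was logged (15:04:22).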
