我正在大约1 gb的数据上运行一个Pig脚本,其中涉及多个groupby和foreach语句。 这是示例猪代码:
ab = GROUP y BY(y1, y2 y3 y4 y5 y6);
xy = FOREACH ab {
abc = FOREACH y
生成
x1 2倍 x3 4倍 5倍 x6, rel1, rel2;
生成
组abc;
};
注意: rel1和rel2 是按组生成的,因为它们本身也是bag,对于[y1,y2,y3,y4,y5,y6],bagize包含大约448条记录,大小为700mb的猪和xy关系失败,说GC超出了开销上限。
纱线日志
2018-08-08 15:01:13,299 INFO [Service Thread] org.apache.pig.impl.util.SpillableMemoryManager: first memory handler call - Collection threshold init = 1148190720(1121280K) used = 5726479864(5592265K) committed = 5726797824(5592576K) max = 5726797824(5592576K), toFree = 3046581752
2018-08-08 15:04:22,192 FATAL [ResponseProcessor for block BP-1779694772-10.xxx.xx.17-1533341581987:blk_1074055963_315162] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[ResponseProcessor for block BP-1779694772-10.xxx.xx.17-1533341581987:blk_1074055963_315162,5,main] threw an Error. Shutting down now...
java.lang.OutOfMemoryError: GC overhead limit exceeded
2018-08-08 15:05:24,112 INFO [ResponseProcessor for block BP-1779694772-10.xxx.xx.17-1533341581987:blk_1074055963_315162] org.apache.hadoop.util.ExitUtil: Halt with status -1 Message: HaltException