Apache Pig: OutOfMemoryError with FOREACH aggregation

Time: 2016-06-24 15:28:36

Tags: apache-pig emr

I get an OutOfMemoryError when running a FOREACH operation in Apache Pig.

16/06/24 15:14:17 INFO util.SpillableMemoryManager: first memory handler call- Usage threshold init = 164102144(160256K) used = 556137816(543103K) committed = 698875904(682496K) max = 698875904(682496K)

java.lang.OutOfMemoryError: Java heap space
-XX:OnOutOfMemoryError="kill -9 %p"
Executing /bin/sh -c "kill -9 4095"... Killed
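To put the SpillableMemoryManager line in perspective, here is a small illustrative Python sketch that pulls the used and max byte counts out of it and computes how full the heap was. heap_usage is a hypothetical helper, not part of Pig or Hadoop, and the regular expressions assume the exact message layout shown above; other versions may format it differently.

```python
import re

# The SpillableMemoryManager message from the question, joined onto one line.
LOG_LINE = (
    "16/06/24 15:14:17 INFO util.SpillableMemoryManager: first memory "
    "handler call- Usage threshold init = 164102144(160256K) used = "
    "556137816(543103K) committed = 698875904(682496K) max = "
    "698875904(682496K)"
)


def heap_usage(line):
    """Return (used_bytes, max_bytes, used_fraction) parsed from the log line.

    Hypothetical helper: assumes the 'used = N' / 'max = N' layout above.
    """
    used = int(re.search(r"used =\s*(\d+)", line).group(1))
    max_ = int(re.search(r"max =\s*(\d+)", line).group(1))
    return used, max_, used / max_


used, max_bytes, frac = heap_usage(LOG_LINE)
# Convert bytes to MiB (2**20 bytes) for readability.
print(f"max heap: {max_bytes / 2**20:.1f} MiB, used: {frac:.1%}")
```

On the numbers in the question this reports a maximum heap of about 666.5 MiB with roughly 80% in use, which is consistent with the JVM that emitted the message running with a fairly small heap.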

My Pig script:

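-- Load space-delimited pagecount records.
A = LOAD 'PageCountTest/' USING PigStorage(' ')
    AS (Project:chararray, Title:chararray, count:int, size:int);

-- One group per (Project, Title) pair.
B = GROUP A BY (Project, Title);

-- Total the counts within each group.
C = FOREACH B GENERATE group, SUM(A.count) AS COUNT;

-- D is not referenced by the STORE below, so Pig does not execute this ORDER.
D = ORDER C BY COUNT DESC;

STORE C INTO '/user/hadoop/wikistats';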

Sample data:

aa.b Main_Page 1 14335
aa.d India 1 4075
aa.d Main_Page 1 13190
aa.d Special:RecentChanges 1 200
aa.d Talk:Main_Page/ 1 14147
aa.d w/w/index.php 9 137502
aa Main_Page 6 9872
aa Special:Statistics 1 324
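For reference, the result the script computes (GROUP by (Project, Title), SUM of count, then the ORDER BY COUNT DESC step that defines D) can be reproduced locally on the sample rows with a short Python sketch. group_sum_order is a hypothetical helper for illustration only; it says nothing about how Pig distributes the work or uses memory across map and reduce tasks.

```python
from collections import defaultdict

# Sample rows from the question: Project, Title, count, size (single-space delimited).
SAMPLE = """\
aa.b Main_Page 1 14335
aa.d India 1 4075
aa.d Main_Page 1 13190
aa.d Special:RecentChanges 1 200
aa.d Talk:Main_Page/ 1 14147
aa.d w/w/index.php 9 137502
aa Main_Page 6 9872
aa Special:Statistics 1 324
"""


def group_sum_order(text):
    """Mimic GROUP A BY (Project, Title); SUM(A.count); ORDER BY COUNT DESC."""
    totals = defaultdict(int)
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on a single space, like PigStorage(' ').
        project, title, count, _size = line.split(" ")
        totals[(project, title)] += int(count)
    # sorted() is stable, so ties keep first-seen order; Pig's ORDER makes no such promise.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


for (project, title), total in group_sum_order(SAMPLE):
    print(project, title, total)
```

In this sample every (Project, Title) pair occurs once, so each sum equals the original count and the sort puts aa.d w/w/index.php (9) first, followed by aa Main_Page (6).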

Can anyone help?

1 answer:

Answer 0 (score: 0)

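I suspect the memory problem occurs during the ORDER, since it is the heaviest operation. The simplest fix is to pass a larger maximum heap size (-Xmx) when launching the Pig job: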

pig -Dmapred.child.java.opts=-Xmx2096M yourjob.pig

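You can also set the heap size (in MB) through the PIG_HEAPSIZE environment variable before running the script. Note that this sizes the JVM of the Pig client itself, not the map and reduce tasks.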

export PIG_HEAPSIZE=2096