I get an OutOfMemoryError when running a FOREACH operation in Apache Pig.
16/06/24 15:14:17 INFO util.SpillableMemoryManager: first memory handler call- Usage threshold init = 164102144(160256K) used = 556137816(543103K) committed = 698875904(682496K) max = 698875904(682496K)
java.lang.OutOfMemoryError: Java heap space
-XX:OnOutOfMemoryError="kill -9 %p"
Executing /bin/sh -c "kill -9 4095"... Killed
My Pig script:
A = LOAD 'PageCountTest/' USING PigStorage(' ') AS (Project:chararray, Title:chararray, count:int, size:int);
B = GROUP A BY (Project, Title);
C = FOREACH B GENERATE group, SUM(A.count) AS COUNT;
D = ORDER C BY COUNT DESC;
STORE C INTO '/user/hadoop/wikistats';
Sample data:
aa.b Main_Page 1 14335
aa.d India 1 4075
aa.d Main_Page 1 13190
aa.d Special:RecentChanges 1 200
aa.d Talk:Main_Page/ 1 14147
aa.d w/w/index.php 9 137502
aa Main_Page 6 9872
aa Special:Statistics 1 324
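Can anyone help?
Answer 0 (score: 0)
I suspect the memory problem occurs during the ORDER step, since it is the most heavyweight operation in the script. The simplest fix is to pass a heap size parameter when you launch the Pig job.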
pig -Dmapred.child.java.opts=-Xmx2096M yourjob.pig
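You can also set the heap size as an environment variable (the value is in MB) in the shell or launcher script before starting Pig.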
export PIG_HEAPSIZE=2096
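As a rough sketch, the two settings could be combined in one launcher script. The variable names HEAP_MB and PIG_ARG are made up for illustration, and yourjob.pig is the same placeholder as above. The sketch uses -Xmx, the JVM flag for the maximum heap size, on the assumption that the goal is to raise the heap ceiling and not only the initial size. Treat 2096 as a starting value to tune, not a recommendation.

```shell
#!/bin/sh
# Hypothetical launcher sketch; HEAP_MB and PIG_ARG are illustrative names.
HEAP_MB=2096

# PIG_HEAPSIZE is read by the pig launcher and is expressed in MB.
export PIG_HEAPSIZE="$HEAP_MB"

# Heap ceiling for the task JVMs, passed as a Hadoop property.
PIG_ARG="-Dmapred.child.java.opts=-Xmx${HEAP_MB}M"

# Only invoke pig when it is installed; otherwise print the command.
if command -v pig >/dev/null 2>&1; then
    pig "$PIG_ARG" yourjob.pig
else
    echo "pig not found on PATH; would run: pig $PIG_ARG yourjob.pig"
fi
```

Keeping the value in a single variable means the client heap and the task heap stay in step when you adjust it.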