Apache Pig: java.lang.OutOfMemoryError: Requested array size exceeds VM limit

Asked: 2016-08-09 13:59:38

Tags: hadoop apache-pig

I am running Pig 0.15 and trying to group some data. I am hitting a Requested array size exceeds VM limit error. The files are quite small, only about 2.5G, and the job runs with just 10 mappers, none of which hits a memory error.

Below is a snippet of what I am doing:

sample_set = LOAD 's3n://<bucket>/<dev_dir>/000*-part.gz' USING PigStorage(',') AS (col1:chararray,col2:chararray..col23:chararray);
sample_set_group_by_col1 = GROUP sample_set BY col1;
sample_set_group_by_col1_10 = LIMIT sample_set_group_by_col1 10;
DUMP sample_set_group_by_col1_10;

The job fails with the following error:

2016-08-08 14:28:59,622 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Requested array size exceeds VM limit
at java.util.Arrays.copyOf(Arrays.java:2271)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at java.io.DataOutputStream.writeUTF(DataOutputStream.java:401)
at java.io.DataOutputStream.writeUTF(DataOutputStream.java:323)
at org.apache.pig.data.utils.SedesHelper.writeChararray(SedesHelper.java:66)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:580)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:462)
at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:650)
at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:641)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:474)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:462)
at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:650)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:470)
at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:40)
at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:139)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:198)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.spillSingleRecord(MapTask.java:1696)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1180)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:712)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:135)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:281)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:274)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)

Has anyone run into this error before? If so, what was the solution?

1 Answer:

Answer 0 (score: 0):

Judging from the error, the output of the GROUP BY statement is very large. The stack trace fails inside ByteArrayOutputStream.grow while spilling a single map output record, which suggests that at least one group's bag is huge and the serialized record does not fit in the available Java heap. By default, Hadoop mapper and reducer tasks are typically allocated 1GB of memory each. Try increasing the memory given to the tasks by running the Pig script with the following parameters:

SET mapreduce.map.memory.mb 4096;
SET mapreduce.reduce.memory.mb 6144;
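
Note that mapreduce.map.memory.mb and mapreduce.reduce.memory.mb set the sizes of the YARN containers, not the JVM heap inside them; the heap is controlled separately through the java.opts properties. A minimal sketch of matching heap settings (the -Xmx values are illustrative, roughly 80% of the container sizes above):

SET mapreduce.map.java.opts '-Xmx3276m';
SET mapreduce.reduce.java.opts '-Xmx4915m';

If the heap is left at its default, the extra container memory by itself will not let the JVM allocate the large serialization buffer.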

If it still fails, try increasing the values of the parameters above.
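
That said, if one key dominates the data, raising memory may never be enough: the serialized record is backed by a Java byte array, which is capped at roughly 2GB regardless of heap size. In that case restructuring the script helps more than memory settings. A sketch of one alternative, assuming a per-key count is what is actually needed (relation and column names follow the script in the question):

-- Keep records tiny by projecting only the grouping column
col1_only = FOREACH sample_set GENERATE col1;
grouped = GROUP col1_only BY col1;
-- Aggregate instead of materializing the full bag for each key
counts = FOREACH grouped GENERATE group, COUNT(col1_only) AS n;
counts_10 = LIMIT counts 10;
DUMP counts_10;

Because COUNT is algebraic, Pig can apply it in the combiner, so no single group's bag ever has to be serialized whole.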