AWS EMR HiveQL - java.lang.OutOfMemoryError: Java heap space

Date: 2015-12-07 18:40:02

Tags: hive hiveql emr

I'm running a HiveQL job on AWS EMR and getting the error below. The cluster has 39 m3.2xlarge nodes (each with 8 vCPUs, 30 GB of memory and 2 x 80 GB SSD storage), for about 1.1 TB of memory in total.

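The HiveQL script loads data from S3 and creates smaller main data tables in ORC format. Many intermediate tables are created successfully before the error occurs. The statement that fails is select count(distinct ...) from <main data table>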

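Is there a way to clean up or free memory before each new statement? Do I need to resize the heap? What else can I provide to help better understand the data and the environment?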

The error:

    Diagnostic Messages for this Task:
    Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#1
        at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:381)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)
    Caused by: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
        at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
        at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.<init>(InMemoryMapOutput.java:63)
        at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:297)
        at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:287)
        at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:411)
        at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:341)
        at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)
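Reading the frames under "Caused by" from the bottom up: a shuffle fetcher copies a map output (Fetcher.copyMapOutput) and asks MergeManagerImpl.reserve for room; the output is judged small enough to keep in memory, so unconditionalReserve creates an InMemoryMapOutput, whose BoundedByteArrayOutputStream allocates a byte array on the reducer heap, and that allocation is what fails. The sketch below models that decision in Python. It is a simplification, assuming the stock Hadoop 2.x defaults of that time (mapreduce.reduce.shuffle.input.buffer.percent = 0.70, mapreduce.reduce.shuffle.memory.limit.percent = 0.25) and the Integer.MAX_VALUE cap on the heap figure used by MergeManagerImpl in those versions; the function names are illustrative only, not Hadoop APIs.

```python
# Simplified model of the shuffle reservation path in the stack trace:
# Fetcher.copyMapOutput -> MergeManagerImpl.reserve -> unconditionalReserve.
# Assumptions (not verified against the exact EMR build):
#   mapreduce.reduce.shuffle.input.buffer.percent = 0.70
#   mapreduce.reduce.shuffle.memory.limit.percent = 0.25
#   heap figure capped at Integer.MAX_VALUE, as in Hadoop 2.x MergeManagerImpl.
# Merge threads, memory release and parallel fetchers are left out.

INT_MAX = 2**31 - 1
MB = 1024 * 1024


def shuffle_limits(heap_bytes, input_buffer_percent=0.70, memory_limit_percent=0.25):
    """Return (memory_limit, max_single_shuffle_limit) in bytes."""
    memory_limit = int(min(heap_bytes, INT_MAX) * input_buffer_percent)
    max_single_shuffle = int(memory_limit * memory_limit_percent)
    return memory_limit, max_single_shuffle


def reserve(requested_bytes, used_bytes, heap_bytes):
    """Return where a fetched map output of requested_bytes would go."""
    memory_limit, max_single_shuffle = shuffle_limits(heap_bytes)
    if requested_bytes >= max_single_shuffle:
        return "disk"    # too large for memory: written to an on-disk output
    if used_bytes > memory_limit:
        return "stall"   # fetcher waits until in-memory merges free space
    # unconditionalReserve: a byte[] of requested_bytes is allocated on the
    # heap; this is the allocation that threw OutOfMemoryError in the trace.
    return "memory"


if __name__ == "__main__":
    heap = 1024 * MB  # hypothetical 1 GB reducer heap
    limit, single = shuffle_limits(heap)
    print(limit // MB, single // MB)
```

With a hypothetical 1 GB heap this model gives about 716 MB for in-memory shuffle data and about 179 MB as the largest single map output kept in memory. Because the stall check looks at usage before the new allocation is counted, in-memory usage can overshoot memory_limit, so on a small heap the shuffle can claim most of the available space; a larger reducer heap, as in the answer, leaves more headroom for everything else.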

1 Answer:

Answer 0 (score: 1)

A temporary fix is to increase the memory allocated to the reducers:

    SET mapreduce.reduce.memory.mb=6000;
    SET mapreduce.reduce.java.opts=-Xmx5000m;
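The two settings go together: mapreduce.reduce.memory.mb is the size of the YARN container for each reduce task, and the -Xmx in mapreduce.reduce.java.opts is the Java heap inside it, which has to stay below the container size so that JVM memory outside the heap still fits. The sketch below is a hypothetical helper (reducer_memory_settings is not part of Hive or Hadoop) that derives both statements from a container size. It assumes the answer's ratio of 5000/6000 as the default heap share, which is a rule of thumb rather than an official formula, and node_yarn_mb is an illustrative stand-in for the per-node YARN memory (yarn.nodemanager.resource.memory-mb).

```python
from fractions import Fraction


# Hypothetical helper: builds the two reducer memory SET statements.
# heap_ratio defaults to 5/6, taken from the answer (-Xmx5000m in a 6000 MB
# container); the remainder of the container is left for non-heap JVM memory.
def reducer_memory_settings(container_mb, heap_ratio=Fraction(5, 6), node_yarn_mb=None):
    if not 0 < heap_ratio < 1:
        raise ValueError("the heap must be smaller than the container")
    if node_yarn_mb is not None and container_mb > node_yarn_mb:
        # node_yarn_mb: memory YARN may hand out on one node (assumed input);
        # a container larger than that cannot be scheduled on the node.
        raise ValueError("container does not fit on a node")
    heap_mb = int(container_mb * heap_ratio)  # truncate to whole megabytes
    return [
        f"SET mapreduce.reduce.memory.mb={container_mb};",
        f"SET mapreduce.reduce.java.opts=-Xmx{heap_mb}m;",
    ]


for line in reducer_memory_settings(6000):
    print(line)
```

Called with 6000, the helper reproduces the two statements from the answer. Using Fraction keeps the heap calculation exact, so the default ratio yields 5000 rather than a value off by one from floating-point rounding.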