Hadoop streaming "GC overhead limit exceeded"

Time: 2015-10-26 08:38:50

Tags: hadoop out-of-memory hadoop-streaming

I am running this command:

hadoop jar hadoop-streaming.jar -D stream.tmpdir=/tmp -input "<input dir>"  -output "<output dir>" -mapper "grep 20151026" -reducer "wc -l"

where <input dir> is a directory containing many avro files.

And I am getting this error:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.apache.hadoop.hdfs.protocol.DatanodeID.updateXferAddrAndInvalidateHashCode(DatanodeID.java:287)
    at org.apache.hadoop.hdfs.protocol.DatanodeID.<init>(DatanodeID.java:91)
    at org.apache.hadoop.hdfs.protocol.DatanodeInfo.<init>(DatanodeInfo.java:136)
    at org.apache.hadoop.hdfs.protocol.DatanodeInfo.<init>(DatanodeInfo.java:122)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:633)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:793)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convertLocatedBlock(PBHelper.java:1252)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1270)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1413)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1524)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1533)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:557)
    at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy15.getListing(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1969)
    at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.hasNextNoFilter(DistributedFileSystem.java:888)
    at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.hasNext(DistributedFileSystem.java:863)
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:267)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:624)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:616)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)

How can this problem be solved?

1 Answer:

Answer 0 (score: 5)

It took a while, but I found the solution here.

Prefixing the command with HADOOP_CLIENT_OPTS="-Xmx1024M" solves the problem. This works because the stack trace shows the OutOfMemoryError occurs in the local client JVM while it lists the input files to compute splits during job submission (FileInputFormat.listStatus / JobSubmitter.writeSplits), and HADOOP_CLIENT_OPTS passes JVM options, here a larger maximum heap, to that client process.

The final command line is:

HADOOP_CLIENT_OPTS="-Xmx1024M" hadoop jar hadoop-streaming.jar -D stream.tmpdir=/tmp -input "<input dir>"  -output "<output dir>" -mapper "grep 20151026" -reducer "wc -l"
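A variant not from the original answer: if you run several client commands against the same large input, you can export the variable once for the shell session instead of prefixing each command. The 1024M heap size is illustrative and may need to be raised for very large directory listings:

export HADOOP_CLIENT_OPTS="-Xmx1024M"
hadoop jar hadoop-streaming.jar -D stream.tmpdir=/tmp -input "<input dir>" -output "<output dir>" -mapper "grep 20151026" -reducer "wc -l"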