Hadoop 2.6 streaming fails only when processing large files

Time: 2015-08-16 16:22:53

Tags: python hadoop streaming

I am using Hadoop 2.6 streaming with Python in a YARN environment on a 3-node cluster.

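I can run the MapReduce job successfully with data files of 1, 5, or 10 GB. However, when I run the same job on a 15 or 24 GB data file, it fails once it reaches the reduce phase, with the following error: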

15/08/16 18:58:55 INFO mapreduce.Job:  map 69% reduce 20%
15/08/16 18:58:56 INFO mapreduce.Job: Task Id : attempt_1439307476930_0012_m_000094_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

The stderr output does not seem to help:

Aug 16, 2015 6:56:44 PM com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider get
WARNING: You are attempting to use a deprecated API (specifically, attempting to @Inject ServletContext inside an eagerly created singleton. While we allow this for backwards compatibility, be warned that this MAY have unexpected behavior if you have more than one injector (with ServletModule) running in the same JVM. Please consult the Guice documentation at http://code.google.com/p/google-guice/wiki/Servlets for more information.
Aug 16, 2015 6:56:45 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider class
Aug 16, 2015 6:56:45 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
Aug 16, 2015 6:56:45 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as a root resource class
Aug 16, 2015 6:56:45 PM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
Aug 16, 2015 6:56:45 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver to GuiceManagedComponentProvider with the scope "Singleton"
Aug 16, 2015 6:56:45 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider with the scope "Singleton"
Aug 16, 2015 6:56:46 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider with the scope "PerRequest"
log4j:WARN No appenders could be found for logger (org.apache.hadoop.ipc.Server).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
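
"subprocess failed with code 1" means the Python process started by streaming exited with status 1, which is also the status Python uses when a script dies on an uncaught exception. A minimal sketch of how a mapper could make its own stderr more informative is below. The real mapper_sgw_lgi.py is not shown here, so `run_mapper`, the five-field check, and the `Mapper,BadLines` counter name are all hypothetical; only the `reporter:counter:` line format is the standard streaming counter convention.

```python
import sys
import traceback


def run_mapper(lines, out=sys.stdout, err=sys.stderr):
    """Hypothetical mapper loop: pass records through, log failures to stderr.

    Returns the number of lines that could not be processed.
    """
    bad = 0
    for lineno, line in enumerate(lines, 1):
        try:
            fields = line.rstrip("\n").split(",")
            # Assumption: a valid record has at least 5 comma-separated fields.
            if len(fields) < 5:
                raise ValueError("expected at least 5 fields, got %d" % len(fields))
            out.write(",".join(fields) + "\n")
        except Exception:
            bad += 1
            # Record which input caused the failure, then the full traceback.
            err.write("bad input at line %d: %r\n" % (lineno, line[:200]))
            traceback.print_exc(file=err)
            # Streaming counter update: reporter:counter:<group>,<counter>,<amount>
            err.write("reporter:counter:Mapper,BadLines,1\n")
    return bad

# In the actual script this would be called as: run_mapper(sys.stdin)
```

With a loop like this, a malformed record that only appears in the larger files would show up in the failed attempt's task log instead of killing the process silently. Whether bad input is really the cause here is not known from the logs above.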

This is my hadoop command:

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar \
-D stream.map.output.field.separator=, \
-D stream.num.map.output.key.fields=5 \
-D mapreduce.map.output.key.field.separator=, \
-D mapreduce.partition.keypartitioner.options=-k1,2 \
-D log4j.configuration=/usr/hadoop/hadoop-2.6.0/etc/hadoop/log4j.properties \
-file /usr/hadoop/code/sgw/mapper_sgw_lgi.py \
-mapper 'python mapper_sgw_lgi.py 172.27.64.10' \
-file /usr/hadoop/code/sgw/reducer_sgw_lgi.py \
-reducer 'python reducer_sgw_lgi.py' \
-partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner \
-input /input/172.27.64.10_sgw_1-150_06212015-nl.log \
-output output3
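
To make the key-related options in this command concrete, here is a rough sketch of how I understand a mapper output line to be split: the first 5 comma-separated fields form the key (`stream.num.map.output.key.fields=5`), the rest is the value, and the partitioner looks only at key fields 1 to 2 (`-k1,2`). The function names are hypothetical, and the sketch does not reproduce the hashing that KeyFieldBasedPartitioner actually performs.

```python
def split_record(line, num_key_fields=5, sep=","):
    """Split one mapper output line into (key, value) on the first N fields."""
    fields = line.rstrip("\n").split(sep)
    key = sep.join(fields[:num_key_fields])
    value = sep.join(fields[num_key_fields:])
    return key, value


def partition_fields(key, start=1, end=2, sep=","):
    """Return the part of the key used for partitioning (1-based, inclusive)."""
    return sep.join(key.split(sep)[start - 1:end])


# Example with a made-up record:
key, value = split_record("a,b,c,d,e,f,g\n")
print(key)                    # a,b,c,d,e
print(value)                  # f,g
print(partition_fields(key))  # a,b
```

Records that share the first two fields should therefore go to the same reducer, while all five key fields are used for sorting.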

0 answers:

No answers