Hadoop streaming failed with error code 1 in RHadoop

Time: 2018-06-12 21:47:42

Tags: r hadoop rhadoop

I am using RHadoop with the following code:

Sys.setenv(HADOOP_OPTS="-Djava.library.path=/usr/local/hadoop/lib/native")
Sys.setenv(HADOOP_HOME="/usr/local/hadoop")
Sys.setenv(HADOOP_CMD="/usr/local/hadoop/bin/hadoop")
Sys.setenv(HADOOP_STREAMING="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.0.0.jar")
Sys.setenv(JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64")

library(rJava)
library(rhdfs)
library(rmr2)
hdfs.init()

mapper = function (., X) {
  n=nrow(X);
  ones=matrix(rep(1,n),nrow=n,ncol=1);
  ag=aggregate(cbind(ones,X[,1:79]),by=list(X[,80]),FUN="sum")
  key=factor(ag[,1]);
  keyval(key,split(ag[,-1],key))
}

reducer = function(k, A) {
  keyval(k,list(Reduce('+', A)))
}

GroupSums <-  from.dfs( mapreduce(input = "/ISCXFlowMeter.csv", map = mapper, reduce = reducer, combine = T))
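
To make clear what the mapper and reducer above are meant to compute, here is a minimal sketch in plain R that runs the same steps without Hadoop. The toy data frame (two numeric columns plus a label column) is an assumption for illustration only and stands in for the 80-column CSV; `keyval()` is left out so that rmr2 is not needed.

```r
# Toy stand-in for the real input (assumed data, not from ISCXFlowMeter.csv):
# two numeric columns plus a label column in the last position.
X <- data.frame(a = c(1, 2, 3, 4),
                b = c(10, 20, 30, 40),
                label = c("x", "y", "x", "y"))

# Same steps as the mapper, minus keyval():
# count the rows and sum the numeric columns for each label.
n <- nrow(X)
ones <- matrix(rep(1, n), nrow = n, ncol = 1)
ag <- aggregate(cbind(ones, X[, 1:2]), by = list(X[, 3]), FUN = "sum")
key <- factor(ag[, 1])
per_key <- split(ag[, -1], key)

# Same step as the reducer: add up the partial results that share a key.
# The "x" partial result is listed twice here to mimic two map tasks.
total_x <- Reduce(`+`, list(per_key[["x"]], per_key[["x"]]))
print(total_x)
```

If this behaves as expected locally, the logic itself is probably fine and the failure is more likely in how the job is launched or how the input is parsed.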

When I run this code, I get the following error:

  

packageJobJar: [/tmp/hadoop-unjar7138506441946536619/] [] /tmp/streamjob6099552934186757596.jar tmpDir=null
2018-06-12 22:40:04,651 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2018-06-12 22:40:04,945 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2018-06-12 22:40:05,201 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/uel/.staging/job_1528838017005_0012
2018-06-12 22:40:06,158 INFO mapred.FileInputFormat: Total input files to process : 1
2018-06-12 22:40:06,171 INFO net.NetworkTopology: Adding a new node: /default-rack/127.0.0.1:9866
2018-06-12 22:40:06,233 INFO mapreduce.JobSubmitter: number of splits:2
2018-06-12 22:40:06,348 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-06-12 22:40:06,608 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1528838017005_0012
2018-06-12 22:40:06,610 INFO mapreduce.JobSubmitter: Executing with tokens: []
2018-06-12 22:40:06,945 INFO conf.Configuration: resource-types.xml not found
2018-06-12 22:40:06,945 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2018-06-12 22:40:07,022 INFO impl.YarnClientImpl: Submitted application application_1528838017005_0012
2018-06-12 22:40:07,249 INFO mapreduce.Job: The url to track the job: http://uel-Deskop-VM:8088/proxy/application_1528838017005_0012/
2018-06-12 22:40:07,251 INFO mapreduce.Job: Running job: job_1528838017005_0012
2018-06-12 22:40:09,301 INFO mapreduce.Job: Job job_1528838017005_0012 running in uber mode : false
2018-06-12 22:40:09,305 INFO mapreduce.Job: map 0% reduce 0%
2018-06-12 22:40:09,337 INFO mapreduce.Job: Job job_1528838017005_0012 failed with state FAILED due to: Application application_1528838017005_0012 failed 2 times due to AM Container for appattempt_1528838017005_0012_000002 exited with exitCode: 127
Failing this attempt. Diagnostics: [2018-06-12 22:40:08.734]Exception from container-launch.
Container id: container_1528838017005_0012_02_000001
Exit code: 127

[2018-06-12 22:40:08.736]Container exited with a non-zero exit code 127. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
/bin/bash: /bin/java: No such file or directory

[2018-06-12 22:40:08.736]Container exited with a non-zero exit code 127. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
/bin/bash: /bin/java: No such file or directory

For more detailed output, check the application tracking page: http://uel-Deskop-VM:8088/cluster/app/application_1528838017005_0012 Then click on links to logs of each attempt.
. Failing the application.
2018-06-12 22:40:09,368 INFO mapreduce.Job: Counters: 0
2018-06-12 22:40:09,369 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, :
  hadoop streaming failed with error code 1

The ISCXFlowMeter.csv file that I put into HDFS is available here: https://www.dropbox.com/s/rbppzg6x2slzcjz/ISCXFlowMeter.csv?dl=1

Could you guide me on how to fix this problem?

1 Answer:

Answer 0 (score: 0)

After a while, I was able to fix the error by adding the following properties to mapred-site.xml:

<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>

However, the problem now is that the keys are NULL after the map-reduce job completes. Any help would be appreciated.
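
One possible cause, offered as a guess rather than a confirmed diagnosis: the `mapreduce()` call in the question does not set `input.format`, so rmr2 falls back to its default "native" format instead of parsing the file as CSV, and the mapper may never receive the data frame it expects. The sketch below wraps the same job with a CSV input format. The helper name `group_sums_csv` is hypothetical, and it assumes the file is comma-separated with no header row that needs special handling.

```r
# Hypothetical helper (not part of rmr2): rerun the job from the question,
# but tell rmr2 to parse the input as comma-separated text.
# Assumption: the file has no header row that needs special handling.
group_sums_csv <- function(map, reduce, path = "/ISCXFlowMeter.csv") {
  csv_format <- rmr2::make.input.format("csv", sep = ",")
  rmr2::from.dfs(
    rmr2::mapreduce(input = path,
                    input.format = csv_format,
                    map = map,
                    reduce = reduce,
                    combine = TRUE)
  )
}

# Usage with the mapper and reducer from the question
# (needs rmr2 loaded and a working Hadoop setup):
# GroupSums <- group_sums_csv(map = mapper, reduce = reducer)
```

If the keys are still NULL with an explicit input format, trying the same call with `rmr.options(backend = "local")` on a small sample of the file may help separate parsing problems from cluster problems.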