I am using RHadoop with the following code:
Sys.setenv(HADOOP_OPTS="-Djava.library.path=/usr/local/hadoop/lib/native")
Sys.setenv(HADOOP_HOME="/usr/local/hadoop")
Sys.setenv(HADOOP_CMD="/usr/local/hadoop/bin/hadoop")
Sys.setenv(HADOOP_STREAMING="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.0.0.jar")
Sys.setenv(JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64")
library(rJava)
library(rhdfs)
library(rmr2)
hdfs.init()
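# Mapper: for each input chunk X, compute per-class row counts and column sums.
mapper <- function(., X) {
  n <- nrow(X)
  ones <- matrix(rep(1, n), nrow = n, ncol = 1)  # column of 1s; its sum is the group size
  # Sum the count column and features 1..79, grouped by the label in column 80.
  ag <- aggregate(cbind(ones, X[, 1:79]), by = list(X[, 80]), FUN = "sum")
  key <- factor(ag[, 1])
  keyval(key, split(ag[, -1], key))
}
# Reducer: add up the partial sums emitted for each key.
reducer <- function(k, A) {
  keyval(k, list(Reduce('+', A)))
}
GroupSums <- from.dfs(mapreduce(input = "/ISCXFlowMeter.csv", map = mapper, reduce = reducer, combine = TRUE))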
When I run this code, I get the following error:
packageJobJar: [/tmp/hadoop-unjar7138506441946536619/] [] /tmp/streamjob6099552934186757596.jar tmpDir=null
2018-06-12 22:40:04,651 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2018-06-12 22:40:04,945 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2018-06-12 22:40:05,201 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/uel/.staging/job_1528838017005_0012
2018-06-12 22:40:06,158 INFO mapred.FileInputFormat: Total input files to process : 1
2018-06-12 22:40:06,171 INFO net.NetworkTopology: Adding a new node: /default-rack/127.0.0.1:9866
2018-06-12 22:40:06,233 INFO mapreduce.JobSubmitter: number of splits:2
2018-06-12 22:40:06,348 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-06-12 22:40:06,608 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1528838017005_0012
2018-06-12 22:40:06,610 INFO mapreduce.JobSubmitter: Executing with tokens: []
2018-06-12 22:40:06,945 INFO conf.Configuration: resource-types.xml not found
2018-06-12 22:40:06,945 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2018-06-12 22:40:07,022 INFO impl.YarnClientImpl: Submitted application application_1528838017005_0012
2018-06-12 22:40:07,249 INFO mapreduce.Job: The url to track the job: http://uel-Deskop-VM:8088/proxy/application_1528838017005_0012/
2018-06-12 22:40:07,251 INFO mapreduce.Job: Running job: job_1528838017005_0012
2018-06-12 22:40:09,301 INFO mapreduce.Job: Job job_1528838017005_0012 running in uber mode : false
2018-06-12 22:40:09,305 INFO mapreduce.Job: map 0% reduce 0%
2018-06-12 22:40:09,337 INFO mapreduce.Job: Job job_1528838017005_0012 failed with state FAILED due to: Application application_1528838017005_0012 failed 2 times due to AM Container for appattempt_1528838017005_0012_000002 exited with exitCode: 127
Failing this attempt.Diagnostics: [2018-06-12 22:40:08.734]Exception from container-launch.
Container id: container_1528838017005_0012_02_000001
Exit code: 127
[2018-06-12 22:40:08.736]Container exited with a non-zero exit code 127. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
/bin/bash: /bin/java: No such file or directory
[2018-06-12 22:40:08.736]Container exited with a non-zero exit code 127. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
/bin/bash: /bin/java: No such file or directory
For more detailed output, check the application tracking page: http://uel-Deskop-VM:8088/cluster/app/application_1528838017005_0012 Then click on links to logs of each attempt.
. Failing the application.
2018-06-12 22:40:09,368 INFO mapreduce.Job: Counters: 0
2018-06-12 22:40:09,369 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, :
  hadoop streaming failed with error code 1
>
The ISCXFlowMeter.csv file that I stored in Hadoop is available here: https://www.dropbox.com/s/rbppzg6x2slzcjz/ISCXFlowMeter.csv?dl=1
Could you guide me on how to fix this problem?
Answer 0 (score: 0)
After a while, I was able to fix the error by adding the following properties to mapred-site.xml:
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
However, the problem now is that the keys are NULL after the map-reduce job completes. Any help would be appreciated.
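One way to narrow down where the NULL keys come from is to run the mapper's aggregation and the reducer's sum on a small in-memory data frame, outside Hadoop. The sketch below is only that: the toy data, the column names f1, f2, and label, and the helper names parts and totals are made up for illustration, and it assumes each chunk reaches the mapper as a data frame with the label in the last column. It uses base R only, so it says nothing about how rmr2 itself serializes the keys.

```r
# Toy stand-in for one input chunk (hypothetical data, not the ISCX file):
# two feature columns and a class label in the last column.
X <- data.frame(
  f1 = c(1, 2, 3, 4),
  f2 = c(10, 20, 30, 40),
  label = c("a", "b", "a", "b")
)

# Mirrors the mapper's aggregation, with columns 1:2 in place of 1:79 and
# column 3 in place of column 80. The count column is named explicitly here.
n <- nrow(X)
ag <- aggregate(cbind(ones = rep(1, n), X[, 1:2]), by = list(X[, 3]), FUN = "sum")
key <- factor(ag[, 1])
parts <- split(ag[, -1], key)

# Mirrors the reducer's sum: pretend two chunks each emitted the same `parts`.
totals <- lapply(levels(key), function(k) Reduce(`+`, list(parts[[k]], parts[[k]])))
names(totals) <- levels(key)

print(key)     # the keys the mapper logic produces
print(totals)  # per-label counts and column sums after combining two chunks
```

If the keys look right here but not in the job output, the difference is probably in what the mapper receives as X. The mapreduce call in the question reads a CSV file without passing an input.format, so X may not arrive as the data frame the mapper expects; that is a guess worth checking, not a confirmed cause. rmr2 also provides a local backend, rmr.options(backend = "local"), which may make it easier to inspect X inside the mapper without a cluster.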