Hadoop streaming failed with error code 5

Time: 2015-05-24 19:46:28

Tags: r hadoop rhadoop

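I created a multi-node Hadoop cluster using my two laptops and tested it successfully. After that I installed RHadoop in the Hadoop environment, installed all the required packages, and set the path variables.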
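For reference, the path variables can be checked from inside the R session that submits the job. The sketch below assumes the variables in question are `HADOOP_CMD` and `HADOOP_STREAMING`, which rmr2 reads; the post does not say which ones were set, and `check_hadoop_env` is a hypothetical helper name.

```r
# Hedged sketch: report whether the environment variables rmr2 reads
# are set in this R session and point at files that exist.
# Assumption: HADOOP_CMD and HADOOP_STREAMING are the variables meant.
check_hadoop_env <- function(vars = c("HADOOP_CMD", "HADOOP_STREAMING")) {
  paths <- Sys.getenv(vars, unset = "")       # "" when a variable is not set
  ok <- nzchar(paths) & file.exists(paths)    # set AND present on disk
  data.frame(variable = vars,
             path = unname(paths),
             exists = unname(ok),
             stringsAsFactors = FALSE)
}

print(check_hadoop_env())
```

A `FALSE` in the `exists` column would mean the session cannot see the corresponding file, which is one possible source of a "No such file or directory" message.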

Then I tried to run the wordcount example as follows:

library(rmr2)  # provides keyval() and mapreduce()

# Map: split each input line on whitespace and emit (word, 1) pairs.
map <- function(k, lines) {
   words.list <- strsplit(lines, "\\s")
   words <- unlist(words.list)
   return(keyval(words, 1))
}

# Reduce: sum the counts collected for each word.
reduce <- function(word, counts) {
   keyval(word, sum(counts))
}

wordcount <- function(input, output = NULL) {
   mapreduce(input = input, output = output, input.format = "text", map = map, reduce = reduce)
}

hdfs.root <- "wordcount"
hdfs.data <- file.path(hdfs.root, "data")
hdfs.out <- file.path(hdfs.root, "out")
out <- wordcount(hdfs.data, hdfs.out)

I get the following error:

15/05/24 21:09:20 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/05/24 21:09:20 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/05/24 21:09:20 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
15/05/24 21:09:21 INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/app/hadoop/tmp/mapred/staging/master91618435/.staging/job_local91618435_0001
15/05/24 21:09:21 ERROR streaming.StreamJob: Error Launching job : No such file or directory
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce,  : 
  hadoop streaming failed with error code 5
Called from: mapreduce(input = input, output = output, input.format = "text", 
    map = map, reduce = reduce)

Before running this, I created two HDFS folders, wordcount/data and wordcount/out, and uploaded some text to the first one from the command line.
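The setup step described above could look roughly like the following when driven from R. This is only a sketch: the exact commands used are not given in the post, `sample.txt` is a hypothetical file name, and `hdfs_setup_commands` is a helper invented for illustration. It creates only the input folder, since Hadoop jobs generally refuse to write into an output directory that already exists.

```r
# Hedged sketch: build the `hadoop fs` argument vectors for creating the
# input folder and uploading one local file. `local_file` is hypothetical.
hdfs_setup_commands <- function(local_file, hdfs_dir = "wordcount/data") {
  list(
    c("fs", "-mkdir", "-p", hdfs_dir),      # create the folder and its parents
    c("fs", "-put", local_file, hdfs_dir),  # upload the local text file
    c("fs", "-ls", hdfs_dir)                # list the folder to confirm
  )
}

# Use HADOOP_CMD when it is set, otherwise fall back to `hadoop` on the PATH.
hadoop <- Sys.getenv("HADOOP_CMD", unset = "hadoop")
cmds <- hdfs_setup_commands("sample.txt")

for (args in cmds) {
  cat(hadoop, args, "\n")
  # Only execute when the hadoop binary can actually be found.
  if (nzchar(Sys.which(hadoop))) system2(hadoop, args)
}
```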

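Another issue: there are two users on my computer, hduser and master. The first one was created for the Hadoop installation. I think that when I open R / RStudio I run it as master, and because Hadoop was set up for hduser, there is some permission problem that causes this error. As can be seen in line 4 of the output, the system tries to find master91618435, and I suspect this should be hduser...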
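Which account the R session actually runs under can be confirmed directly, as in the sketch below. `current_user` is a hypothetical helper name, and the note about home directories assumes the default HDFS behaviour of resolving relative paths under `/user/<name>`.

```r
# Hedged sketch: report the operating-system user running this R session.
current_user <- function() Sys.info()[["user"]]

cat("R session user:", current_user(), "\n")

# Assumption: with default settings, relative HDFS paths such as
# "wordcount/data" resolve under the submitting user's home directory.
cat("Relative HDFS paths resolve under:",
    file.path("/user", current_user()), "\n")
```

If this prints master rather than hduser, the job is being submitted as master, which would match the staging path shown in the log.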

My question is: how do I get rid of this error?

P.S.: There is a similar question here, but it has no answer that is useful to me.

0 answers:

No answers