提前道歉,因为他是一个新手,也许会问一些愚蠢的问题。 我在单机群集(Ubuntu 14.04)上安装了Hadoop,并成功测试了Apache安装指南中指定的基本程序。随后我安装了R,RStudio和包rhdfs,rmr2以及所有依赖项。
然后我尝试运行以下程序:
Sys.setenv(HADOOP_CMD="/usr/local/hadoop/bin/hadoop")
Sys.setenv(HADOOP_STREAMING="/usr/local/hadoop/contrib/streaming/hadoop-streaming-1.2.1.jar")
library('rhdfs')
library('rmr2')
hdfs.init()
small.ints = to.dfs(1:10)
mapreduce(
input = small.ints,
map = function(k, v)
{
lapply(seq_along(v), function(r){
x <- runif(v[[r]])
keyval(r,c(max(x),min(x)))
})})
作业失败,控制台上的输出如下
packageJobJar: [/tmp/RtmprPBBS1/rmr-local-env242520fb4125, /tmp/RtmprPBBS1/rmr-global-env24252518202b, /tmp/RtmprPBBS1/rmr-streaming-map24255b97931e, /tmp/hadoop-hduser/hadoop-unjar4430970496737933525/] [] /tmp/streamjob6651310557292596411.jar tmpDir=null
14/05/05 09:16:08 INFO mapred.FileInputFormat: Total input paths to process : 1
14/05/05 09:16:08 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-hduser/mapred/local]
14/05/05 09:16:08 INFO streaming.StreamJob: Running job: job_201405050557_0013
14/05/05 09:16:08 INFO streaming.StreamJob: To kill this job, run:
14/05/05 09:16:08 INFO streaming.StreamJob: /usr/local/hadoop/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_201405050557_0013
14/05/05 09:16:08 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201405050557_0013
14/05/05 09:16:09 INFO streaming.StreamJob: map 0% reduce 0%
14/05/05 09:16:41 INFO streaming.StreamJob: map 100% reduce 100%
14/05/05 09:16:41 INFO streaming.StreamJob: To kill this job, run:
14/05/05 09:16:41 INFO streaming.StreamJob: /usr/local/hadoop/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_201405050557_0013
14/05/05 09:16:41 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201405050557_0013
14/05/05 09:16:41 ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201405050557_0013_m_000001
14/05/05 09:16:41 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, :
hadoop streaming failed with error code 1
stderror日志如下
Error in library(functional) : there is no package called ‘functional’
No traceback available
Error during wrapup:
Execution halted
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:135)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
我尝试过其他一些简单的演示程序,结果是一样的。所以问题似乎在于我的配置。
'functional'软件包已经安装并自动加载。即使手动加载它也无济于事。所以这很可能不是问题。
任何帮助或建议都会让我感激不尽。
我在Ubuntu 14.04上以单集群模式运行Hadoop 1.2.1,R 3.0.5和RStudio 0.98.507 Java是Oracle 7 Java版本1.7.0_55
Hadoop安装似乎没问题,因为我的常规wordcount程序工作正常。
即使是最简单的RHadoop demo
,我也能得到相同的结果这可能是我机器容量的问题吗?在略高端的笔记本电脑上运行? 2.8 GiB内存和Intel®Core™i3-2310M CPU @ 2.10GHz×4处理器
我现在已经转移到Hadoop 2.2.0并设法使用此tutorial安装相同的内容。用于计算PI的演示程序没有错误地执行。
然后我执行了这个非常简单的MR程序
Sys.setenv(HADOOP_CMD="/usr/local/hadoop220/bin/hadoop")
Sys.setenv(HADOOP_STREAMING="/usr/local/hadoop220/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar")
library('rhdfs')
library('rmr2')
library('functional')
hdfs.init()
small.ints = to.dfs(1:10)
mapreduce(
input = small.ints,
map = function(k, v) cbind(v, v^2))
程序执行到第7行,但在所有重要的MR步骤中失败并出现以下错误[仅显示错误的最后部分]
14/05/06 13:53:36 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/05/06 13:53:36 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/05/06 13:53:37 INFO mapred.FileInputFormat: Total input paths to process : 1
14/05/06 13:53:37 INFO mapreduce.JobSubmitter: number of splits:2
14/05/06 13:53:37 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.cache.files.filesizes is deprecated. Instead, use mapreduce.job.cache.files.filesizes
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/05/06 13:53:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1399363749415_0002
14/05/06 13:53:38 INFO impl.YarnClientImpl: Submitted application application_1399363749415_0002 to ResourceManager at /0.0.0.0:8032
14/05/06 13:53:38 INFO mapreduce.Job: The url to track the job: http://yantrajaal:8088/proxy/application_1399363749415_0002/
14/05/06 13:53:38 INFO mapreduce.Job: Running job: job_1399363749415_0002
14/05/06 13:53:45 INFO mapreduce.Job: Job job_1399363749415_0002 running in uber mode : false
14/05/06 13:53:45 INFO mapreduce.Job: map 0% reduce 0%
14/05/06 13:53:57 INFO mapreduce.Job: map 100% reduce 0%
14/05/06 13:53:57 INFO mapreduce.Job: Task Id : attempt_1399363749415_0002_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
14/05/06 13:54:31 INFO mapreduce.Job: map 100% reduce 0%
14/05/06 13:54:32 INFO mapreduce.Job: Job job_1399363749415_0002 failed with state FAILED due to: Task failed task_1399363749415_0002_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
14/05/06 13:54:32 INFO mapreduce.Job: Counters: 10
Job Counters
Failed map tasks=7
Killed map tasks=1
Launched map tasks=8
Other local map tasks=6
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=72476
Total time spent by all reduces in occupied slots (ms)=0
Map-Reduce Framework
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
14/05/06 13:54:32 ERROR streaming.StreamJob: Job not Successful!
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, :
hadoop streaming failed with error code 1
真的在我的智慧结束了下一步该做什么!
任何关于前进方向的建议都将被感激地接受和承认。我怀疑RHadoop可能还不熟悉Ubuntu 14.04,但这是猜测
答案 0 :(得分:4)
启动终端并使用
以超级用户或root身份登录sudo su root
然后在终端中start R
并使用以下命令安装rhadoop软件包
install.packages(c(&#34; codetools&#34;,&#34; R&#34;,&#34; Rcpp&#34;,&#34; RJSONIO&#34;,&# 34; bitops&#34 ;, &#34; digest&#34;,&#34; functional&#34;,&#34; stringr&#34;,&#34; plyr&#34;,&#34; reshape2&#34;,&#34; rJava& #34)) install.packages(C(&#34; dplyr&#34;&#34; R.methodsS3&#34)) install.packages(c(&#34; Hmisc&#34;))install.packages(c(&#34; caTools&#34;)) Sys.setenv(HADOOP_HOME =&#34;在/ usr /本地/ hadoop的&#34) Sys.setenv(HADOOP_CMD =&#34;在/ usr /本地/ hadoop的/ bin中/ hadoop的&#34)
Sys.setenv(HADOOP_STREAMING="/usr/local/hadoop/share/hadoop/tools/lib/hadoopversiomentionhere.jar")
之后安装rmr2 rhdfs2
here
之后使用此命令安装这些下载的源文件
install.packages(path_to_file, repos = NULL, type="source")
现在安装后关闭终端R然后打开终端
rstudio
运行R代码的流错误将被解决为
上述步骤将在全局文件夹中安装R库。
可选择如果你想要你可以安装R本身是一个超级用户,以便在更安全的一方希望这有帮助
答案 1 :(得分:1)
单机群上的R设置似乎是错误的 R包functional是否安装在群集上?
答案 2 :(得分:1)
我用下面的方法解决了与你类似的问题。
查看您的R库
.libPaths()
使用以下命令检查安装了哪个库软件包:
system.file(package="functional")
如果它安装在个人库中,而不是在所有用户共有的库中,则作业将失败并显示错误,指出无法加载包。
希望这会有所帮助。
干杯
赵延昌答案 3 :(得分:0)
问题是因为当您以非root用户身份安装软件包时,它们最终会进入私有目录。这是所有问题的原因。解决方案是以root用户或超级用户身份登录,然后安装软件包,使它们最终进入系统范围的R库,在我的情况下是/ usr / lib64 / R / library之后,没有更多的问题。程序会起作用!