我正在运行以下Rscript gdp.R
#!/usr/bin/env Rscript
Sys.getenv(c("HADOOP_HOME", "HADOOP_CMD", "HADOOP_STREAMING", "HADOOP_CONF_DIR"))
library(rmr2)
library(rhdfs)
setwd("/root/somnath/GDP_data/")
gdp <- read.csv("GDP.csv")
head(gdp)
hdfs.init()
gdp.values <- to.dfs(gdp)
aaplRevenue = 156508
gdp.map.fn <- function(k,v) {
key <- ifelse(v[4] < aaplRevenue, "less", "greater")
keyval(key, 1)
}
count.reduce.fn <- function(k,v) {
keyval(k, length(v))
}
count <- mapreduce(input=gdp.values, map=gdp.map.fn, reduce=count.reduce.fn)
from.dfs(count)
$val
并且无法克服mapreduce函数中的以下错误:
流媒体命令失败! mr中的错误(map = map,reduce = reduce,combine = combine,vectorized.reduce,: hadoop流式传输失败,错误代码为1 电话:mapreduce - &gt;先生
[root@kkws029 RHadoop_scripts]# Rscript gdp.R
Loading required package: methods
Loading required package: rJava
HADOOP_CMD=/opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop
Be sure to run hdfs.init()
CountryCode Number CountryName GDP
1 USA 1 UnitedStates 168000
2 CHN 2 China 9240270
3 JPN 3 Japan 4901530
4 DEU 4 Germany 3634823
5 FRA 5 France 2734949
6 GBR 6 UnitedKingdom 2522261
14/07/08 16:57:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
14/07/08 16:57:02 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
14/07/08 16:57:02 INFO compress.CodecPool: Got brand-new compressor [.deflate]
Warning messages:
1: In rmr.options("backend") :
Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
2: In rmr.options("hdfs.tempdir") :
Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
3: In rmr.options("backend") :
Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
packageJobJar: [/tmp/hadoop-root/hadoop-unjar4163972761639211537/] [] /tmp/streamjob5583656452134995821.jar tmpDir=null
14/07/08 16:57:04 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/07/08 16:57:13 INFO mapred.FileInputFormat: Total input paths to process : 1
14/07/08 16:57:20 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-root/mapred/local]
14/07/08 16:57:20 INFO streaming.StreamJob: Running job: job_201407071547_0024
14/07/08 16:57:20 INFO streaming.StreamJob: To kill this job, run:
14/07/08 16:57:20 INFO streaming.StreamJob: /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop job -Dmapred.job.tracker=kkws030.mara-ison.com:8021 -kill job_201407071547_0024
14/07/08 16:57:20 INFO streaming.StreamJob: Tracking URL: http://kkws030.mara-ison.com:50030/jobdetails.jsp?jobid=job_201407071547_0024
14/07/08 16:57:21 INFO streaming.StreamJob: map 0% reduce 0%
14/07/08 16:57:26 INFO streaming.StreamJob: map 50% reduce 0%
14/07/08 16:57:49 INFO streaming.StreamJob: map 100% reduce 100%
14/07/08 16:57:49 INFO streaming.StreamJob: To kill this job, run:
14/07/08 16:57:49 INFO streaming.StreamJob: /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop job -Dmapred.job.tracker=kkws030.mara-ison.com:8021 -kill job_201407071547_0024
14/07/08 16:57:49 INFO streaming.StreamJob: Tracking URL: http://kkws030.mara-ison.com:50030/jobdetails.jsp?jobid=job_201407071547_0024
14/07/08 16:57:49 ERROR streaming.StreamJob: Job not successful. Error: NA
14/07/08 16:57:49 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, :
hadoop streaming failed with error code 1
Calls: mapreduce -> mr
In addition: Warning messages:
1: In rmr.options("backend") :
Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
2: In rmr.options("hdfs.tempdir") :
Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
3: In rmr.options("backend") :
Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
4: In rmr.options("backend.parameters") :
Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
Execution halted
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
Warning messages:
1: In rmr.options("backend") :
Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
2: In rmr.options("backend") :
Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
我的stderr日志如下:
Loading objects:
.Random.seed
aaplRevenue
count.reduce.fn
gdp
gdp.map.fn
gdp.values
Loading objects:
backend.parameters
combine
combine.file
combine.line
debug
default.input.format
default.output.format
in.folder
in.memory.combine
input.format
libs
map
map.file
map.line
out.folder
output.format
pkg.opts
postamble
preamble
profile.nodes
reduce
reduce.file
reduce.line
rmr.global.env
rmr.local.env
save.env
tempfile
vectorized.reduce
verbose
work.dir
Loading required package: rhdfs
Loading required package: methods
Loading required package: rJava
Error : .onLoad failed in loadNamespace() for 'rhdfs', details:
call: fun(libname, pkgname)
error: Environment variable HADOOP_CMD must be set before loading package rhdfs
Warning in FUN(c("rhdfs", "rJava", "methods", "rmr2", "stats", "graphics", :
can't load rhdfs
Loading required package: rmr2
Loading objects:
backend.parameters
combine
combine.file
combine.line
debug
default.input.format
default.output.format
in.folder
in.memory.combine
input.format
libs
map
map.file
map.line
out.folder
output.format
pkg.opts
postamble
preamble
profile.nodes
reduce
reduce.file
reduce.line
rmr.global.env
rmr.local.env
save.env
tempfile
vectorized.reduce
verbose
work.dir
Warning in Ops.factor(left, right) : < not meaningful for factors
Error in split.default(a.list, ceiling(seq_along(a.list)/every.so.many), :
first argument must be a vector
Calls: <Anonymous> ... <Anonymous> -> do.call -> mapply -> split -> split.default
No traceback available
Error during wrapup:
Execution halted
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
任何建议都将受到高度赞赏
由于 -S
注意:我已在stderr日志中阻止引用错误,其中指定找不到系统变量HADOOP_CMD。 有没有办法让HADOOP系统环境变量导出到R?另请注意,我在脚本的开头使用了Sys.getenv(c(&#34; HADOOP_HOME&#34;,...)),但这似乎不适用于stderr建议
请注意我已经在〜/ .bash_profile
中为HADOOP环境变量添加了以下导出命令# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
export JRI_PATH=/usr/lib64/R
export R_HOME=/usr/lib64/R
export R_SHARE_DIR=/usr/share/R
#export JRI_LD_PATH=${R_HOME}/library/rJava/jri:${R_HOME}/lib:${R_HOME}/bin
export LD_LIBRARY_PATH=${R_HOME}/library/rJava/jri
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CMD=${HADOOP_HOME}/bin/hadoop
export HADOOP_STREAMING=/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.3.0.jar
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export JAVA_HOME=/var/jdk1.7.0_25
#PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
PATH=$PATH:$R_HOME/bin:$JAVA_HOME/bin:$LD_LIBRARY_PATH:/opt/cloudera/parcels/CDH/lib/mahout:/opt/cloudera/parcels/CDH/lib/hadoop:/opt/cloudera/parcels/CDH/lib/hadoop-hdfs:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce:/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce:/var/lib/storm-0.9.0-rc2/lib:$HADOOP_CMD:$HADOOP_STREAMING:$HADOOP_CONF_DIR
export PATH
答案 0 :(得分:0)
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, :
hadoop streaming failed with error code 1
答案 1 :(得分:0)
count <- mapreduce(input=gdp.values,
map = gdp.map.fn,
reduce = count.reduce.fn,
#output = output,
input.format = "text",
combine = T)
我正在研究相同的代码...但没有得到我想要的答案......所以做了一些练习,通过在地图的每个黑色上应用每个东西reduce..最后我发现了作者的错误。 。 听到是修改代码.. 你可以运行它