Sys.setenv(HADOOP_CMD="/usr/lib/hadoop/bin/hadoop")
Sys.setenv(HADOOP_STREAMING="/usr/lib/hadoop-mapreduce/hadoop-streaming.jar")
library(rJava)
library(rmr2)
library(rhdfs)
hdfs.init()
data <- hdfs.read.text.file("..../Tab_20.csv") # Tab_20 contains 20 rows and 1000 cols
s <- strsplit(data, split = ",")
S = as.matrix(s)
output <- matrix(unlist(S), ncol = 1000, byrow = TRUE)
output <- output[2:21,]
class(output) <- "numeric"
ints = 1:nrow(output)
combined_mat = mapreduce(input = ints, map = function(k, v) rowSums(v))
I want to process each row and combine the per-row results into one complete matrix. This is the error I get:
14/11/26 20:10:59 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
Required argument: -input <name>
Try -help for more information
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, :
hadoop streaming failed with error code 1
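The `Required argument: -input <name>` line suggests that Hadoop streaming was launched without any input path. A likely cause (an assumption from how rmr2's `mapreduce()` is documented, not verified on this cluster) is that `input = ints` passes an in-memory R vector, while `input` is expected to be an HDFS path or the object returned by `to.dfs()`. Also, `map` only sees what was written to the DFS, so even with a valid input, `rowSums(v)` would run on `ints` rather than on rows of `output`. A minimal sketch that writes the matrix itself with `to.dfs()`, using rmr2's local backend and a small hypothetical matrix in place of `output`:

```r
library(rmr2)

# Assumption: the local backend lets the sketch run without a cluster;
# remove this line (or set backend = "hadoop") on the real cluster.
rmr.options(backend = "local")

# Hypothetical stand-in for the 20 x 1000 `output` matrix.
output <- matrix(as.numeric(1:12), nrow = 4)

# Write the matrix (not the row indices) to the DFS. rmr2 treats each matrix
# row as one record, so every map call receives a block of whole rows in `v`.
res <- from.dfs(
  mapreduce(
    input = to.dfs(output),
    map = function(k, v) keyval(NULL, rowSums(v))
  )
)

# Per-row sums gathered from all map calls (block order is not guaranteed).
row_sums <- values(res)
```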
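The reason I want to read each row separately is to compute rowSums in a MapReduce fashion.
Is there a way to select each row and apply a math operation to it, for example computing its rowSums, and then combine the results into a column vector in a MapReduce fashion with rmr2?
What should I write in the mapreduce line so that each mapper reads the matrix row by row?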
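One way to keep track of which sum belongs to which row (once the job runs on several mappers, blocks can come back in any order) is to store each row's index as its key with `keyval()`, emit `keyval(k, rowSums(v))` from the map, and reorder by key afterwards to build the column vector. This is a sketch under the same assumptions as before: the local backend and a hypothetical random matrix standing in for `output`. No reducer is used, because every row is summed entirely inside a single map call.

```r
library(rmr2)

rmr.options(backend = "local")  # assumption: switch to the Hadoop backend on a cluster

# Hypothetical stand-in for the `output` matrix from the question.
set.seed(1)
output <- matrix(runif(20 * 10), nrow = 20)

# Key every row with its row number; rmr2 splits the matrix by rows,
# keeping each key aligned with its row.
input <- to.dfs(keyval(seq_len(nrow(output)), output))

# Each map call gets row numbers in `k` and the matching block of rows in `v`.
job <- mapreduce(
  input = input,
  map = function(k, v) keyval(k, rowSums(v))
)

res <- from.dfs(job)

# Put the per-row sums back in row order as a single-column matrix.
col_vec <- matrix(values(res)[order(keys(res))], ncol = 1)
```

On the cluster, `output` would come from HDFS as in the question, and `rowSums(v)` can be replaced by any other per-row function, as long as it returns one value per row of `v`.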