Read each row in map and combine in reduce to get the complete matrix in rmr2

Asked: 2014-11-26 14:51:29

Tags: r hadoop mapreduce

Sys.setenv(HADOOP_CMD="/usr/lib/hadoop/bin/hadoop")
Sys.setenv(HADOOP_STREAMING="/usr/lib/hadoop-mapreduce/hadoop-streaming.jar")

library(rJava)

library(rmr2)
library(rhdfs)

hdfs.init()

data <- hdfs.read.text.file("..../Tab_20.csv") # Tab_20 contains 20 rows and 1000 cols 
s <- strsplit(data, split = ",")
S = as.matrix(s)
output <- matrix(unlist(S), ncol = 1000, byrow = TRUE)
output <- output[2:21,]
class(output) <- "numeric"
ints = 1:nrow(output)
combined_mat = mapreduce(input = ints, map = function(k, v) rowSums(v))

I want to process each row separately and then combine the rows to get the complete matrix. The error I get is as follows:

14/11/26 20:10:59 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
Required argument: -input <name>
Try -help for more information
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce,  : 
  hadoop streaming failed with error code 1

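The reason I want to read each row separately is to compute rowSums in a MapReduce fashion.

Is there a way in rmr2 to select each row, perform some mathematical operation on it (such as computing rowSums for each row), and then combine the results into a column vector, all in a MapReduce fashion?

What should I write in the mapreduce line so that each mapper reads the matrix row by row?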
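For reference, here is a minimal sketch of the kind of call being asked about. It rests on assumptions and has not been verified against this cluster. In rmr2, the `input` argument of `mapreduce()` is expected to be an HDFS path or the object returned by `to.dfs()`, so passing the plain vector `ints` may be why Hadoop streaming reports the missing `-input` argument. The small matrix below is a hypothetical stand-in for the 20 x 1000 data, and the local backend is set only so the sketch can be tried without a cluster.

```r
# Hypothetical stand-in for the numeric matrix `output` (20 rows, 3 columns here).
output <- matrix(as.numeric(1:60), nrow = 20, ncol = 3, byrow = TRUE)

# Plain-R reference result, used only to compare against the MapReduce result.
expected <- rowSums(output)

# The rmr2 part runs only if the package is installed.
if (requireNamespace("rmr2", quietly = TRUE)) {
  library(rmr2)

  # Assumption: local backend for a quick trial; drop this line to use Hadoop.
  rmr.options(backend = "local")

  # Write the matrix as key-value pairs: key = row index, value = the row itself.
  input <- to.dfs(keyval(seq_len(nrow(output)), output))

  # Each map call receives a block of rows in v and the matching row indices in k.
  job <- mapreduce(
    input = input,
    map = function(k, v) keyval(k, rowSums(v))
  )

  # Read the result back and order it by row index to form a column vector.
  res <- from.dfs(job)
  row_sums <- matrix(values(res)[order(keys(res))], ncol = 1)
  print(row_sums)
}
```

No reduce step is used in this sketch because each row sum is already complete after the map phase. The row index is kept as the key so that the original row order can be restored after `from.dfs()`.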

0 answers:

No answers yet