Converting a matrix containing lists into a sparse matrix

Posted: 2016-11-29 15:31:19

Tags: r matrix sparse-matrix

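I am trying to convert a matrix whose entries are lists of variable length (underscore-separated labels such as "a_b_e") into a sparse matrix with one indicator column per label. Here is a toy example: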

mOrig = matrix(
    c(rep(c('a_b', 'X'), 3),
    rep(c('a_b_e', 'X'), 2),
    rep(c('a_b_f', 'X'), 1),
    rep(c('c_d', 'Y'), 3),
    rep(c('c_d_e', 'Y'), 2),
    rep(c('c_d_f', 'Y'), 1)),
    ncol=2, byrow=TRUE
)
colnames(mOrig) = c('in', 'out')
mOrig

      in      out
 [1,] "a_b"   "X"
 [2,] "a_b"   "X"
 [3,] "a_b"   "X"
 [4,] "a_b_e" "X"
 [5,] "a_b_e" "X"
 [6,] "a_b_f" "X"
 [7,] "c_d"   "Y"
 [8,] "c_d"   "Y"
 [9,] "c_d"   "Y"
[10,] "c_d_e" "Y"
[11,] "c_d_e" "Y"
[12,] "c_d_f" "Y"

The output matrix should look like this:

      a b c d e f X Y
 [1,] 1 1 0 0 0 0 1 0
 [2,] 1 1 0 0 0 0 1 0
 [3,] 1 1 0 0 0 0 1 0
 [4,] 1 1 0 0 1 0 1 0
 [5,] 1 1 0 0 1 0 1 0
 [6,] 1 1 0 0 0 1 1 0
 [7,] 0 0 1 1 0 0 0 1
 [8,] 0 0 1 1 0 0 0 1
 [9,] 0 0 1 1 0 0 0 1
[10,] 0 0 1 1 1 0 0 1
[11,] 0 0 1 1 1 0 0 1
[12,] 0 0 1 1 0 1 0 1

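I am close to a solution, but my current approach, which uses unique(unlist(strsplit())) together with a for loop, seems extremely inefficient. Does anyone know an efficient solution, for example one that uses sparseMatrix (or sparse.model.matrix from the Matrix package)?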

Many thanks!

1 answer:

Answer 0 (score: 0)

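One of the fastest ways to write into a sparse matrix seems to be the myMatrix[matrix] <- value form, where the index is a two-column matrix of (row, column) positions. It is used below, together with lapply and strsplit.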
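A minimal sketch of that indexing form on its own, using a hypothetical 3 x 4 matrix m and index matrix idx that are not part of the question: each row of the two-column index matrix names one (row, column) cell, and a single assignment sets all of them.

```r
library(Matrix)

# Hypothetical all-zero sparse matrix (3 x 4) to write into
m <- Matrix(0, nrow = 3, ncol = 4, sparse = TRUE)

# Each row of idx is one (row, column) position to set
idx <- cbind(c(1, 2, 3), c(4, 1, 2))

# Matrix indexing: all three cells are set in one assignment
m[idx] <- 1

m
```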

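library(Matrix)

# Empty 12 x 8 sparse matrix: columns a-f for the 'in' tokens, X and Y for 'out'
mx <- Matrix(0, 12, 8, dimnames = list(NULL, c(letters[1:6], LETTERS[24:25])))

# Split each 'in' label into its tokens, e.g. "a_b_e" -> c("a", "b", "e")
mOrig_split <- strsplit(mOrig[,'in'], '_')

# Long format: one (row, label) pair per token, plus the row's 'out' label
long_fm <- do.call(rbind, lapply(seq_along(mOrig_split), function(x) {
  cbind(x, c(mOrig_split[[x]], mOrig[x, 2]))}))

# long_fm is a character matrix, so convert the row index back to a number,
# map each label to its column position, and set all cells in one assignment
mx[cbind(as.numeric(long_fm[,1]), match(long_fm[,2], colnames(mx)))] <- 1

mx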
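The as.numeric() call above is needed because cbind() with an integer and a character vector coerces the whole result to character, whereas matching labels to column positions first keeps it integer. A small sketch with made-up values (long_chr and long_int are hypothetical names):

```r
# cbind() of an integer and a character vector gives a character matrix
long_chr <- cbind(1L, c("a", "b"))
typeof(long_chr)

# Matching labels to positions first keeps the whole matrix integer
long_int <- cbind(1L, match(c("a", "b"), letters))
typeof(long_int)
```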

Doing the matching up front may be slightly faster, since it avoids converting the row indices from numeric to character and back:

mx <- Matrix(0, 12, 8, dimnames = list(NULL, c(letters[1:6], LETTERS[24:25])))

# Split the 'in' labels and convert the tokens straight to column indices
mOrig_split <- lapply(strsplit(mOrig[,'in'], '_'), match, colnames(mx))
mOrig_out <- match(mOrig[,2], colnames(mx))

# long_fm is now an integer matrix of (row, column) pairs
long_fm <- do.call(rbind, lapply(seq_along(mOrig_split), function(x) {
  cbind(x, c(mOrig_split[[x]], mOrig_out[x]))}))

mx[long_fm] <- 1
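Since the question also mentions sparseMatrix, here is a sketch of that route as an alternative (a swapped-in technique, not the method used in this answer): the (row, column) pairs are passed straight to Matrix::sparseMatrix(), which builds the result in one call, and the dimensions and column labels come from the data instead of being hard-coded as 12, 8, letters[1:6] and LETTERS[24:25]. It assumes the toy mOrig from the question (rebuilt compactly so the block runs on its own); in_split, tokens, labs and mx2 are illustrative names.

```r
library(Matrix)

# Toy input from the question: same 12 rows, built more compactly
mOrig <- cbind(
  rep(c("a_b", "a_b_e", "a_b_f", "c_d", "c_d_e", "c_d_f"),
      times = c(3, 2, 1, 3, 2, 1)),
  rep(c("X", "Y"), each = 6)
)
colnames(mOrig) <- c("in", "out")

# Tokens per row: the split 'in' labels plus that row's 'out' label
in_split <- strsplit(mOrig[, "in"], "_")
tokens <- Map(c, in_split, mOrig[, "out"])

# Column order as in the question: 'in' tokens first, then 'out' labels
labs <- unique(c(sort(unique(unlist(in_split))), sort(unique(mOrig[, "out"]))))

# Row index repeated once per token; column index looked up with match()
i <- rep(seq_along(tokens), lengths(tokens))
j <- match(unlist(tokens), labs)

mx2 <- sparseMatrix(i = i, j = j, x = 1,
                    dims = c(nrow(mOrig), length(labs)),
                    dimnames = list(NULL, labs))
mx2
```

Note that sparseMatrix() sums the x values of duplicated (i, j) pairs, so a label repeated within one row would give 2 instead of 1; the toy data has no such repeats.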