Question

I have a character matrix whose entries are strings of comma-separated integers:

> mat<-matrix(c(NA,"1",NA,"2,1","3","1,3,3"),nrow=2)
> mat
     [,1] [,2]  [,3]   
[1,] NA   NA    "3"    
[2,] "1"  "2,1" "1,3,3"

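I would like the output to be a numeric array in which the z-index represents the integer, and each cell holds the number of times that integer occurs in the corresponding matrix entry: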

, , 1

     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   1    1    1 

, , 2

     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   NA   1    NA

, , 3

     [,1] [,2] [,3]
[1,]   NA   NA   1
[2,]   NA   NA   2

How can I do this?

To give a sense of the scale of the data, the final array will be about 20,000 x 2,000 x 200, with the matrix forming the first two dimensions (20,000 x 2,000).

Answer 1

This uses a loop (via sapply), so it is probably not the most efficient solution:

mat<-matrix(c(NA,"1",NA,"2,1","3","1,3,3"),nrow=2)

#split the strings
temp <- strsplit(mat, ",", fixed=TRUE)

#unique values, sorted numerically so that the z-index follows integer order
#(sort() also drops the NA)
levels <- sort(unique(as.integer(unlist(temp))))

#convert to factors and use table
temp <- t(sapply(temp, function(x) table(factor(x, levels=levels))))

#make it an array
array(temp, c(nrow(mat), ncol(mat), length(levels)))
# , , 1
# 
#      [,1] [,2] [,3]
# [1,]    0    0    0
# [2,]    1    1    1
# 
# , , 2
# 
#      [,1] [,2] [,3]
# [1,]    0    0    0
# [2,]    0    1    0
# 
# , , 3
# 
#      [,1] [,2] [,3]
# [1,]    0    0    1
# [2,]    0    0    2
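
Note that the result above contains 0 where the question asked for NA. If the NA form is needed, the zeros can be replaced afterwards. The following is a minimal sketch of that step on the example data; the names counts and res are only introduced here for illustration:

```r
mat <- matrix(c(NA, "1", NA, "2,1", "3", "1,3,3"), nrow = 2)

# split the strings; NA entries stay as a single NA
temp <- strsplit(mat, ",", fixed = TRUE)

# distinct integers in numeric order (sort() drops NA)
levels <- sort(unique(as.integer(unlist(temp))))

# one row of counts per matrix cell, one column per level
counts <- t(sapply(temp, function(x) table(factor(x, levels = levels))))

res <- array(counts, c(nrow(mat), ncol(mat), length(levels)))

# turn "integer does not occur" into NA, as in the question
res[res == 0] <- NA
res
```

Whether NA or 0 is the better encoding depends on what is done with the array afterwards; 0 is usually easier to sum over.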

Edit:

This avoids calling table and factor inside a loop and should be faster:

temp <- strsplit(mat, ",", fixed=TRUE)

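#cell index (column-major position in mat) for every split value
id <- rep(seq_along(temp), lengths(temp))
#convert to integer first so the factor levels sort numerically (2 before 10)
temp <- factor(as.integer(unlist(temp)))
#table() drops the NA values but keeps every cell id as a column
array(t(table(temp, id)), c(nrow(mat), ncol(mat), nlevels(temp)))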
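
In both versions the z-index is the position of a value among the values that actually occur, so it only equals the integer itself when every integer from 1 up to the maximum appears somewhere. A rough sketch of a variant that fixes the levels to 1..n_levels is given below. The function name count_array and its n_levels argument are made up for this example, and it assumes all values are positive integers:

```r
# hypothetical helper: slice k of the result counts occurrences of the integer k
count_array <- function(mat, n_levels = NULL) {
  parts <- strsplit(mat, ",", fixed = TRUE)
  vals <- as.integer(unlist(parts))
  # cell index (column-major position in mat) for every split value
  id <- rep(seq_along(parts), lengths(parts))
  if (is.null(n_levels)) n_levels <- max(vals, na.rm = TRUE)
  # rows = cells, columns = integers 1..n_levels; NA values are not counted
  tab <- table(factor(id, levels = seq_along(parts)),
               factor(vals, levels = seq_len(n_levels)))
  array(as.vector(tab), c(nrow(mat), ncol(mat), n_levels))
}

mat <- matrix(c(NA, "1", NA, "2,1", "3", "1,3,3"), nrow = 2)
res <- count_array(mat, n_levels = 4)
dim(res)
res[2, 3, ]
```

At the scale mentioned in the question, 20,000 x 2,000 x 200 is 8e9 cells, which is roughly 32 GB as an integer array, so it may be necessary to process the matrix in blocks of columns rather than all at once.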
