R: matrix of integer strings to an array of integer counts

Posted: 2013-11-18 11:59:40

Tags: arrays r matrix

I have a character matrix whose cells are strings of comma-separated integers:

> mat<-matrix(c(NA,"1",NA,"2,1","3","1,3,3"),nrow=2)
> mat
     [,1] [,2]  [,3]   
[1,] NA   NA    "3"    
[2,] "1"  "2,1" "1,3,3"

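I would like the output to be a numeric array whose third (z) dimension indexes the integers, so that entry [i, j, k] is the number of times integer k occurs in cell [i, j] of the matrix: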

, , 1

     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   1    1    1 

, , 2

     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   NA   1    NA

, , 3

     [,1] [,2] [,3]
[1,]   NA   NA   1
[2,]   NA   NA   2

How can I do this?

To give a sense of the scale: the final array will be roughly 20,000 x 2,000 x 200, with the matrix supplying its first two dimensions (20,000 x 2,000).
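A back-of-the-envelope size check for that shape, assuming a dense R integer array at 4 bytes per element:

```r
# Number of cells in the final array and its approximate size in gigabytes,
# assuming a dense integer array (4 bytes per element).
cells <- 20000 * 2000 * 200   # 8e9 cells
gb    <- cells * 4 / 1e9      # bytes -> gigabytes
gb
```

That comes to about 32 GB (twice that if the array is stored as doubles), so a fully dense array of this size may not fit in memory on many machines.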

1 answer:

Answer 0 (score: 4)

This uses a loop (via sapply) and is probably not the most efficient solution:

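mat <- matrix(c(NA, "1", NA, "2,1", "3", "1,3,3"), nrow = 2)

# split each cell on commas (NA cells stay NA)
temp <- strsplit(mat, ",", fixed = TRUE)

# unique integer values, sorted numerically so slice k holds the k-th smallest value
lev <- sort(as.integer(na.omit(unique(unlist(temp)))))

# convert to factors and use table: one row per cell (column-major), one column per value
temp <- t(sapply(temp, function(x) table(factor(x, levels = lev))))

# reshape into nrow x ncol x (number of distinct values)
array(temp, c(nrow(mat), ncol(mat), length(lev)))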
# , , 1
# 
#      [,1] [,2] [,3]
# [1,]    0    0    0
# [2,]    1    1    1
# 
# , , 2
# 
#      [,1] [,2] [,3]
# [1,]    0    0    0
# [2,]    0    1    0
# 
# , , 3
# 
#      [,1] [,2] [,3]
# [1,]    0    0    1
# [2,]    0    0    2
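The question shows integers that do not occur in a cell as NA rather than 0. Assuming that convention is what is wanted, the zeros can be replaced once the array is built. The sketch below uses the result printed above as a literal stand-in for the computed array.

```r
# Literal copy of the 2 x 3 x 3 result printed above (column-major order)
res <- array(c(0L, 1L, 0L, 1L, 0L, 1L,
               0L, 0L, 0L, 1L, 0L, 0L,
               0L, 0L, 0L, 0L, 1L, 2L), c(2, 3, 3))

# Show integers that do not occur in a cell as NA, as in the question
res[res == 0] <- NA
res[2, 3, ]   # counts of 1, 2, 3 in the cell "1,3,3"
```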

Edit:

This avoids calling table() and factor() inside a loop and should be faster:

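temp <- strsplit(mat, ",", fixed = TRUE)

# cell index of every token; the levels cover all cells, so a cell with no tokens still gets a column
id <- factor(rep(seq_along(temp), lengths(temp)), levels = seq_along(temp))
# converting to integer first makes the levels sort numerically ("10" after "9", not after "1")
temp <- factor(as.integer(unlist(temp)))
array(t(table(temp, id)), c(nrow(mat), ncol(mat), length(levels(temp))))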
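At the target size (around 8e9 cells) it may be preferable not to build the dense array at all. One possible alternative, not part of the original answer, reuses the same per-token cell index idea but keeps only the non-zero counts as (row, col, value, count) rows. The function name sparse_counts is hypothetical, and the sketch assumes every non-NA token is a positive integer.

```r
# Hypothetical helper: non-zero counts as a long data frame instead of a
# dense nrow x ncol x K array. Assumes all non-NA tokens are positive integers.
sparse_counts <- function(mat) {
  n     <- length(mat)
  parts <- strsplit(mat, ",", fixed = TRUE)
  cell  <- rep(seq_along(parts), lengths(parts))  # column-major cell of each token
  value <- as.integer(unlist(parts))
  keep  <- !is.na(value)                          # drop NA cells
  # One key per (cell, value) pair; "- 1" is a double, so the key is computed
  # in double precision and does not overflow for ~8e9 combinations.
  key   <- cell[keep] + n * (value[keep] - 1)
  runs  <- rle(sort(key))                         # run length = count of that pair
  k       <- runs$values
  cell_id <- (k - 1) %% n + 1
  data.frame(row   = (cell_id - 1) %% nrow(mat) + 1,
             col   = (cell_id - 1) %/% nrow(mat) + 1,
             value = (k - 1) %/% n + 1,
             count = runs$lengths)
}

mat <- matrix(c(NA, "1", NA, "2,1", "3", "1,3,3"), nrow = 2)
res <- sparse_counts(mat)
res
```

Each row of the result corresponds to one non-NA entry of the desired array; pairs that never occur are simply absent, which plays the role of the NA entries in the question.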