I have a character matrix of comma-separated integer strings:
> mat <- matrix(c(NA, "1", NA, "2,1", "3", "1,3,3"), nrow = 2)
> mat
     [,1] [,2]  [,3]
[1,] NA   NA    "3"
[2,] "1"  "2,1" "1,3,3"
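I would like the output to be a numeric array in which the z index is the integer and each cell holds the number of times that integer occurs in the corresponding matrix entry: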
, , 1

     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]    1    1    1

, , 2

     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   NA    1   NA

, , 3

     [,1] [,2] [,3]
[1,]   NA   NA    1
[2,]   NA   NA    2
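How can I do this?
To give a sense of the scale of the data, the final array will be roughly 20,000 x 2,000 x 200, with the matrix supplying the first two dimensions (20,000 x 2,000).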
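As a rough check on what that scale implies, here is a back-of-the-envelope estimate. It assumes the result is stored as a dense array, with 4 bytes per cell for integers or 8 bytes per cell for doubles; the variable names are illustrative only.

```r
# Rough size estimate for a dense 20,000 x 2,000 x 200 array.
dims <- c(20000, 2000, 200)
n_cells <- prod(dims)                      # 8e9 cells in total

# More cells than .Machine$integer.max, so this would be a long vector.
is_long_vector <- n_cells > .Machine$integer.max

gib_integer <- n_cells * 4 / 1024^3        # assuming 4 bytes per integer cell
gib_double  <- n_cells * 8 / 1024^3        # assuming 8 bytes per double cell
round(c(integer = gib_integer, double = gib_double), 1)
```

At tens of GiB, a dense result may not fit in memory on many machines, which is worth keeping in mind when comparing solutions.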
Answer 0 (score: 4)
This uses a loop (via sapply) and is probably not the most efficient solution:
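mat <- matrix(c(NA, "1", NA, "2,1", "3", "1,3,3"), nrow=2)
#split the strings (one list element per matrix cell, in column-major order)
temp <- strsplit(mat, ",", fixed=TRUE)
#unique values, sorted numerically so the slices follow integer order; sort() drops NA
levels <- sort(unique(as.integer(unlist(temp))))
#convert to factors and use table (one row of counts per cell)
temp <- t(sapply(temp, function(x) table(factor(x, levels=levels))))
#make it an array
array(temp, c(nrow(mat), ncol(mat), length(levels)))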
# , , 1
#
#      [,1] [,2] [,3]
# [1,]    0    0    0
# [2,]    1    1    1
#
# , , 2
#
#      [,1] [,2] [,3]
# [1,]    0    0    0
# [2,]    0    1    0
#
# , , 3
#
#      [,1] [,2] [,3]
# [1,]    0    0    1
# [2,]    0    0    2
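Note that this output has 0 where the desired output in the question has NA. A minimal sketch of one way to get the NA version, assuming every zero count should become NA (which is what the question's example shows); the names `counts` and `res` are just illustrative:

```r
mat <- matrix(c(NA, "1", NA, "2,1", "3", "1,3,3"), nrow = 2)

# Same steps as above: split, collect the sorted distinct integers, count per cell.
temp <- strsplit(mat, ",", fixed = TRUE)
levels <- sort(unique(as.integer(unlist(temp))))   # sort() drops NA
counts <- t(sapply(temp, function(x) table(factor(x, levels = levels))))
res <- array(counts, c(nrow(mat), ncol(mat), length(levels)))

# Assumption: a count of zero should be reported as NA, as in the question.
res[res == 0L] <- NA
res
```

If only cells that were NA in the input should be NA (and genuine zero counts kept), the replacement would need to be restricted to those cells instead.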
This avoids calling table and factor inside a loop, and should be faster:
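temp <- strsplit(mat, ",", fixed=TRUE)
#cell index of each split value; fixing the levels keeps cells that contribute no values
id <- factor(rep(seq_along(temp), lengths(temp)), levels=seq_along(temp))
#convert to integer first so the levels sort numerically ("10" after "2")
temp <- factor(as.integer(unlist(temp)))
#table drops the NA values, so cells without values get zero counts
array(t(table(temp, id)), c(nrow(mat), ncol(mat), nlevels(temp)))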
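At the scale mentioned in the question, a dense 20,000 x 2,000 x 200 array has 8e9 cells, so it may be worth keeping the counts in a long (row, col, value, count) format and only filling a dense array when it fits. The following is a sketch of that idea rather than part of the original answer; the names `parts`, `long`, `counts` and `res` are illustrative, and it assumes the integers are positive so they can be used directly as the z index.

```r
mat <- matrix(c(NA, "1", NA, "2,1", "3", "1,3,3"), nrow = 2)

parts <- strsplit(mat, ",", fixed = TRUE)
cell <- rep(seq_along(parts), lengths(parts))   # column-major cell index per value

# One row per individual integer, with the matrix position it came from.
long <- data.frame(
  row   = (cell - 1L) %% nrow(mat) + 1L,
  col   = (cell - 1L) %/% nrow(mat) + 1L,
  value = as.integer(unlist(parts)),
  n     = 1L
)
long <- long[!is.na(long$value), ]              # drop the NA cells

# One row per distinct (row, col, value) with its count.
counts <- aggregate(n ~ row + col + value, data = long, FUN = sum)

# Optional: fill a dense array by matrix indexing; untouched cells stay NA.
res <- array(NA_integer_, c(nrow(mat), ncol(mat), max(counts$value)))
res[as.matrix(counts[c("row", "col", "value")])] <- counts$n
```

Here the z index equals the integer itself, and combinations that never occur stay NA, which matches the layout shown in the question. For the full data set the aggregate step may itself be slow; it is used here only because it is available in base R.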