I have 12 character matrices, each with between 2 and 13 rows.
For example, here is the one with 2 rows (only partially shown):
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16]
[1,] "N" "N" "S" "S" "F" "F" "C" "S" "C" "S" "U" "N" "S" "S" "S" "S"
[2,] "N" "C" "S" "N" "N" "S" "C" "F" "C" "S" "C" "U" "F" "S" "S" "N"
[,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30]
[1,] "S" "S" "N" "S" "S" "U" "S" "C" "C" "S" "C" "S" "S" "S"
[2,] "O" "S" "U" "S" "U" "U" "S" "C" "C" "S" "C" "U" "S" "U"
Another example, with 5 rows (partially shown):
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] "F" "S" "U" "C" "U" "S" "S" "N" "S" "N"
[2,] "S" "S" "N" "N" "N" "U" "N" "C" "U" "N"
[3,] "S" "S" "C" "S" "N" "S" "S" "C" "N" "C"
[4,] "S" "S" "N" "U" "N" "O" "C" "C" "U" "C"
[5,] "N" "O" "O" "U" "N" "O" "U" "C" "C" "C"
The maximum number of rows is 13.
The maximum number of columns is 1354.
The matrices contain 6 letters:
the_letters <- c("C","F","N","O","S","U")
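I want to count:
How many columns contain the sequence ("N", "N")?
How many columns contain the sequence ("F", "S", "S", "N")?
The same for every other possible sequence of length 2 to 13 drawn from the_letters, for example ("N", "S") for length 2, ("N", "S", "O") for length 3, and so on.
(Note that order matters and letters can repeat.)
How can I do this efficiently?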
Answer 0 (score: 1)
Sample data:
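# The matrix printed below comes from R < 3.6.0; newer versions use a different
# default sampler, so sample() returns different letters for the same seed.
set.seed(1)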
the_letters <- c("C","F","N","O","S","U")
rows <- 5
cols <- 10
(foo <- matrix(sample(the_letters, size = rows*cols, replace = TRUE), rows, cols))
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] "F" "U" "F" "N" "U" "N" "N" "S" "S" "S"
#> [2,] "N" "U" "F" "S" "F" "C" "O" "S" "O" "C"
#> [3,] "O" "O" "S" "U" "O" "N" "N" "C" "S" "N"
#> [4,] "U" "O" "N" "N" "C" "U" "F" "S" "O" "S"
#> [5,] "F" "C" "S" "S" "F" "N" "S" "N" "O" "S"
One approach is to paste each column into a single string and then use grepl to search for the substring. Here is a function that does this:
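ncols_pattern <- function(x, pattern) {
  # Collapse each column of x into one string, e.g. "FNOUF"
  col_strings <- apply(x, 2, paste0, collapse = "")
  # Count the columns whose string contains the pattern
  sum(grepl(pattern, col_strings))
}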
You specify the pattern as a single string such as "NN" rather than as a vector such as c("N","N").
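If the sequences are already stored as character vectors, they can be collapsed before matching. The following is a minimal sketch, not part of the original answer: ncols_pattern_vec is a hypothetical name, and fixed = TRUE is an assumption that the pattern should be matched literally rather than as a regular expression.

```r
# Hypothetical variant of ncols_pattern() that accepts either "NN" or c("N", "N").
ncols_pattern_vec <- function(x, pattern) {
  # Collapse a vector pattern into a single string; a string passes through unchanged
  pattern <- paste0(pattern, collapse = "")
  # Collapse each column into one string
  col_strings <- apply(x, 2, paste0, collapse = "")
  # fixed = TRUE treats the pattern as a literal substring, not a regex
  sum(grepl(pattern, col_strings, fixed = TRUE))
}

# Small 3 x 3 matrix whose columns read "NNS", "SNN" and "NSN"
m <- matrix(c("N", "N", "S",
              "S", "N", "N",
              "N", "S", "N"), nrow = 3)
ncols_pattern_vec(m, c("N", "N"))
#> [1] 2
```

Literal matching is sufficient here because the patterns consist only of the six letters.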
Examples:
ncols_pattern(foo, "O")
#> [1] 5
ncols_pattern(foo, "UN")
#> [1] 2
ncols_pattern(foo, "OO")
#> [1] 2
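To cover every sequence of a given length without calling grepl once per candidate (there are 6^k candidates of length k), one option is to enumerate the length-k windows that actually occur in each column and tabulate them. This is a sketch under assumptions, not the answerer's method: count_sequences is a hypothetical helper, and the sample matrix is rebuilt by hand so the result does not depend on the R version's random number generator.

```r
# Rebuild the 5 x 10 sample matrix shown above (matrix() fills column by column).
foo <- matrix(c(
  "F", "N", "O", "U", "F",
  "U", "U", "O", "O", "C",
  "F", "F", "S", "N", "S",
  "N", "S", "U", "N", "S",
  "U", "F", "O", "C", "F",
  "N", "C", "N", "U", "N",
  "N", "O", "N", "F", "S",
  "S", "S", "C", "S", "N",
  "S", "O", "S", "O", "O",
  "S", "C", "N", "S", "S"
), nrow = 5)

# Hypothetical helper: for each length-k sequence that occurs anywhere,
# count the number of columns that contain it at least once.
count_sequences <- function(x, k) {
  stopifnot(k >= 1, k <= nrow(x))
  starts <- seq_len(nrow(x) - k + 1)
  per_col <- lapply(seq_len(ncol(x)), function(j) {
    # All consecutive runs of k letters in column j
    windows <- vapply(starts, function(i) {
      paste0(x[i:(i + k - 1), j], collapse = "")
    }, character(1))
    # Keep each sequence once per column so columns are not double counted
    unique(windows)
  })
  tab <- table(unlist(per_col))
  setNames(as.integer(tab), names(tab))
}

counts <- count_sequences(foo, 2)
counts[["UN"]]
#> [1] 2
counts[["OO"]]
#> [1] 2
```

Sequences that never occur are simply absent from the result, which corresponds to a count of 0. Running the helper for each k from 2 up to nrow(x) gives counts for all lengths.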