R: Efficiently counting column membership in character matrices

Date: 2018-04-19 16:57:46

Tags: r

I have 12 character matrices, each with between 2 and 13 rows.

For example, here is one with 2 rows (only partially shown):

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16]
[1,] "N"  "N"  "S"  "S"  "F"  "F"  "C"  "S"  "C"  "S"   "U"   "N"   "S"   "S"   "S"   "S"  
[2,] "N"  "C"  "S"  "N"  "N"  "S"  "C"  "F"  "C"  "S"   "C"   "U"   "F"   "S"   "S"   "N"  
     [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30]
[1,] "S"   "S"   "N"   "S"   "S"   "U"   "S"   "C"   "C"   "S"   "C"   "S"   "S"   "S"  
[2,] "O"   "S"   "U"   "S"   "U"   "U"   "S"   "C"   "C"   "S"   "C"   "U"   "S"   "U"  

Another example, with 5 rows (partially shown):

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] "F"  "S"  "U"  "C"  "U"  "S"  "S"  "N"  "S"  "N"  
[2,] "S"  "S"  "N"  "N"  "N"  "U"  "N"  "C"  "U"  "N"  
[3,] "S"  "S"  "C"  "S"  "N"  "S"  "S"  "C"  "N"  "C"  
[4,] "S"  "S"  "N"  "U"  "N"  "O"  "C"  "C"  "U"  "C"  
[5,] "N"  "O"  "O"  "U"  "N"  "O"  "U"  "C"  "C"  "C"  

The maximum number of rows is 13.

The maximum number of columns is 1354.

The matrices contain 6 letters:

the_letters <- c("C","F","N","O","S","U")

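I want to count:

How many columns contain the sequence ("N","N")?

How many columns contain the sequence ("F","S","S","N")?

The same count for every other possible sequence of the_letters of length 2 to 13,

for example ("N","S") of length 2, ("N","S","O") of length 3, and so on.

(Note that order matters and letters can repeat.)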
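For a sense of scale: because order matters and letters can repeat, there are 6^k possible sequences of length k. The short R calculation below, assuming the maxima stated above (13 rows, 1354 columns), compares the number of candidate sequences with an upper bound on how many length-k windows a single matrix can actually contain. The variable names are illustrative only.

```r
k <- 2:13
# Every ordered sequence (repetition allowed) over 6 letters: 6^k per length k
n_candidates <- sum(6^k)                 # 15,672,832,812 candidates in total
# A 13-row column holds 13 - k + 1 windows of length k; at most 1354 columns
n_observed_max <- 1354 * sum(13 - k + 1) # 105,612 windows at most
```

Under these assumptions, enumerating the windows that actually occur is likely far cheaper than testing every candidate pattern one by one.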

How can I do this efficiently?

1 answer:

Answer 0 (score: 1)

Example data:

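# The output below was generated with R < 3.6.0. Since R 3.6.0, sample() uses a
# different default algorithm; use set.seed(1, sample.kind = "Rounding") to reproduce it.
set.seed(1)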
the_letters <- c("C","F","N","O","S","U")
rows <- 5
cols <- 10
(foo <- matrix(sample(the_letters, size = rows*cols, replace = TRUE), rows, cols))
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] "F"  "U"  "F"  "N"  "U"  "N"  "N"  "S"  "S"  "S"  
#> [2,] "N"  "U"  "F"  "S"  "F"  "C"  "O"  "S"  "O"  "C"  
#> [3,] "O"  "O"  "S"  "U"  "O"  "N"  "N"  "C"  "S"  "N"  
#> [4,] "U"  "O"  "N"  "N"  "C"  "U"  "F"  "S"  "O"  "S"  
#> [5,] "F"  "C"  "S"  "S"  "F"  "N"  "S"  "N"  "O"  "S"

One approach is to paste each column into a single string and then use grepl to search for the substring. Here is a function that does this:

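ncols_pattern <- function(x, pattern) {
  # Collapse each column into one string, e.g. c("N","S","N") -> "NSN", then
  # count the columns whose string contains `pattern` (fixed = TRUE: no regex)
  sum(grepl(pattern, apply(x, 2, paste0, collapse = ""), fixed = TRUE))
}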
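When many patterns (or all 12 matrices) have to be checked, pasting the columns again on every call repeats work. A minimal sketch that pastes the columns once and then tests a whole vector of patterns against the same strings; `count_patterns` is a hypothetical helper name, not part of the answer:

```r
# Hypothetical helper: number of columns of `x` containing each pattern
count_patterns <- function(x, patterns) {
  col_strings <- apply(x, 2, paste0, collapse = "")  # paste each column once
  vapply(patterns,
         function(p) sum(grepl(p, col_strings, fixed = TRUE)),
         integer(1))                                 # named by the patterns
}
```

With the matrices in a list `mats` (also a hypothetical name), `sapply(mats, count_patterns, patterns = pats)` returns the counts for each matrix (one column per matrix when `pats` has two or more entries).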

Specify the desired pattern as a single string, "NN", rather than as c("N","N").
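If the sequences are already stored as character vectors such as c("F","S","S","N"), they can be collapsed with paste0 before matching. A minimal sketch; `ncols_seq` is a hypothetical wrapper that repeats the answer's logic rather than an existing function:

```r
# Hypothetical wrapper: accept the sequence as a character vector
ncols_seq <- function(x, letters_seq) {
  pattern <- paste0(letters_seq, collapse = "")  # c("N","N") -> "NN"
  # Count the columns whose pasted string contains the literal pattern
  sum(grepl(pattern, apply(x, 2, paste0, collapse = ""), fixed = TRUE))
}
```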

Examples:

ncols_pattern(foo, "O")
#> [1] 5
ncols_pattern(foo, "UN")
#> [1] 2
ncols_pattern(foo, "OO")
#> [1] 2
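The question also asks about every possible sequence of length 2 to 13. Instead of calling grepl once per candidate, one alternative is to extract every length-k window that actually occurs in each column and tabulate them. The sketch below assumes, as the grepl approach does, that a "sequence" means consecutive rows within a column; `count_all_patterns` is a hypothetical name.

```r
# Hypothetical helper: for every length-k sequence that occurs in `x`,
# the number of columns that contain it at least once
count_all_patterns <- function(x, k) {
  n <- nrow(x)
  if (k > n) return(integer(0))            # no column is long enough
  col_strings <- apply(x, 2, paste0, collapse = "")
  starts <- seq_len(n - k + 1)             # start position of every window
  # Distinct windows per column, so a column counts once even if the
  # sequence occurs in it several times
  windows <- lapply(col_strings, function(s) {
    unique(substring(s, starts, starts + k - 1))
  })
  tab <- table(unlist(windows, use.names = FALSE))
  setNames(as.integer(tab), names(tab))    # named integer vector
}
```

On the example data, `count_all_patterns(foo, 2)[["OO"]]` is 2, matching `ncols_pattern(foo, "OO")`. Sequences that never occur are absent from the result rather than listed with a count of 0, and `lapply(2:nrow(foo), count_all_patterns, x = foo)` gives one such vector per length.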