Question

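I wrote a function that essentially creates a vector of 1000 binary values. I have been able to compute the longest streak of consecutive 1s using rle.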

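How can I find a specific vector (say c(1,0,0,1)) inside this larger vector? I want it to return the number of occurrences of that vector, so c(1,0,0,1,1,0,0,1) should return 2, while c(1,0,0,0,1) should return 0.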

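Most of the solutions I have found only check whether the sequence occurs and return TRUE or FALSE, or they give results for individual values rather than for the specific vector I specify.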

Here is my code so far:

# Simulates 1000 people who each choose either up or down.
updown <- function(){
  n = 1000
  X = rep(0,n)
  Y = rbinom(n, 1, 1 / 2)
  X[Y == 1] = "up"
  X[Y == 0] = "down"

  #calculate the length of the longest streak of ups:
  Y1 <- rle(Y)
  streaks <- Y1$lengths[Y1$values == 1]
  max(streaks, na.rm=TRUE)
}

# Repeat this process 1000 times and average the longest streak.
longeststring <- replicate(1000, updown())
mean(longeststring)

Answer 1

This also works:

library(stringr)
x <- c(1,0,0,1)
y <- c(1,0,0,1,1,0,0,1) 
length(unlist(str_match_all(paste(y, collapse=''), '1001')))
[1] 2
y <- c(1,0,0,0,1)
length(unlist(str_match_all(paste(y, collapse=''), '1001')))
[1] 0

If you want to count overlapping matches as well:

y <- c(1,0,0,1,0,0,1) # overlapped
sum(gregexpr("(?=1001)", paste(y, collapse=''), perl=TRUE)[[1]] > 0) # > 0 so that "no match" (-1) counts as 0
[1] 2
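
The string-based approaches above assume every element collapses to a single character, which holds for 0/1 data. As a rough alternative, here is a minimal base R sketch that compares sliding windows of the vector directly and counts overlapping occurrences. The helper name count_overlapping is hypothetical and is not part of either answer.

```r
# Hypothetical helper (not from the original answers): counts every position
# at which `pattern` occurs in `x`, including overlapping occurrences.
count_overlapping <- function(pattern, x) {
  k <- length(pattern)
  n <- length(x)
  if (k == 0 || n < k) return(0L)
  starts <- seq_len(n - k + 1)
  # For each start position, test whether the window of length k equals the pattern
  hits <- vapply(starts,
                 function(i) all(x[i:(i + k - 1)] == pattern),
                 logical(1))
  sum(hits)
}

count_overlapping(c(1, 0, 0, 1), c(1, 0, 0, 1, 0, 0, 1))    # overlapping case: 2
count_overlapping(c(1, 0, 0, 1), c(1, 0, 0, 1, 1, 0, 0, 1)) # 2
count_overlapping(c(1, 0, 0, 1), c(1, 0, 0, 0, 1))          # 0
```

Because it never builds a string, this sketch should also work for vectors whose values are more than one character wide, at the cost of an explicit loop over start positions.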

Answer 2

Since Y consists only of 0s and 1s, we can paste it into a string and use regular expressions, specifically gregexpr. Simplified a bit:

set.seed(47)    # for reproducibility

Y <- rbinom(1000, 1, 1 / 2)

count_pattern <- function(pattern, x){
    sum(gregexpr(paste(pattern, collapse = ''), 
                 paste(x, collapse = ''))[[1]] > 0)
}

count_pattern(c(1, 0, 0, 1), Y)
## [1] 59

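paste collapses both the pattern and Y into strings, so here the pattern is "1001" and Y is a 1000-character string. gregexpr searches Y for all occurrences of the pattern and returns the indices of the matches (plus some extra information so that they can be extracted if desired). Because gregexpr returns -1 when there is no match, testing for values greater than 0 lets us simply sum the TRUE values to get the number of matches; in this case, 59.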
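
To make the return value concrete, here is a small sketch on short hand-written strings (not the simulated Y). The variable names are only for illustration.

```r
# Two non-overlapping occurrences of "1001" in a short example string
m <- gregexpr("1001", "10011001")[[1]]
starts <- as.integer(m)            # start positions of the matches: 1 and 5
lens <- attr(m, "match.length")    # length of each match: 4 and 4
n_hits <- sum(m > 0)               # 2

# No occurrence: gregexpr returns a single -1, so length() would report 1
none <- gregexpr("1001", "10001")[[1]]
length(none)     # 1
sum(none > 0)    # 0
```

This is why the function tests for values greater than 0 instead of taking the length of the result. Note also that a plain pattern like this finds non-overlapping matches only.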

The other cases mentioned in the question:

count_pattern(c(1,0,0,1), c(1,0,0,1,1,0,0,1))
## [1] 2

count_pattern(c(1,0,0,1), c(1,0,0,0,1))
## [1] 0

How to find a string within a vector in R?
