I want to count matches of multiple patterns in data frame columns that contain long strings.
pattern<-c("AAA", "BBB", "CCC")
df$AAA <- str_count(df$string_1, "AAA+")
df$BBB <- str_count(df$string_1, "BBB+")
df$CCC <- str_count(df$string_1, "CCC+")
df$AAA <- str_count(df$string_2, "AAA+")
df$BBB <- str_count(df$string_2, "BBB+")
df$CCC <- str_count(df$string_2, "CCC+")
...
In reality, the vector "pattern" is much longer, so I need to loop over it and over the string columns.
Answer 0 (score: 2)
You can use sapply:
#DATA
pattern<-c("AAA", "BBB", "CCC")
set.seed(42)
df = data.frame(a = replicate(5, paste(sample(c("A", "B", "C"), 50, TRUE), collapse = "")),
b = replicate(5, paste(sample(c("A", "B", "C"), 50, TRUE), collapse = "")))
library(stringr)
setNames(lapply(pattern, function(x) sapply(df, function(y)
str_count(string = y, pattern = x))), pattern)
#$AAA
#     a b
#[1,] 0 0
#[2,] 2 1
#[3,] 0 2
#[4,] 4 1
#[5,] 2 2
#
#$BBB
#     a b
#[1,] 1 2
#[2,] 1 0
#[3,] 2 3
#[4,] 1 2
#[5,] 2 1
#
#$CCC
#     a b
#[1,] 1 0
#[2,] 2 1
#[3,] 2 0
#[4,] 2 0
#[5,] 0 1
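If the counts should end up as new columns of the data frame, as in the question, rather than as a list of matrices, a nested loop is one way to do it. The following is only a minimal sketch: the example data and the `<column>_<pattern>` naming scheme are assumptions made for illustration and are not part of the original question or answer.

```r
library(stringr)

# Hypothetical example data; the column names string_1/string_2 follow the question
df <- data.frame(
  string_1 = c("AAABBBCCC", "AAAAAA", "CCCBBB"),
  string_2 = c("BBB", "AAACCCAAA", ""),
  stringsAsFactors = FALSE
)
pattern <- c("AAA", "BBB", "CCC")

# Columns to search (assumed to be known in advance)
string_cols <- c("string_1", "string_2")

# Add one count column per (string column, pattern) pair,
# named e.g. "string_1_AAA" (an illustrative naming convention)
for (col in string_cols) {
  for (p in pattern) {
    df[[paste(col, p, sep = "_")]] <- str_count(df[[col]], p)
  }
}
```

Giving each pair its own column name avoids overwriting the counts of one string column with those of the next. Note that str_count counts non-overlapping matches, so "AAAAAA" contains two matches of "AAA".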