Question

I have the following:

text <- c('I am a human','It is an animal and not a human, I am a human','Cant think of something else to write','and and is am')
words <- c('and','am','is')

I want to count, for each element of text, the total number of times these words occur. The output should look like this:

[1] 1 3 0 4

The code I am using is clearly not the most elegant:

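library(stringr)  # provides str_count()

# One running total per element of text
TotalCount <- vector(mode = 'integer', length = length(text))
for (ii in seq_along(text)) {
    for (jj in seq_along(words)) {
        # Count occurrences of one word in one string and add it to the total
        wordCount <- str_count(text[ii], words[jj])
        TotalCount[ii] <- wordCount + TotalCount[ii]
    }
}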

Is there a more efficient, cleaner way to do this?

Answer 1

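You can use the str_count function from the stringr package. Collapsing the words with "|" builds a single regular expression (and|am|is), and str_count is vectorized over text, so no loop is needed.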

library(stringr)
text <- c('I am a human','It is an animal and not a human, I am a human','Cant think of something else to write','and and is am')
words <- c('and','am','is')
str_count(text, paste(words, collapse="|"))
# [1] 1 3 0 4
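
For readers who cannot load stringr, the same idea can be sketched in base R. This is not part of the original answer: it assumes that gregexpr() with the collapsed pattern is an acceptable stand-in for str_count(), and the names pattern, matches and total_count are chosen here purely for illustration.

```r
text <- c('I am a human',
          'It is an animal and not a human, I am a human',
          'Cant think of something else to write',
          'and and is am')
words <- c('and', 'am', 'is')

# Build one regular expression that matches any of the words: "and|am|is"
pattern <- paste(words, collapse = "|")

# gregexpr() returns, for each string, the start positions of all matches,
# or -1 when there is no match at all
matches <- gregexpr(pattern, text)

# Count the positive positions per string; a lone -1 contributes zero
total_count <- vapply(matches, function(m) sum(m > 0), integer(1))
total_count
# [1] 1 3 0 4
```

Like the stringr version, this counts substrings rather than whole words, so a word such as "island" would contribute matches for both "is" and "and".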

Or, to count whole words only, wrap the pattern in word boundaries (\\b):

str_count(text, paste0("\\b(", paste(words, collapse = "|"), ")\\b"))
# [1] 1 3 0 4
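
On the question's data both patterns give the same counts, so the difference is easy to miss. A small sketch of where they diverge, using a made-up sentence (sample_text is hypothetical and does not come from the question) in which the words occur only inside longer words:

```r
library(stringr)

words <- c('and', 'am', 'is')

# Hypothetical input: "is" hides in "This" and "island",
# "and" in "island", and "am" in "dam"
sample_text <- 'This island has a dam'

substring_pattern  <- paste(words, collapse = "|")
whole_word_pattern <- paste0("\\b(", substring_pattern, ")\\b")

# Plain alternation counts every substring hit
substring_count <- str_count(sample_text, substring_pattern)
substring_count
# [1] 4

# With word boundaries only standalone words are counted
whole_word_count <- str_count(sample_text, whole_word_pattern)
whole_word_count
# [1] 0
```

Which variant is appropriate depends on whether partial matches inside longer words should count; the question's expected output is consistent with either.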
