Counting occurrences of multiple words in a text vector in R

Time: 2015-10-11 12:49:32

Tags: r string

I have the following:

text <- c('I am a human','It is an animal and not a human, I am a human','Cant think of something else to write','and and is am')
words <- c('and','am','is')

For each element of text, I want the total number of times any of these words occurs. So the output should be:

[1] 1 3 0 4

The code I am using is clearly not the most elegant:

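library(stringr)  # provides str_count()

TotalCount <- vector(mode = 'integer', length = length(text))
for (ii in seq_along(text)) {
    for (jj in seq_along(words)) {
        # add the count of word jj in string ii to the running total
        wordCount <- str_count(text[ii], words[jj])
        TotalCount[ii] <- wordCount + TotalCount[ii]
    }
}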

Is there a more efficient, cleaner way to do this?

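1 answer:

Answer 0 (score: 1)

You can use the str_count function from the stringr library. It is vectorized over the input strings, so joining the words into a single regular expression with "|" counts all of them in one call.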

library(stringr)
text <- c('I am a human','It is an animal and not a human, I am a human','Cant think of something else to write','and and is am')
words <- c('and','am','is')
str_count(text, paste(words, collapse="|"))
# [1] 1 3 0 4
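If a per-word breakdown is also useful, one possible sketch (not part of the original answer) is to count each word separately and sum across words. The name `counts` is introduced here for illustration only, and `fixed()` is used on the assumption that the words should be treated as literal strings rather than regular expressions.

```r
library(stringr)

text <- c('I am a human', 'It is an animal and not a human, I am a human',
          'Cant think of something else to write', 'and and is am')
words <- c('and', 'am', 'is')

# One row per element of text, one column per word.
# fixed() makes str_count() match each word literally.
counts <- sapply(words, function(w) str_count(text, fixed(w)))
counts

# Sum across words to get one total per element of text.
rowSums(counts)
# [1] 1 3 0 4
```

This reproduces the nested loop from the question. It can differ from the single alternation pattern when the words overlap each other (for example 'an' and 'and'), because the alternation counts non-overlapping matches while this version counts each word independently.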

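The pattern above also matches inside longer words (for example 'is' inside 'this'). To count whole words only, wrap the alternation in word boundaries:

# \\b matches a word boundary, so only standalone words are counted
str_count(text, paste0("\\b(", paste(words, collapse = "|"), ")\\b"))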
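For the question's data both patterns give the same counts, so the effect of the `\\b` word boundaries is easier to see on a different input. The following is a small sketch using a made-up sentence; `sample_text`, `substring_pattern`, and `whole_word_pattern` are hypothetical names that do not appear in the original post.

```r
library(stringr)

words <- c('and', 'am', 'is')
sample_text <- 'this island is sandy'  # hypothetical input, not from the question

substring_pattern  <- paste(words, collapse = "|")
whole_word_pattern <- paste0("\\b(", substring_pattern, ")\\b")

# Matches anywhere: 'is' in 'this', 'is' and 'and' in 'island',
# the word 'is', and 'and' in 'sandy'.
str_count(sample_text, substring_pattern)
# [1] 5

# Matches standalone words only: just the word 'is'.
str_count(sample_text, whole_word_pattern)
# [1] 1
```

Which pattern is appropriate depends on whether partial matches inside longer words should count toward the total.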