Count the number of words in open-ended responses in R

Time: 2014-06-24 11:23:39

Tags: regex r pattern-matching

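I have a dataset of 600 responses with a variable "Free_Text" that contains the respondents' feedback/comments. I want to count the number of words in each respondent's comment. How should I do this? I am new to R and am working in RStudio.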

4 answers:

Answer 0: (score: 2)

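Consider using stri_extract_words from the stringi package, especially if you have non-English text. It uses ICU's BreakIterator for this task, which implements a sophisticated set of word-breaking rules.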

library(stringi)
str <- c("How many words are there?", "R — язык программирования для статистической обработки данных и работы с графикой, а также свободная программная среда вычислений с открытым исходным кодом в рамках проекта GNU.")
stri_extract_words(str)
## [[1]]
## [1] "How"   "many"  "words" "are"   "there"
## 
## [[2]]
##  [1] "R"                "язык"             "программирования" "для"              "статистической"  
##  [6] "обработки"        "данных"           "и"                "работы"           "с"               
## [11] "графикой"         "а"                "также"            "свободная"        "программная"     
## [16] "среда"            "вычислений"       "с"                "открытым"         "исходным"        
## [21] "кодом"            "в"                "рамках"           "проекта"          "GNU"   
sapply(stri_extract_words(str), length) # how many words are there in each character string?
## [1]  5 25
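
If only the counts are needed, stringi also provides stri_count_words, which applies the same ICU word-boundary rules without building the list of words first. A minimal sketch; the vector `responses` is made up for illustration and stands in for the Free_Text column from the question:

```r
library(stringi)

# Hypothetical sample responses, including a one-word reply,
# an empty string and a missing value
responses <- c("How many words are there?", "One", "", NA)

# One count per element of the input vector
n_words <- stri_count_words(responses)
n_words
```

With a data frame, the same call could be used as something like df$n_words <- stri_count_words(df$Free_Text), where df is whatever your data frame is called.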

Answer 1: (score: 1)

Splitting the string and counting the pieces is a simple way to get started.

str = "This is a string."

str_length = length(strsplit(str," ")[[1]])

> str_length
[1] 4
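
To run this over a whole column rather than a single string, a minimal base R sketch follows. The data frame `survey`, its sample rows and the helper `count_words` are made up for illustration; only the column name Free_Text comes from the question. Trimming first and splitting on "\\s+" avoids counting empty pieces when a response has leading or repeated spaces, which a split on a single " " would count as words.

```r
# Hypothetical data standing in for the 600 survey responses
survey <- data.frame(
  Free_Text = c("Great service, thanks!", "  Too   slow ", "", NA),
  stringsAsFactors = FALSE
)

# Count whitespace-separated tokens in each element of a character vector
count_words <- function(x) {
  x <- trimws(x)                      # drop leading/trailing whitespace
  n <- lengths(strsplit(x, "\\s+"))   # split on runs of whitespace, count pieces
  n[is.na(x)] <- NA_integer_          # keep missing responses as NA, not 1
  n
}

survey$n_words <- count_words(survey$Free_Text)
survey$n_words
# [1]  3  2  0 NA
```

Punctuation stays attached to the neighbouring word here, so "thanks!" counts as one word.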

Answer 2: (score: 1)

This might help:

 str1 <- c("How many words are in this sentence","How many words")
 sapply(gregexpr("\\W+", gsub("[[:punct:]]+","",str1)), length) + 1
 #[1] 7 3

Alternatively,

 library(qdap)
 word_count(str1)
#[1] 7 3

 str2 <- "How many words?."  
 word_count(str2)
 #[1] 3
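
One caveat about the gregexpr one-liner above: it counts separators and adds one, and gregexpr returns -1 when nothing matches, so a one-word or empty string comes out as 2. A possible workaround is to count the words themselves instead. This is only a sketch in base R; the helper name `count_word_runs`, the sample vector `str3`, and the choice to treat letters, digits and apostrophes as word characters are assumptions, not part of the answer.

```r
# Count runs of letters, digits and apostrophes in each string
count_word_runs <- function(x) {
  m <- gregexpr("[[:alnum:]']+", x)
  # gregexpr() gives -1 when there is no match, so count positive positions only
  vapply(m, function(pos) sum(pos > 0), integer(1))
}

# Hypothetical test strings: a sentence, one word, empty, trailing punctuation
str3 <- c("How many words are in this sentence", "Hello", "", "How many words?.")
counts <- count_word_runs(str3)
counts
# [1] 7 1 0 3
```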

Answer 3: (score: 0)

Another approach uses the stringr package to extract the individual words and count them:

library(stringr)
str1 <- c("How many words are in this sentence","How many words")
length(unlist(str_match_all(str1, "\\S+"))) # "\\S+" matches runs of non-whitespace characters; unlist() pools the matches, so this is the total over all strings
str_count(str1, "\\S+") # one count per string, which is what the question asks for