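I have a dataset of 600 responses with a variable "Free_Text" that holds the respondents' feedback/comments. I want to count the number of words in each respondent's comment. How can I do this? I am new to R and working in RStudio.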
Answer 0 (score: 2)
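Consider using stri_extract_words from the stringi package, especially if you have non-English text. It uses ICU's BreakIterator for this task, which implements a sophisticated set of word-breaking rules.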
library(stringi)
str <- c("How many words are there?", "R — язык программирования для статистической обработки данных и работы с графикой, а также свободная программная среда вычислений с открытым исходным кодом в рамках проекта GNU.")
stri_extract_words(str)
## [[1]]
## [1] "How" "many" "words" "are" "there"
##
## [[2]]
## [1] "R" "язык" "программирования" "для" "статистической"
## [6] "обработки" "данных" "и" "работы" "с"
## [11] "графикой" "а" "также" "свободная" "программная"
## [16] "среда" "вычислений" "с" "открытым" "исходным"
## [21] "кодом" "в" "рамках" "проекта" "GNU"
sapply(stri_extract_words(str), length) # how many words are there in each character string?
## [1] 5 25
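To apply this to a whole column, stringi also provides stri_count_words, which returns the counts directly, so the sapply step is not needed. A minimal sketch, assuming the data sit in a data frame; the name `responses` and the two sample comments are made up for illustration, and only the Free_Text column name comes from the question:

```r
library(stringi)

# Hypothetical stand-in for the real 600-row data set
responses <- data.frame(
  Free_Text = c("How many words are there?", "Great service, thanks!"),
  stringsAsFactors = FALSE
)

# One word count per respondent, stored as a new column
responses$word_count <- stri_count_words(responses$Free_Text)
responses$word_count
```

As far as I know, stri_count_words reports 0 for an empty comment, whereas counting the result of stri_extract_words with length would give 1 there, because an NA placeholder is returned when nothing matches. It is worth checking this on a few blank responses in your own data.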
Answer 1 (score: 1)
Splitting the string on spaces and counting the resulting elements is a simple way to get started.
str <- "This is a string."
# Split on single spaces and count the pieces
str_length <- length(strsplit(str, " ")[[1]])
str_length
## [1] 4
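The same idea can be stretched to cover a whole column of comments. The sketch below is only one way to do it in base R: it splits on runs of whitespace rather than a single space, trims leading and trailing blanks first, and keeps missing comments as NA. The data frame `responses` and the helper `count_words_split` are hypothetical names introduced here; only Free_Text comes from the question.

```r
# Hypothetical stand-in for the real data set
responses <- data.frame(
  Free_Text = c("This is a string.", "  Two   words ", "", NA),
  stringsAsFactors = FALSE
)

# Count whitespace-separated tokens in each element of a character vector
count_words_split <- function(x) {
  x <- trimws(as.character(x))      # drop leading/trailing whitespace
  pieces <- strsplit(x, "\\s+")     # split on runs of whitespace
  n <- lengths(pieces)              # number of pieces per element
  n[is.na(x)] <- NA_integer_        # a missing comment has no defined count
  n
}

responses$word_count <- count_words_split(responses$Free_Text)
responses$word_count
## [1]  4  2  0 NA
```

Note that punctuation stays attached to the neighbouring word here ("string." counts as one word), which is usually fine for counting.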
Answer 2 (score: 1)
This might help:
str1 <- c("How many words are in this sentence","How many words")
sapply(gregexpr("\\W+", gsub("[[:punct:]]+","",str1)), length) + 1
#[1] 7 3
Alternatively, with the qdap package:
library(qdap)
word_count(str1)
#[1] 7 3
str2 <- "How many words?."
word_count(str2)
#[1] 3
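One caveat with the gregexpr approach above: it counts the separators and adds 1, and gregexpr returns a single -1 when there is no match, so a one-word comment or an empty comment both come out as 2. A possible variant, sketched here in base R, counts the words themselves instead. The helper name `count_words_regex` is made up for this example.

```r
# Count runs of word characters after stripping punctuation
count_words_regex <- function(x) {
  cleaned <- gsub("[[:punct:]]+", "", x)
  # regmatches() yields character(0) when there is no match, so the count is 0
  lengths(regmatches(cleaned, gregexpr("\\w+", cleaned)))
}

str1 <- c("How many words are in this sentence", "How many words", "Hello", "")
count_words_regex(str1)
## [1] 7 3 1 0
```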
Answer 3 (score: 0)
Here is another approach, which uses the stringr package to list the individual words:
library(stringr)
str1 <- c("How many words are in this sentence","How many words")
# "\\S+" matches each run of non-whitespace characters, i.e. each word;
# lengths() then counts the matches per string (the original length(unlist(...)) gave only the overall total)
lengths(str_match_all(str1, "\\S+"))