I want to get the average number of words in my character vector in R.
one <- c(9, 23, 43)
two <- c("this is a new york times article.", "short article.", "he went outside to smoke a cigarette.")
mydf <- data.frame(one, two)
mydf
# one two
# 1 9 this is a new york times article.
# 2 23 short article.
# 3 43 he went outside to smoke a cigarette.
I am looking for a function that gives the average number of words in the character vector "two".
The output here should be 5.3333 (= (7 + 2 + 7) / 3).
Answer 0 (score: 3)
Here is one possibility, using the qdap package:
library(qdap)
wc(mydf$two, FALSE)/nrow(mydf)
## [1] 5.333333
This is a bit of overkill, but you could also do:
word_stats(mydf$two)
##   all n.sent n.words n.char n.syl n.poly   wps    cps   sps psps   cpw   spw pspw n.state proDF2 n.hapax n.dis grow.rate prop.dis
## 1 all      3      16     68    23      3 5.333 22.667 7.667    1 4.250 1.438 .188       3      1      12     2      .750     .125
The wps column is the number of words per sentence.
Answer 1 (score: 2)
Hadley Wickham's stringr package is probably the simplest way:
library(stringr)
foo <- str_split(two, " ")              # split each element of the vector at single spaces
sapply(foo, length)                     # quick check: how many words does each element have?
sum(sapply(foo, length)) / length(foo)  # sum the word counts and divide by the number of elements
# [1] 5.333333
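Answer 2 (score: 2)
I'm sure there are more elaborate methods available, but you can use strsplit to split each string at whitespace into a character vector of words and then count the length of each result.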
mean(sapply(strsplit(as.character(mydf$two), "[[:space:]]+"), length))
# [1] 5.333333
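If you need this in more than one place, the one-liner can be wrapped in a small helper. This is only a minimal sketch, assuming that words are separated by whitespace; the name mean_word_count is hypothetical, and the trimws() call is an addition so that leading spaces do not produce an empty first token.

```r
# Hypothetical helper: average number of whitespace-separated words per element.
mean_word_count <- function(x) {
  x <- trimws(as.character(x))          # as.character() also covers factor columns
  words <- strsplit(x, "[[:space:]]+")  # one character vector of words per element
  mean(lengths(words))                  # lengths() returns the word count of each element
}

two <- c("this is a new york times article.", "short article.",
         "he went outside to smoke a cigarette.")
mean_word_count(two)
```

lengths() is a base R shortcut for sapply(x, length), available from R 3.2.0 onward.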
Answer 3 (score: 2)
Or with gregexpr():
mean(sapply(mydf$two, function(x) length(unlist(gregexpr(" ", x))) + 1))
# [1] 5.333333
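One caveat with counting spaces: when a string contains no space at all, gregexpr() returns a single -1, so a one-word string is counted as two words. A possible variation, sketched here under the assumption that a "word" is any run of non-whitespace characters, is to match the words themselves instead of the separators. The name count_words is hypothetical.

```r
# Hypothetical helper: count runs of non-whitespace characters in each element.
count_words <- function(x) {
  matches <- gregexpr("[^[:space:]]+", as.character(x))
  # gregexpr() reports "no match" as -1, so only positive start positions are counted
  vapply(matches, function(m) sum(m > 0), integer(1))
}

two <- c("this is a new york times article.", "short article.",
         "he went outside to smoke a cigarette.")
count_words(two)        # per-element word counts
mean(count_words(two))  # average number of words
```

With this version a one-word string counts as 1 and an empty string as 0.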