I am trying to use R to count how many times each word occurs in a CSV file. My dataset looks like this:
TITLE
1 My first Android app after a year
2 Unmanned drone buzzes French police car
3 Make anything editable with HTML5
4 Predictive vs Reactive control
5 What was it like to move to San Antonio and go through TechStars Cloud?
6 Health-care sector vulnerable to hackers, researchers say
I tried the function used in "Machine Learning for Hackers":
get.tdm <- function(doc.vec) {
  doc.corpus <- Corpus(VectorSource(doc.vec))
  control <- list(stopwords=TRUE, removePunctuation=TRUE, removeNumbers=TRUE, minDocFreq=2)
  doc.dtm <- TermDocumentMatrix(doc.corpus, control)
  return(doc.dtm)
}
But I get an error that I don't understand:
Error: is.Source(s) is not TRUE
In addition: Warning message:
In is.Source(s) : vectorized sources must have a positive length entry
What could be going wrong?
Answer 0 (score: 1)
This works for me (with your data frame named df):
library(tm)
doc.corpus <- Corpus(VectorSource(df))
freq <- data.frame(count=termFreq(doc.corpus[[1]]))
freq
# count
# after 1
# and 1
# android 1
# antonio 1
# anything 1
# ...
# unmanned 1
# vulnerable 1
# was 1
# what 1
# with 1
# year 1
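
If you want to cross-check the counts without tm, here is a minimal base-R sketch. It rests on some assumptions: count_words is a hypothetical helper (not part of tm or the book), df is rebuilt by hand from the titles in the question rather than read from your CSV, and a "word" is taken to be any run of letters or digits, so "Health-care" is split in two. The warning in the question ("vectorized sources must have a positive length entry") is consistent with an empty or NULL vector reaching VectorSource, for example from a misspelled column name, but that is a guess since the call to get.tdm is not shown. The sketch therefore checks its input first.

```r
# Hypothetical stand-in for the data frame read from the CSV file.
df <- data.frame(
  TITLE = c(
    "My first Android app after a year",
    "Unmanned drone buzzes French police car",
    "Make anything editable with HTML5",
    "Predictive vs Reactive control",
    "What was it like to move to San Antonio and go through TechStars Cloud?",
    "Health-care sector vulnerable to hackers, researchers say"
  ),
  stringsAsFactors = FALSE
)

# count_words is a hypothetical helper written for this example.
count_words <- function(titles) {
  # Fail early on NULL or empty input, e.g. from a misspelled column name.
  stopifnot(is.character(titles), length(titles) > 0)
  # Lower-case, then split on runs of anything that is not a letter or digit.
  words <- unlist(strsplit(tolower(titles), "[^a-z0-9]+"))
  # Drop empty strings that a leading separator would leave behind.
  words <- words[nzchar(words)]
  counts <- table(words)
  freq <- data.frame(word = names(counts), count = as.integer(counts),
                     stringsAsFactors = FALSE)
  # Most frequent words first, ties broken alphabetically.
  freq[order(-freq$count, freq$word), ]
}

freq <- count_words(df$TITLE)
head(freq)
```

With the six titles from the question, the first row is "to" with a count of 3. Note that df$title (lower case) is NULL because column names are case-sensitive, so count_words(df$title) stops with an error instead of silently passing an empty vector along.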