Finding associated terms: findAssocs returns no results

Time: 2018-05-01 16:22:16

Tags: r nlp

findAssocs returns <0 rows> (or 0-length row.names) for all of my input, and I can't figure out why. I have posted my code below to show my steps. Thanks in advance!

Read in the file

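library(tm)        # Corpus, tm_map, TermDocumentMatrix, findAssocs
library(pdftools)  # pdf_text
# Alternative ways to load the text; each assignment overwrites the previous one
text <- readLines(list.files())   # the only file in the working directory
text <- readLines(file.choose())  # a plain-text file picked interactively
text <- pdf_text(file.choose())   # a PDF picked interactively, one string per page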
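readLines() opens a single connection, so readLines(list.files()) fails as soon as the working directory holds more than one file. A minimal sketch of reading several plain-text files while keeping one string per file follows. The helper name read_txt_files and the .txt filter are assumptions for illustration, not part of the original code.

```r
# Hypothetical helper: read every .txt file in `dir`, returning one string per file.
read_txt_files <- function(dir = ".") {
  paths <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  # Join the lines of each file with newlines so each file stays one element.
  vapply(
    paths,
    function(p) paste(readLines(p, warn = FALSE), collapse = "\n"),
    character(1)
  )
}

# Usage (assumes the working directory contains .txt files):
# text <- read_txt_files(".")
```

Keeping one element per file means a corpus built from the result would hold one document per file.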

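Collapse into one string

# Joins every line/page into a single string, so the corpus built from it holds one document
text <- paste(unlist(text), collapse = "")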
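The effect of this collapse step can be checked on a small made-up input. The two sample lines below are hypothetical. The sketch shows that the result has length one and that, with collapse = "", the last word of one line is glued to the first word of the next.

```r
# Hypothetical two-line input standing in for the real file contents.
lines <- c("network security", "policy update")

collapsed <- paste(unlist(lines), collapse = "")

length(collapsed)  # 1
collapsed          # "network securitypolicy update"
```

Using collapse = " " instead would keep "security" and "policy" as separate words, though the result would still be a single string.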

Convert to a corpus

docs <- Corpus(VectorSource(text))

Cleaning

# Helper that replaces every match of `pattern` with a space
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
docs <- tm_map(docs, toSpace, "/")
docs <- tm_map(docs, toSpace, "@")
docs <- tm_map(docs, toSpace, "\\|")
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removeNumbers)
docs <- tm_map(docs, removeWords, stopwords("dutch"))
docs <- tm_map(docs, removeWords, stopwords("english"))
docs <- tm_map(docs, removePunctuation)
docs <- tm_map(docs, stripWhitespace)

DTM

dtm <- TermDocumentMatrix(docs)  # rows are terms, columns are documents
m <- as.matrix(dtm)
v <- sort(rowSums(m), decreasing = TRUE)  # total frequency per term
d <- data.frame(word = names(v), freq = v)

Associated terms

as.data.frame(findAssocs(dtm, terms = "security", corlimit = 0.3))
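One possible explanation, offered as an assumption rather than a confirmed diagnosis: findAssocs correlates term counts across documents, and pasting the whole text into one string leaves the matrix with a single document column, so no correlation can be computed. The sketch below uses four made-up sentences (hypothetical data) as separate documents to show the shape findAssocs needs. The variable names pages, docs_multi, tdm and assoc are illustrative.

```r
library(tm)

# Hypothetical toy input: one string per document, NOT collapsed into one.
pages <- c(
  "security policy requires strong passwords",
  "network security depends on firewall rules",
  "the cafeteria menu changes every week",
  "strong passwords improve account security"
)

docs_multi <- Corpus(VectorSource(pages))
tdm <- TermDocumentMatrix(docs_multi)

nDocs(tdm)  # 4 documents, so term counts can vary across columns

# Terms whose counts correlate with "security" at 0.3 or above
assoc <- findAssocs(tdm, terms = "security", corlimit = 0.3)
assoc
```

For the original data, checking nDocs(dtm) would show whether the matrix really has only one document. If it does, building the corpus from the uncollapsed vector (one element per line or per PDF page) may be worth trying.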

0 answers:

No answers