Question

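I have seen several questions about using the tm_map function with removeWords (from the tm R package) to remove stopwords() or hard-coded words from a corpus. However, I am trying to remove words stored in a file (currently a CSV, but I don't care which format). The code below runs without any errors, but my words are still there afterwards. Can someone explain what the problem is?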

#install.packages('tm')
library(tm)

setwd("c://Users//towens101317//Desktop")

problem_statements <- read.csv("query_export_results_100.csv", stringsAsFactors = FALSE, header = TRUE)
problem_statements_text <- paste(problem_statements, collapse=" ")
problem_statements_source <- VectorSource(problem_statements_text)

my_stop_words <- read.csv("mystopwords.csv", stringsAsFactors=FALSE, header = TRUE)
my_stop_words_text <- paste(my_stop_words, collapse=" ")

corpus <- Corpus(problem_statements_source)
corpus <- tm_map(corpus, removeWords, my_stop_words_text)

dtm <- DocumentTermMatrix(corpus)
dtm2 <- as.matrix(dtm)

frequency <- colSums(dtm2)
frequency <- sort(frequency, decreasing=TRUE)

head(frequency)
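A likely cause, offered as a sketch rather than a confirmed diagnosis: `read.csv()` returns a data frame, and `paste()` on a data frame deparses each column, so `my_stop_words_text` ends up as one string that literally reads `c("word1", "word2", ...)`. `removeWords()` treats each element of its `words` argument as one term to match, so that single string (or even a plain space-joined string such as `"foo bar"`) only matches the whole phrase, and nothing is removed. The same `paste()` call also turns `problem_statements` into a single deparsed document. The example below assumes `tm` is installed and that each CSV keeps its words or text in the first column; the inline CSV contents are hypothetical stand-ins for the real files.

```r
# Sketch: why paste() on a data frame leaves the stop words in place, and a fix.
# Assumes tm is installed. The inline CSV text is a hypothetical stand-in for
# mystopwords.csv and query_export_results_100.csv (first column = words/text).
library(tm)

# Stand-in for read.csv("mystopwords.csv", ...): a header row, one word per row.
my_stop_words <- read.csv(text = "word\nfoo\nbar", stringsAsFactors = FALSE)

# Stand-in for the problem statements: one statement per row.
problem_statements <- read.csv(
  text = "text\nfoo is broken\nbar fails on startup",
  stringsAsFactors = FALSE
)

# What the original code builds: paste() on a data frame deparses each column,
# giving ONE string that literally reads c("foo", "bar").
bad_words <- paste(my_stop_words, collapse = " ")
print(bad_words)

# removeWords() treats that string as a single term, so nothing matches.
unchanged <- removeWords(problem_statements[[1]], bad_words)

# Fix: pass the column itself, a character vector with one word per element.
good_words <- my_stop_words[[1]]

# One document per row instead of a single pasted document.
corpus <- VCorpus(VectorSource(problem_statements[[1]]))
corpus <- tm_map(corpus, removeWords, good_words)
corpus <- tm_map(corpus, stripWhitespace)

# Pull the cleaned text back out and trim the leading space left behind.
cleaned <- vapply(seq_along(corpus),
                  function(i) trimws(content(corpus[[i]])),
                  character(1))
print(cleaned)
```

In the original script the equivalent change would be passing `my_stop_words[[1]]` (or the column by name) to `tm_map(corpus, removeWords, ...)`. Note that `removeWords()` matches case-sensitively, so if the stop-word file is lowercase it may help to run `tm_map(corpus, content_transformer(tolower))` first.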

Using removeWords in tm_map in R with words loaded from a file
