Why can't I use "TermDocumentMatrix"?

Time: 2017-08-16 10:38:18

Tags: r matrix text-mining tm

I use the following commands to convert plural words to their singular form, but I get an error.

crudeCorp <- tm_map(crudeCorp, gsub, pattern = "smells", replacement = "smell")
crudeCorp <- tm_map(crudeCorp, gsub, pattern = "feels", replacement = "feel")
crudeDtm <- TermDocumentMatrix(crudeCorp, control=list(removePunctuation=T))
Error in UseMethod("meta", x) : 
  no applicable method for 'meta' applied to an object of class "character"

How can I fix this? 1. Is there a command that converts plurals to their singular form? 2. Am I using this command incorrectly?

Below is the full code I use to process the sentences and build the matrix.

library(tm)
library(XML)

crudeCorp<-VCorpus(VectorSource(readLines(file.choose())))

#(Eliminating Extra Whitespace) 
crudeCorp <- tm_map(crudeCorp, stripWhitespace)

#(Convert to Lower Case)

crudeCorp<-tm_map(crudeCorp, content_transformer(tolower))


# remove stopwords from corpus

crudeCorp<-tm_map(crudeCorp, removeWords, stopwords("english"))
myStopwords <- c(stopwords("english"), "can", "will","got","also","goes","get","much","since","way","even")
myStopwords <- setdiff(myStopwords, c("will","can"))
crudeCorp <- tm_map(crudeCorp, removeWords, myStopwords)

crudeCorp<-tm_map(crudeCorp,removeNumbers)

crudeCorp <- tm_map(crudeCorp, gsub, pattern = "smells", replacement = "smell")
crudeCorp <- tm_map(crudeCorp, gsub, pattern = "feels", replacement = "feel")

#-(Creating Term-Document Matrices)
crudeDtm <- TermDocumentMatrix(crudeCorp, control=list(removePunctuation=T))

Example: my data

1. I'M HAPPY
2. how are you?
3. This apple is good
(skip)

1 Answer:

Answer 0: (score: 0)

Why not use the following code for stemming and punctuation removal?

crudeCorp <- tm_map(crudeCorp, removePunctuation)
crudeCorp <- tm_map(crudeCorp, stemDocument, language = "english")  
crudeDtm  <- DocumentTermMatrix(crudeCorp)
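
As for the error in the question: as far as I can tell, it comes from passing `gsub` straight to `tm_map`. In tm 0.6 and later, a plain function like `gsub` returns a character vector, so each document loses its `PlainTextDocument` class and `TermDocumentMatrix` can no longer call `meta()` on it. Wrapping the function in `content_transformer()` should keep the documents intact. Here is a minimal sketch, assuming a small made-up corpus in place of your file (the `docs` vector and the `replaceWord` helper are hypothetical names used only for illustration):

```r
library(tm)

# Hypothetical stand-in for the input file, loosely based on the sample data.
docs <- c("I'M HAPPY", "how are you?", "This apple smells good", "it feels nice")
corp <- VCorpus(VectorSource(docs))
corp <- tm_map(corp, content_transformer(tolower))

# Wrap gsub so that only the text content is modified and each element
# of the corpus remains a PlainTextDocument with its metadata.
replaceWord <- content_transformer(function(x, pattern, replacement) {
  gsub(pattern, replacement, x)
})
corp <- tm_map(corp, replaceWord, pattern = "smells", replacement = "smell")
corp <- tm_map(corp, replaceWord, pattern = "feels", replacement = "feel")

# Building the matrix now works because the documents kept their class.
tdm <- TermDocumentMatrix(corp, control = list(removePunctuation = TRUE))
inspect(tdm)
```

The same wrapper can replace the two `gsub` lines in your script. Note also that `stemDocument` relies on the SnowballC package being installed.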

Hope this helps!