New error thrown when using the tm package with a DTM / wordclouds

Time: 2017-07-26 18:21:07

Tags: r tm word-cloud

I'm using R (3.2.5) with the following packages loaded: 'SnowballC', 'tm', 'NLP', 'RWeka', 'RTextTools', 'wordcloud', 'fpc'

carmenCorpus <- Corpus(VectorSource(feedback$Description))
carmenCorpus <- tm_map(carmenCorpus, PlainTextDocument)
carmenCorpus <- tm_map(carmenCorpus, removePunctuation)
carmenCorpus <- tm_map(carmenCorpus, removeWords, stopwords('english'))
carmenCorpus <- tm_map(carmenCorpus, stemDocument)

When I create the wordcloud, I get the error below. This is a new error; the same code ran without problems a few months ago:

wordcloud(carmenCorpus, max.words = 100, random.order = FALSE)

# Error in simple_triplet_matrix(i, j, v, nrow = length(terms), ncol = length(corpus),  : 
#  'i, j' invalid

Please advise on this issue.

1 answer:

Answer 0: (score: 0)

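Don't rely on wordcloud to take a corpus and magically produce a word cloud.

You have to do the work of converting the corpus to a TermDocumentMatrix and then summing the word frequencies: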

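# convert the corpus to a term-document matrix (terms in rows, documents in columns)
tdm <- TermDocumentMatrix(carmenCorpus, control = list(stemming = TRUE))

# total frequency of each term across all documents, most frequent first
freqs <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

# plot the wordcloud from the terms and their frequencies
wordcloud(names(freqs), freqs,
    max.words = 100,
    random.order = FALSE
    # add any other wordcloud parameters here, each preceded by a comma
    )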
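
For anyone who wants to try the steps above without the original data, here is a minimal, self-contained sketch. The `docs` vector is a made-up stand-in for `feedback$Description`, and using `VCorpus` with `content_transformer(tolower)` is my own assumption rather than something taken from the question. The sketch also leaves out the `PlainTextDocument` step and stemming; whether either of those is related to the error is not something this thread confirms.

```r
library(tm)
library(wordcloud)

# Hypothetical sample text standing in for feedback$Description
docs <- c("The app crashes when saving reports",
          "Saving reports is slow and the app crashes",
          "Great app, reports look great")

# Build and clean the corpus
corpus <- VCorpus(VectorSource(docs))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Terms in rows, documents in columns
tdm <- TermDocumentMatrix(corpus)

# Sum each row to get the total count per term, most frequent first
freqs <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

# Pass the terms and their counts explicitly instead of the corpus
wordcloud(names(freqs), freqs,
    min.freq = 1,
    max.words = 100,
    random.order = FALSE)
```

Note that `as.matrix(tdm)` builds a dense matrix, so on a large corpus this step can use a lot of memory.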