R comparison.cloud (wordcloud) error when the corpus has empty elements

Time: 2016-10-03 04:21:18

Tags: r word-cloud

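I am using comparison.cloud() from the wordcloud package with dynamically supplied data, and it works well. However, it often happens that some elements (documents) of the corpus are empty, which makes comparison.cloud return an error.

So far I have written a loop that leaves out the empty subsets: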

quests <- c("Q1","Q2","Q3")
wordslist <- list()
rmlist <- numeric()
count <- 1

for (quest in quests) {
  wordslist[[quest]] <- subset(df$element, df$question == quest)
  if (length(wordslist[[quest]]) == 0) {
    rmlist <- c(rmlist, count)
    wordslist[quest] <- NULL
  }
  count <- 1 + count
}

Then, after building the corpus, I use the counter to drop the corresponding names:

tra$question <- tra$question[c(1,2,3)]
if (length(rmlist) != 0) {tra$question <- tra$question[-rmlist]}

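So this works, but I would like to ask whether it can be improved somehow, and in particular whether there is an argument I could pass to comparison.cloud to achieve the same thing. Also, is this behavior a bug in comparison.cloud?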
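For reference, here is a minimal sketch of the same filtering without the manual counter. It uses base R only. The toy `df` and the names `keep` and `kept_quests` are made up for illustration and are not part of my real script:

```r
# Toy data standing in for the real `df` (hypothetical values).
df <- data.frame(
  question = c("Q1", "Q1", "Q3"),
  element  = c("chat noir", "chien", "oiseau bleu"),
  stringsAsFactors = FALSE
)
quests <- c("Q1", "Q2", "Q3")

# One character vector per question; Q2 has no answers, so its entry is empty.
wordslist <- lapply(setNames(quests, quests),
                    function(q) df$element[df$question == q])

# Logical mask of the questions that actually have answers.
keep <- lengths(wordslist) > 0

# Drop the empty entries and keep the matching question names in sync.
wordslist   <- wordslist[keep]
kept_quests <- quests[keep]
print(kept_quests)
```

The same `keep` mask could presumably be applied to the titles (something like `tra$question[c(1,2,3)][keep]`) in place of the `-rmlist` indexing, but I have only tried this on toy data.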

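The result in my script is that if a question has no answers, that question is not shown in the wordcloud at all. Having an empty question (the question title with a blank section of the wordcloud) would be preferable to removing it. By the way, I tried adding:

  • an "empty_Qn" string. Result: it sits in the center, ugly and large, and draws all the attention
  • a repeated "empty_Qn", but the repetitions are cleaned up by tm, and we get the same ugly output as above
  • an empty string "" or " ", but here again it gets stripped, and because of the empty corpus part I get the error in comparison.cloud again.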
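To illustrate what I mean by an empty corpus part, here is a small base-R sketch of how empty documents could be spotted on the term-document matrix itself. It assumes that an all-zero column is what comparison.cloud chokes on, which I have not verified in its source. The counts and the names `empty_docs` and `tdm_nonempty` are hypothetical:

```r
# Toy term-document matrix (hypothetical counts); Q2 has no terms at all.
tdm <- matrix(c(2, 0, 1,
                0, 0, 3),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("chat", "chien"), c("Q1", "Q2", "Q3")))

# Documents whose column sums to zero contain no terms after cleaning.
empty_docs <- colnames(tdm)[colSums(tdm) == 0]

# Keep only the non-empty documents; drop = FALSE preserves the matrix shape
# even if a single column is left.
tdm_nonempty <- tdm[, colSums(tdm) > 0, drop = FALSE]

print(empty_docs)
print(colnames(tdm_nonempty))
```

Checking at this stage would also catch a document that only becomes empty after stopword and punctuation removal, which the loop over the raw answers cannot see.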

Edit: the whole script

## Elements per question wordcloud

library(wordcloud)
library(tm)

load(file="mygroupAnswers.Rda")

df <- df[df$group == groupname,]
df <- droplevels(df)

# Stop if no data
if(length(df$elements)==0) q(save="no")

quests <- c("Q1","Q2","Q3")
wordslist <- list()
rmlist <- numeric()
count <- 1

for (quest in quests) {
  wordslist[[quest]] <- subset(df$element, df$question == quest)
  if (length(wordslist[[quest]]) == 0) {
    rmlist <- c(rmlist, count)
    wordslist[quest] <- NULL
  }
  count <- 1 + count
}

corpus <- Corpus(VectorSource(wordslist), readerControl = list(language = lang)) # Live Web version
#corpus <- Corpus(VectorSource(wordALL), readerControl = list(language = "fr")) # RStudio version
corpus <- tm_map(corpus, content_transformer(tolower))

spStopWords <- c(stopwords(tra2), "l'", "j'", "d'", "c'", "qu'", "quand", "avoir", "être", "etre", "quelqu'un", "plus", "tant", "bien", "mal") # Live Web version
#spStopWords <- c(stopwords("french"), "l'", "j'", "d'", "c'", "qu'", "quand", "avoir", "être", "etre", "quelqu'un", "plus", "tant", "bien", "mal") # RStudio version
corpus <- tm_map(corpus, removeWords, spStopWords)
corpus <- tm_map(corpus, removePunctuation)

tdm <- TermDocumentMatrix(corpus)
tdm <- as.matrix(tdm)
tra$question <- tra$question[c(1,2,3)]
if (length(rmlist) != 0) {tra$question <- tra$question[-rmlist]}
colnames(tdm) <- tra$question

comparison.cloud(tdm, max.words=800, scale=c(6,1), title.size=1.2)

0 answers:

No answers yet