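I am using comparison.cloud() from the wordcloud package on dynamically supplied data, and it works well. However, it often happens that some of the corpus elements (documents) are empty, which makes comparison.cloud() return an error.
So far, I have written a loop that leaves out the empty subsets: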
quests <- c("Q1","Q2","Q3")
wordslist <- list()
rmlist <- numeric()
count <- 1
for (quest in quests) {
  # Answers given to this question
  wordslist[[quest]] <- subset(df$element, df$question == quest)
  if (length(wordslist[[quest]]) == 0) {
    # Remember the position of the empty question, then drop it
    rmlist <- c(rmlist, count)
    wordslist[quest] <- NULL
  }
  count <- 1 + count
}
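For comparison, here is a minimal sketch of the same filtering step without the manual counter. It uses only base R, and the toy `df` below is a made-up stand-in for the real answers, so the values are assumptions rather than my actual data:

```r
# Toy data standing in for the real answers (hypothetical values).
df <- data.frame(
  question = c("Q1", "Q1", "Q3"),
  element  = c("bonjour", "merci", "salut"),
  stringsAsFactors = FALSE
)
quests <- c("Q1", "Q2", "Q3")

# One character vector of answers per question, named by question.
wordslist <- lapply(setNames(quests, quests),
                    function(q) df$element[df$question == q])

# Positions of the questions without answers (same role as rmlist above).
rmlist <- which(lengths(wordslist) == 0)

# Keep only the non-empty questions.
wordslist <- wordslist[lengths(wordslist) > 0]
```

This only tidies up the bookkeeping. It does not change the fact that the empty question disappears from the cloud.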
Then, after building the corpus, I use the counter to remove the corresponding names:
tra$question <- tra$question[c(1,2,3)]
if (length(rmlist) != 0) {tra$question <- tra$question[-rmlist]}
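A possible alternative to the negative index, sketched under the assumption that the titles can be stored as a vector named by question id. The names `titles`, `kept` and `col_titles` are hypothetical and do not appear in my script:

```r
# Hypothetical titles, one per question, named by question id.
titles <- c(Q1 = "Question 1", Q2 = "Question 2", Q3 = "Question 3")

# Ids of the questions that survived the filtering step (assumed here).
kept <- c("Q1", "Q3")

# Select the titles by name; no counter or negative index is needed.
col_titles <- unname(titles[kept])
```

Selecting by name keeps titles and columns aligned even if the order of the questions changes.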
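So this works, but I would like to ask whether it can be improved somehow, and in particular whether there is any argument I can pass to comparison.cloud() that would help with this. Also, is this behavior a bug in comparison.cloud()?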
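The result in my script is that a question with no answers does not show up in the wordcloud at all. Having an empty question (the question title with a blank portion of the wordcloud) would be preferable to removing it. By the way, I tried adding the following as a placeholder:
- "empty_Qn" as a character string. Result: it sits in the center, ugly and big, drawing all the attention.
- "empty_Qn", but the repetition is cleaned up by tm(), and we get the ugly output above again.
- "" or " ", but here again it gets stripped, and I get the error in comparison.cloud() again because of the empty corpus part.
Edit: the whole script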
## Elements per question wordcloud
library(wordcloud)
library(tm)
load(file="mygroupAnswers.Rda")
df <- df[df$group == groupname,]
df <- droplevels(df)
# Stop if no data
if(length(df$elements)==0) q(save="no")
quests <- c("Q1","Q2","Q3")
wordslist <- list()
rmlist <- numeric()
count <- 1
for (quest in quests) {
  wordslist[[quest]] <- subset(df$element, df$question == quest)
  if (length(wordslist[[quest]]) == 0) {
    rmlist <- c(rmlist, count)
    wordslist[quest] <- NULL
  }
  count <- 1 + count
}
corpus <- Corpus(VectorSource(wordslist), readerControl = list(language = lang)) # Live Web version
#corpus <- Corpus(VectorSource(wordALL), readerControl = list(language = "fr")) # RStudio version
corpus <- tm_map(corpus, content_transformer(tolower))
spStopWords <- c(stopwords(tra2), "l'", "j'", "d'", "c'", "qu'", "quand", "avoir", "être", "etre", "quelqu'un", "plus", "tant", "bien", "mal") # Live Web version
#spStopWords <- c(stopwords("french"), "l'", "j'", "d'", "c'", "qu'", "quand", "avoir", "être", "etre", "quelqu'un", "plus", "tant", "bien", "mal") # RStudio version
corpus <- tm_map(corpus, removeWords, spStopWords)
corpus <- tm_map(corpus, removePunctuation)
tdm <- TermDocumentMatrix(corpus)
tdm <- as.matrix(tdm)
tra$question <- tra$question[c(1,2,3)]
if (length(rmlist) != 0) {tra$question <- tra$question[-rmlist]}
colnames(tdm) <- tra$question
comparison.cloud(tdm, max.words=800, scale=c(6,1), title.size=1.2)