Make all words uppercase in a wordcloud in R

Time: 2016-11-17 11:28:20

Tags: r text-mining tm word-cloud

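When creating a wordcloud, the most common practice is to make all words lowercase. However, I want my wordclouds to show the words in uppercase. Even after forcing the words to uppercase, the wordcloud still shows lowercase words. Any ideas?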

Reproducible code:

    library(tm)
    library(wordcloud)

    data <- data.frame(text = c("Creativity is the art of being ‘productive’ by using
              the available resources in a skillful manner.
              Scientifically speaking, creativity is part of
              our consciousness and we can be creative –
              if we know – ’what goes on in our mind during
              the process of creation’.
              Let us now look at 6 examples of creativity which blows the mind."))

    text <- paste(data$text, collapse = " ")

    # I am using toupper() to force the words to become uppercase.
    text <- toupper(text)

    source <- VectorSource(text)
    corpus <- VCorpus(source, list(language = "en"))

    # This is my function for cleaning the text
    clean_corpus <- function(corpus) {
        corpus <- tm_map(corpus, removePunctuation)
        corpus <- tm_map(corpus, removeNumbers)
        corpus <- tm_map(corpus, stripWhitespace)
        corpus <- tm_map(corpus, removeWords, stopwords("en"))
        return(corpus)
    }

    clean_corp <- clean_corpus(corpus)
    data_tdm <- TermDocumentMatrix(clean_corp)
    data_m <- as.matrix(data_tdm)

    commonality.cloud(data_m, colors = c("#224768", "#ffc000"), max.words = 50)

This produces the following output:

(image: the resulting wordcloud, in which the words still appear in lowercase)

1 answer

Answer (score: 5)

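This happens because, behind the scenes, TermDocumentMatrix(clean_corp) runs as TermDocumentMatrix(clean_corp, control = list(tolower = TRUE)); that is, lowercasing is on by default. If you call TermDocumentMatrix(clean_corp, control = list(tolower = FALSE)) instead, the words stay uppercase. Alternatively, you can change the row names of the matrix afterwards: rownames(data_m) <- toupper(rownames(data_m))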
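A minimal sketch of both options, assuming the tm and wordcloud packages as in the question. It uses a shortened stand-in for the question's text, and the names m_upper and m_default are illustrative only. It also upper-cases the stopword list. stopwords("en") returns lowercase words and removeWords() matches case-sensitively, so once the text has gone through toupper(), a removeWords step that uses the plain stopwords("en") list removes none of them.

```r
library(tm)
library(wordcloud)

# Shortened, illustrative stand-in for the question's text.
text <- toupper("Creativity is the art of being productive by using the available
  resources in a skillful manner. Scientifically speaking, creativity is part of
  our consciousness.")

corpus <- VCorpus(VectorSource(text))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
# stopwords("en") is lowercase and removeWords() is case-sensitive,
# so upper-case the stopword list to match the upper-cased text.
corpus <- tm_map(corpus, removeWords, toupper(stopwords("en")))
corpus <- tm_map(corpus, stripWhitespace)

# Option 1: switch off the lowercasing TermDocumentMatrix applies by default.
m_upper <- as.matrix(TermDocumentMatrix(corpus, control = list(tolower = FALSE)))

# Option 2: keep the default (lowercased terms), then upper-case the row names.
m_default <- as.matrix(TermDocumentMatrix(corpus))
rownames(m_default) <- toupper(rownames(m_default))

print(rownames(m_upper))

# Either matrix can be plotted the same way as in the question.
commonality.cloud(m_upper, colors = c("#224768", "#ffc000"), max.words = 50)
```

Option 1 is the smaller change when the text is already upper-cased before the matrix is built. Option 2 is useful if you prefer to clean the text in lowercase, so that the stock stopwords("en") list matches as-is, and only switch to uppercase for display.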