R

Asked: 2015-07-01 19:40:04

Tags: r text-mining

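I am trying to generate a word cloud from some transaction activity, to show where people spend the most money. The transaction activity looks like this: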

Description       Amount
Albertson         20
Albertson         30
Albertson         35
CVS               10
CVS               40
Walmart           15
Walmart           44
...

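I can easily generate a word cloud from the frequency of each Description. But how do I get a word cloud ordered by the sum of Amount for each category? Thanks!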

BTW, here is my code:

require(tm)
require(wordcloud)
require(RColorBrewer)

data_corpus <- Corpus(VectorSource(data))

data_corpus <- tm_map(data_corpus, content_transformer(tolower), mc.cores=1)
data_corpus <- tm_map(data_corpus, removePunctuation, mc.cores=1)
data_corpus <- tm_map(data_corpus, function(x)removeWords(x,stopwords()), mc.cores=1)
data_corpus <- tm_map(data_corpus, removeNumbers, mc.cores=1)

pal2 <- brewer.pal(8,"Dark2")
png("25-34.png", width=1280,height=800)
wordcloud(data_corpus, scale=c(6,.2),min.freq=50,max.words=Inf, random.order=FALSE, rot.per=.15, colors=pal2)
dev.off()

1 Answer:

Answer 0 (score: 0)

I loaded your mini table into a data frame named data, then ran the following code:

require(wordcloud)
require(RColorBrewer)
library(dplyr)
# group by Description and sum the Amounts
data <- data %>% group_by(Description) %>% summarise(Amount = sum(Amount))

pal2 <- brewer.pal(8,"Dark2")
wordcloud(data$Description, freq = data$Amount, scale=c(6,.2),min.freq=50,max.words=Inf, random.order=FALSE, rot.per=.15, colors=pal2)

The tm package is not needed. Just pass your descriptions as the words argument and the summed amounts as the freq argument.
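
To try this without dplyr, the grouping step can also be sketched in base R. The data frame below contains only the rows shown in the question (the "..." rows are not reproduced), and the name totals is introduced here for illustration. The wordcloud call is guarded so the script still runs when that package is not installed.

```r
# Sample rows copied from the question; the elided "..." rows are left out
data <- data.frame(
  Description = c("Albertson", "Albertson", "Albertson",
                  "CVS", "CVS", "Walmart", "Walmart"),
  Amount = c(20, 30, 35, 10, 40, 15, 44),
  stringsAsFactors = FALSE
)

# Base-R counterpart of group_by() + summarise(): total Amount per Description
totals <- aggregate(Amount ~ Description, data = data, FUN = sum)
print(totals)  # Albertson 85, CVS 50, Walmart 59

# Draw the cloud only if the wordcloud package is available
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(totals$Description, freq = totals$Amount,
                       min.freq = 1, random.order = FALSE)
}
```

With these sample rows, "Albertson" has the largest total, so it should be drawn as the largest word.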