Chinese text mining

Asked: 2016-03-01 07:47:33

Tags: r text-mining word-cloud

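I am doing text mining on segmented Chinese text. After I convert the data to a data frame, the terms contain commas and double quotes, so the word cloud looks strange. Like this: (image: strange wordcloud)

My code is below. Output of inspect(d.corpus): (image: inspect(d.corpus) pic)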

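library(tm)
library(wordcloud)

# Rebuild the corpus from a data frame; after this step the terms contain commas and double quotes
d.corpus <- Corpus(DataframeSource(data.frame(as.character(d.corpus))))
# Keep only terms that are at least 2 characters long
tdm <- TermDocumentMatrix(d.corpus, control = list(wordLengths = c(2, Inf)))
m1 <- as.matrix(tdm)
# Total frequency of each term, most frequent first
v <- sort(rowSums(m1), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)
wordcloud(d$word, d$freq, min.freq = 5, random.order = FALSE, ordered.colors = FALSE,
    colors = rainbow(length(row.names(m1))))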

How can I fix the data?

I tried breaking this statement apart:

d.corpus <- Corpus(DataframeSource(data.frame(as.character(d.corpus))))

Why does as.character(d.corpus) have 3 elements?

test1 <- as.character(d.corpus)
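
A possible explanation, offered as an assumption rather than something confirmed in this thread: a tm corpus is typically stored as a list with three components (content, meta, dmeta), and as.character() applied to a list returns one string per component, deparsing anything that is not a single string. The sketch below mimics this with a plain list, so it runs without tm. The name fake_corpus and its contents are hypothetical.

```r
# Hypothetical stand-in for a tm corpus: a plain list with three components.
fake_corpus <- list(
  content = c("文本", "挖掘"),  # the documents
  meta    = list(),             # corpus-level metadata
  dmeta   = list()              # document-level metadata
)

# as.character() on a list yields one string per component and
# deparses any component that is not a single string.
test1 <- as.character(fake_corpus)

print(length(test1))  # one element per list component
print(test1[1])       # deparsed text of the content, with quotes and a comma
```

If this is what happens with the real corpus, the quotes and commas in the word cloud would come from the deparsed text, and building the data frame from the text of each document rather than from the corpus object itself should avoid them.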

1 answer:

Answer 0 (score: 0)

I found that a for loop can be used to clean up the names(v) data.

# Strip double quotes and commas from every term name
for (i in seq_along(names(v)))
{
    names(v)[i] <- gsub('[",]', '', names(v)[i])
}
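
Since gsub() is vectorized over its input, the same cleanup can probably be written without a loop. A minimal sketch, where the vector v and its term names are made-up sample data rather than the asker's actual frequencies:

```r
# Made-up frequency vector whose names carry stray quotes and commas.
v <- c(5, 3, 2)
names(v) <- c('"文本",', '挖掘', '"词云"')

# gsub() processes the whole character vector in one call,
# removing every double quote and comma from each name.
names(v) <- gsub('[",]', '', names(v))

print(names(v))
```

Note that two terms differing only in quotes or commas would end up with the same name after cleaning, so their frequencies might need to be summed before plotting.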

(image: result)