Creating a text correlation plot using phrases in R

Asked: 2017-08-30 18:51:43

Tags: r parsing plot correlation text-mining

I'm fairly new to R.

I was able to create a correlation plot like this one: [image]

using the following code:

source("https://bioconductor.org/biocLite.R")
biocLite("Rgraphviz")
library(tm)
library(qdap)
library(qdapTools)


# creating corpus on variable that I want to create plot on
myCorpus <- Corpus(VectorSource(final$MH2))  
dtm2 <- DocumentTermMatrix(myCorpus)

# correlation of terms plot
freq.terms <- findFreqTerms(dtm2)[1:25] # take the first 25 terms returned (findFreqTerms does not sort by frequency)
plot(dtm2, terms = freq.terms, corThreshold = 0.1, weighting = TRUE) # keep term pairs with correlation of at least 0.1
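
To make the role of `corThreshold` concrete, here is a minimal base-R sketch of the idea as I understand it: correlate the term columns, zero out pairs below the threshold, and treat what remains as edges. The matrix `m` is toy data invented for illustration (it is not `final$MH2`), and this is not a copy of what `tm` does internally.

```r
# Toy document-term counts (hypothetical data, not taken from final$MH2)
m <- matrix(c(1, 0, 1,
              1, 0, 1,
              0, 1, 0,
              1, 1, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("doc", 1:4),
                            c("vein", "ulcer", "varicose")))

# Pairwise correlation between term columns
term_cor <- cor(m)

# Drop pairs below the threshold, mirroring corThreshold = 0.1
threshold <- 0.1
adj <- term_cor
adj[adj < threshold] <- 0
diag(adj) <- 0  # ignore each term's correlation with itself

# Term pairs that would be joined by an edge (upper triangle avoids duplicates)
edges <- which(adj > 0 & upper.tri(adj), arr.ind = TRUE)
edge_list <- data.frame(from = colnames(adj)[edges[, "row"]],
                        to   = colnames(adj)[edges[, "col"]],
                        cor  = adj[edges])
print(edge_list)
```

In this toy matrix only `vein` and `varicose` are positively correlated above the threshold, so they are the only pair that would be connected.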

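However, this plot only uses single words, not phrases. For example, "varicose" and "veins" should not be split apart; the term should be "varicose veins". I was able to create a dtm that is actually parsed by phrase, but I can't get the phrase dtm to plot, only the single-word dtm. This is the plot I get after running the following code: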

[image]

source("https://bioconductor.org/biocLite.R")
biocLite("Rgraphviz")
library(tm)
library(qdap)
library(qdapTools)


# create corpus with phrases kept together, based on https://stackoverflow.com/questions/24038498/corpus-build-with-phrases
# drop = FALSE keeps a one-column data frame so colnames() can be set
dat <- final[ , 3, drop = FALSE]
colnames(dat) <- c("text")

# create document ids that read doc1 ... doc1000 etc.
dat$docs <- paste0("doc", seq_len(nrow(dat)))

x <- sub_holder(", ", dat$text)

# create dtm here
MH_parsed <- apply_as_tm(t(wfm(x$unhold(gsub(" ", "~~", x$output)), dat$docs)), 
                         weightTfIdf, to.qdap = FALSE)


# correlation of terms plot
freq.terms <- findFreqTerms(MH_parsed)[1:25] # take the first 25 terms returned (findFreqTerms does not sort by frequency)
plot(MH_parsed, terms = freq.terms, corThreshold = 0.1, weighting = TRUE) # keep term pairs with correlation of at least 0.1
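
As a way to check that phrases really stay together as single terms, independent of `tm` and `qdap`, here is a small base-R sketch. It assumes the text column is a comma-separated list of phrases (as the `sub_holder(", ", ...)` call suggests); the `text` vector and the name `phrase_dtm` are made up for illustration.

```r
# Hypothetical stand-in for the comma-separated text column (not the real data)
text <- c("varicose veins, leg ulcer",
          "varicose veins, deep vein thrombosis",
          "leg ulcer",
          "deep vein thrombosis, leg ulcer")

# Split on ", " so each phrase is one token, then join its words with "~~"
phrases <- lapply(strsplit(text, ", ", fixed = TRUE),
                  function(p) gsub(" ", "~~", trimws(p), fixed = TRUE))

# Build a document-by-phrase count matrix
vocab <- sort(unique(unlist(phrases)))
phrase_dtm <- t(sapply(phrases,
                       function(p) table(factor(p, levels = vocab))))
rownames(phrase_dtm) <- paste0("doc", seq_along(text))

# Each column is a whole phrase, so correlations are between phrases
print(colnames(phrase_dtm))
print(cor(phrase_dtm))
```

If the column names here are whole phrases but the terms in the plotted dtm are not, the split is probably being lost somewhere between building the matrix and plotting it.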

How can I make a correlation plot that uses the phrases, like the one in the image?

Thanks.

0 answers:

No answers yet