I'm fairly new to R.
I'm using the following code:
source("https://bioconductor.org/biocLite.R")
biocLite("Rgraphviz")
library(tm)
library(qdap)
library(qdapTools)
# creating corpus on variable that I want to create plot on
myCorpus <- Corpus(VectorSource(final$MH2))
dtm2 <- DocumentTermMatrix(myCorpus)
# correlation of terms plot
freq.terms <- findFreqTerms(dtm2)[1:25] # choose top 25 terms
plot(dtm2, term = freq.terms, corThreshold = 0.1, weighting = TRUE) # choose terms with correlation of at least 0.1
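However, this plot only uses single words, not phrases. For example, "varicose" and "veins" should not be separated; the term should be "varicose veins". I was able to create a DTM that is actually parsed by phrase, but I can't get the phrase-based DTM to plot, only the single-word one. This is my plot after running the following code: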
source("https://bioconductor.org/biocLite.R")
biocLite("Rgraphviz")
library(tm)
library(qdap)
library(qdapTools)
# create corpus with phrases kept together based off https://stackoverflow.com/questions/24038498/corpus-build-with-phrases
dat <- final[ , 3, drop = FALSE] # drop = FALSE keeps a one-column data frame so colnames() works
colnames(dat) <- c("text")
# create 2 variables to combine into 1 that will eventually read doc1...doc1000 etc
dat$docs <- "doc"
dat$num <- ""
dat$num <- 1:nrow(dat)
# combine both variables
dat$docs <- paste(dat$docs, dat$num, sep = "")
dat <- dat[ , -c(3)]
x <- sub_holder(", ", dat$text)
# create dtm here
MH_parsed <- apply_as_tm(t(wfm(x$unhold(gsub(" ", "~~", x$output)), dat$docs)),
weightTfIdf, to.qdap = FALSE)
# correlation of terms plot
freq.terms <- findFreqTerms(MH_parsed)[1:25] # choose top 25 terms
plot(MH_parsed, term = freq.terms, corThreshold = 0.1, weighting = TRUE) # choose terms with correlation of at least 0.1
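To make the goal concrete, here is a minimal base R sketch of the phrase-level correlations I would like the plot to be based on. It is only an illustration under stated assumptions: the `text` vector is made-up toy data standing in for `final$MH2`, I assume the phrases are separated by ", " (as in the `sub_holder(", ", ...)` call above), and it uses simple presence/absence counts rather than tf-idf weighting. It does not use tm or qdap at all.

```r
# Hypothetical toy data standing in for final$MH2 (comma-separated phrases)
text <- c("varicose veins, hypertension",
          "varicose veins, diabetes",
          "hypertension, diabetes",
          "varicose veins, hypertension, diabetes")

# Split on ", " only, so multi-word phrases stay together as one term
phrases <- strsplit(text, ", ", fixed = TRUE)

# Vocabulary of distinct phrases
terms <- sort(unique(unlist(phrases)))

# Binary document-by-phrase matrix: 1 if the phrase occurs in the document
m <- t(sapply(phrases, function(p) as.integer(terms %in% p)))
colnames(m) <- terms

# Pairwise correlations between phrases (not between single words)
phrase_cor <- cor(m)
round(phrase_cor, 2)
```

Here "varicose veins" is a single column of `m`, which is the behaviour I want the correlation plot to have.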
How can I make a correlation plot that uses phrases, as in the image?
Thanks.