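I am trying to plot the highest correlations for a word. For example, I would like to plot the ten highest correlations for the word "whale". Can someone help me with the command for this? If it helps, I have RGraphViz installed.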
library(tm)
s.dir1<-"/PATHTOTEXT/MobyDickTxt"
s.cor1<-Corpus(DirSource(s.dir1), readerControl=list(reader=readPlain))
s.cor1<-tm_map(s.cor1, removePunctuation)
s.cor1<-tm_map(s.cor1, stripWhitespace)
# tm >= 0.6 needs content_transformer() around base functions such as tolower
s.cor1<-tm_map(s.cor1, content_transformer(tolower))
s.cor1<-tm_map(s.cor1, removeNumbers)
s.cor1<-tm_map(s.cor1, removeWords, stopwords("english"))
tdm1 <- TermDocumentMatrix(s.cor1)
m1 <- as.matrix(tdm1)
v1 <- sort(rowSums(m1), decreasing=TRUE)
d1 <- data.frame(word = names(v1), freq = v1)
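Answer 0 (score: 5)
Here is one way to compute the terms most strongly correlated with a given word in a corpus, and to plot those terms together with their correlations.
Get some example data...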
require(tm)
data("crude")
tdm <- TermDocumentMatrix(crude)
Compute the correlations and store them in a data frame...
toi <- "oil" # term of interest
corlimit <- 0.7 # lower correlation bound limit.
assocs <- findAssocs(tdm, toi, corlimit)[[1]] # named vector: term -> correlation
oil_0.7 <- data.frame(corr = assocs, terms = names(assocs))
Create a factor so that ggplot keeps the terms in the order of the data frame instead of sorting them alphabetically...
oil_0.7$terms <- factor(oil_0.7$terms ,levels = oil_0.7$terms)
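Why the explicit levels matter can be seen in a small base R sketch. The three terms below are made up for illustration and are not taken from the crude corpus.

```r
# Hypothetical terms, already in the order we want on the axis
terms <- c("opec", "barrel", "kuwait")

# Without explicit levels, factor() sorts the levels alphabetically
default_levels <- levels(factor(terms))

# With levels = terms, the existing order is preserved
kept_levels <- levels(factor(terms, levels = terms))

print(default_levels)
print(kept_levels)
```

ggplot lays out a discrete axis in level order, so the second form is what keeps the terms in the order they have in the data frame.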
Draw the plot...
require(ggplot2)
ggplot(oil_0.7, aes( y = terms ) ) +
  geom_point(aes(x = corr), data = oil_0.7) +
  xlab(paste0("Correlation with the term ", "\"", toi, "\""))
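The question asked for the ten highest correlations rather than every term above a threshold. One possible way to get there, sketched under a few assumptions, is to call findAssocs with a lower bound of 0 and then keep only the first ten entries. The names n_top, top and top_df are introduced here for illustration only. Because findAssocs rounds correlations to two decimals, ties at the cutoff are possible and the tenth term may be somewhat arbitrary.

```r
library(tm)
library(ggplot2)

data("crude")
tdm <- TermDocumentMatrix(crude)

toi <- "oil"   # term of interest
n_top <- 10    # hypothetical parameter: number of associated terms to keep

# A lower bound of 0 returns every term with a non-negative correlation
assocs <- findAssocs(tdm, toi, 0)[[1]]

# Sort explicitly, then keep the n_top strongest associations
top <- head(sort(assocs, decreasing = TRUE), n_top)
top_df <- data.frame(corr = unname(top), terms = names(top))

# Reverse the levels so the strongest term is drawn at the top of the y axis
top_df$terms <- factor(top_df$terms, levels = rev(top_df$terms))

ggplot(top_df, aes(x = corr, y = terms)) +
  geom_point() +
  xlab(paste0("Correlation with the term \"", toi, "\""))
```

For the Moby Dick corpus in the question, the same steps should apply with tdm1 in place of tdm and "whale" as the term of interest, assuming the corpus contains more than one document so that correlations across documents are defined.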