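I am trying to plot a weighted graph of the terms used in tweets. Basically, I built a term-document matrix, removed the sparse terms, built an adjacency matrix from the remaining words, and now want to plot them. I can't figure out where the problem is. I tried to follow this example exactly: http://www.rdatamining.com/examples/text-mining
Here is my code: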
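library(tm)      # Corpus, TermDocumentMatrix, removeSparseTerms, stopwords
library(igraph)  # graph.adjacency, layout and plotting functions
tweet_corpus = Corpus(VectorSource(df$CONTENT))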
tdm = TermDocumentMatrix(
tweet_corpus,
control = list(
removePunctuation = TRUE,
stopwords = c("hehe", "haha", stopwords_phil, stopwords("english"), stopwords("spanish")),
removeNumbers = TRUE, tolower = TRUE)
)
m = as.matrix(tdm)
termDocMatrix <- m
termDocMatrix[5:10,1:20]
Docs
Terms 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
aabutin 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
aad 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
aaf 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
aali 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
aannacm 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
aantukin 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
myTdm2 <- removeSparseTerms(tdm, sparse =0.98)
m2 <- as.matrix(myTdm2)
m2[5:10,1:20]
Docs
Terms 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
filipino 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
give 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0
god 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
good 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
guy 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0
haiyan 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
myTdm2
<<TermDocumentMatrix (terms: 34, documents: 27395)>>
Non-/sparse entries: 39769/891661
Sparsity : 96%
Maximal term length: 9
Weighting : term frequency (tf)
termDocMatrix2 <- m2
termDocMatrix2[termDocMatrix2>=1] <- 1
termMatrix2 <- termDocMatrix2 %*% t(termDocMatrix2)
termMatrix2[5:10,5:10]
Terms
Terms disaster give god good guy test
disaster 623 6 53 11 4 19
give 6 592 98 16 8 6
god 53 98 2679 135 38 29
good 11 16 135 816 21 5
guy 4 8 38 21 637 5
test 19 6 29 5 5 610
g2 <- graph.adjacency(termMatrix2, weighted=T, mode="undirected")
g2 <- simplify(g2)
V(g2)$label <- V(g2)$name
V(g2)$degree <- degree(g2)
set.seed(3952)
layout1 <- layout.fruchterman.reingold(g2)
plot(g2, layout=layout1)
plot(g2, layout=layout.kamada.kawai)
V(g2)$label.cex <- 2.2 * V(g2)$degree / max(V(g2)$degree)+ .2
V(g2)$label.color <- rgb(0, 0, .2, .8)
V(g2)$frame.color <- NA
egam <- (log(E(g2)$weight)+.4) / max(log(E(g2)$weight)+.4)
E(g2)$color <- rgb(.5, .5, 0, egam)
E(g2)$width <- egam
plot(g2, layout=layout1)
The result looks like this:
But I would like to get something like this:
Apparently the weighting is not working, but why?! Thanks in advance!
Answer 0 (score: 0)
Even though your graph is weighted, the layout algorithm does not use the weights unless you explicitly tell it to. Try this:
layout1 <- layout.fruchterman.reingold(g2, weights=E(g2)$weight)
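However, if your weights vary widely in magnitude, it is usually better to pass the logarithm of the weights (plus some constant so that all values are strictly positive) to the layout algorithm.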
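A minimal sketch of the log-weight suggestion, not a definitive recipe. It assumes a recent igraph release, where graph_from_adjacency_matrix and layout_with_fr are the current names for graph.adjacency and layout.fruchterman.reingold. The toy matrix reuses a few co-occurrence counts from the termMatrix2 output in the question, and the constant 1 added after the logarithm is an arbitrary choice for illustration.

```r
library(igraph)

# Toy co-occurrence matrix built from counts shown in the question's
# termMatrix2 excerpt (assumption: four terms are enough to illustrate).
terms <- c("disaster", "give", "god", "good")
co <- matrix(c(623,   6,   53,  11,
                 6, 592,   98,  16,
                53,  98, 2679, 135,
                11,  16,  135, 816),
             nrow = 4, byrow = TRUE, dimnames = list(terms, terms))

# Build a weighted, undirected graph; simplify() drops the self-loops
# that come from the diagonal of the matrix.
g <- graph_from_adjacency_matrix(co, weighted = TRUE, mode = "undirected")
g <- simplify(g)

# Log-scale the weights so that one very heavy edge does not dominate.
# Adding 1 (an arbitrary constant) keeps every value strictly positive.
w <- log(E(g)$weight) + 1

# Pass the transformed weights to the layout explicitly.
set.seed(3952)
lay <- layout_with_fr(g, weights = w)

plot(g, layout = lay, edge.width = w)
```

With the weights passed in, terms joined by heavier edges are pulled closer together. Whether a constant of 1 is suitable depends on the data: any edge weight below exp(-1) would still come out non-positive.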