R igraph adjacency matrix weighted graph - graph is not weighted

Posted: 2015-06-23 22:31:12

Tags: r twitter graph igraph

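I am trying to plot a weighted graph of the terms used in tweets. Basically, I built a term-document matrix, removed the sparse terms, built an adjacency matrix of the remaining words, and now want to plot them. I cannot figure out where the problem is. I tried to follow this example exactly: http://www.rdatamining.com/examples/text-mining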

Here is my code:

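library(tm)
library(igraph)

# df: data frame of tweets with the text in column CONTENT
# stopwords_phil: my own stopword vector (defined elsewhere, not shown)
tweet_corpus <- Corpus(VectorSource(df$CONTENT))
tdm <- TermDocumentMatrix(
  tweet_corpus,
  control = list(
    removePunctuation = TRUE,
    stopwords = c("hehe", "haha", stopwords_phil, stopwords("english"), stopwords("spanish")),
    removeNumbers = TRUE,
    tolower = TRUE
  )
)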

m = as.matrix(tdm)
termDocMatrix <- m
termDocMatrix[5:10,1:20]
          Docs
Terms      1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
  aabutin  0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
  aad      0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
  aaf      0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
  aali     0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
  aannacm  0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
  aantukin 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0

myTdm2 <- removeSparseTerms(tdm, sparse =0.98)
m2 <- as.matrix(myTdm2)
m2[5:10,1:20]
          Docs
Terms      1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
  filipino 0 0 1 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
  give     0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  1  1  0  0
  god      0 1 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  1  0  0
  good     0 0 0 0 0 0 0 0 0  0  0  0  0  0  1  0  0  0  0  0
  guy      0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  1  0  0  1  0
  haiyan   0 0 0 0 0 0 0 0 0  0  0  0  1  0  0  0  0  0  0  0

myTdm2
<<TermDocumentMatrix (terms: 34, documents: 27395)>>
Non-/sparse entries: 39769/891661
Sparsity           : 96%
Maximal term length: 9
Weighting          : term frequency (tf)

termDocMatrix2 <- m2
termDocMatrix2[termDocMatrix2>=1] <- 1
termMatrix2 <- termDocMatrix2 %*% t(termDocMatrix2)
termMatrix2[5:10,5:10]
          Terms
Terms      disaster give  god good guy   test
  disaster      623    6   53   11   4     19
  give            6  592   98   16   8      6
  god            53   98 2679  135  38     29
  good           11   16  135  816  21      5
  guy             4    8   38   21 637      5
  test           19    6   29    5   5    610
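The binarize-then-multiply step above turns the term-document matrix into a term-term co-occurrence matrix. A minimal base-R sketch on a made-up 3-term, 4-document matrix (the data is hypothetical, not from the tweets) shows what the entries mean: off-diagonal cells count documents containing both terms, and the diagonal counts documents containing each term.

```r
# Hypothetical term-document matrix: 3 terms x 4 documents,
# cells are raw term frequencies as TermDocumentMatrix() would produce
tdm_toy <- matrix(c(2, 0, 1, 0,
                    1, 1, 0, 0,
                    0, 1, 1, 3),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(Terms = c("god", "give", "good"),
                                  Docs  = as.character(1:4)))

# Binarize: only presence/absence of a term in a document matters
tdm_bin <- tdm_toy
tdm_bin[tdm_bin >= 1] <- 1

# Co-occurrence: [i, j] = number of documents containing both term i and term j
cooc <- tdm_bin %*% t(tdm_bin)
print(cooc)

# cooc["god", "give"] is 1: only document 1 contains both terms
# cooc["good", "good"] is 3: "good" appears in documents 2, 3 and 4
```

`tcrossprod(tdm_bin)` computes the same product as `tdm_bin %*% t(tdm_bin)`. The diagonal turns into self-loops when the matrix is passed to `graph.adjacency()`, which is why `simplify()` is called afterwards.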
g2 <- graph.adjacency(termMatrix2, weighted = TRUE, mode = "undirected")
g2 <- simplify(g2)  # removes the self-loops created by the diagonal
V(g2)$label <- V(g2)$name
V(g2)$degree <- degree(g2)
set.seed(3952)
layout1 <- layout.fruchterman.reingold(g2)
plot(g2, layout=layout1)
plot(g2, layout=layout.kamada.kawai)
V(g2)$label.cex <- 2.2 * V(g2)$degree / max(V(g2)$degree)+ .2
V(g2)$label.color <- rgb(0, 0, .2, .8)
V(g2)$frame.color <- NA
egam <- (log(E(g2)$weight)+.4) / max(log(E(g2)$weight)+.4)
E(g2)$color <- rgb(.5, .5, 0, egam)
E(g2)$width <- egam
plot(g2, layout=layout1)

The result looks like this: [image not available]

But I would like something like this: [image not available]

Obviously the weighting is not working. But why?!

Thanks in advance!

1 answer:

Answer 0 (score: 0)

Even though your graph is weighted, the layout algorithm does not use the weights unless you explicitly tell it to. Try this:

layout1 <- layout.fruchterman.reingold(g2, weights=E(g2)$weight)
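A minimal, self-contained sketch of this fix on a hypothetical 4-term co-occurrence matrix (the data and the `pair_dist()` helper are made up for illustration). It uses `graph_from_adjacency_matrix()` and `layout_with_fr()`, the igraph 1.0+ names for `graph.adjacency()` and `layout.fruchterman.reingold()`, and compares a layout that receives the weights with one where every weight is forced to 1. In `layout_with_fr()` the weights multiply the attraction along edges, so the heavily weighted pair a-b should be pulled much closer together. Newer igraph releases may pick up the `weight` attribute by default; passing `weights` explicitly works either way.

```r
library(igraph)

# Hypothetical co-occurrence counts: "a" and "b" share 50 documents,
# every other pair shares only one
cooc <- matrix(c( 0, 50, 1, 1,
                 50,  0, 1, 1,
                  1,  1, 0, 1,
                  1,  1, 1, 0),
               nrow = 4, byrow = TRUE,
               dimnames = list(letters[1:4], letters[1:4]))

# weighted = TRUE stores the matrix entries in the "weight" edge attribute
g <- graph_from_adjacency_matrix(cooc, mode = "undirected", weighted = TRUE)
g <- simplify(g)  # drop self-loops and multi-edges (none here)

# Layout that explicitly uses the edge weights
set.seed(3952)
lay_w <- layout_with_fr(g, weights = E(g)$weight)

# Same algorithm with all weights set to 1, i.e. effectively unweighted
set.seed(3952)
lay_u <- layout_with_fr(g, weights = rep(1, ecount(g)))

# Hypothetical helper: Euclidean distance between vertices i and j in a layout
pair_dist <- function(lay, i, j) sqrt(sum((lay[i, ] - lay[j, ])^2))

cat("weighted:   d(a,b) =", pair_dist(lay_w, 1, 2), " d(a,c) =", pair_dist(lay_w, 1, 3), "\n")
cat("unweighted: d(a,b) =", pair_dist(lay_u, 1, 2), " d(a,c) =", pair_dist(lay_u, 1, 3), "\n")

# Side-by-side plots; edge width also reflects the weight
op <- par(mfrow = c(1, 2))
plot(g, layout = lay_w, edge.width = 1 + log(E(g)$weight), main = "weights used")
plot(g, layout = lay_u, edge.width = 1 + log(E(g)$weight), main = "weights ignored")
par(op)
```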

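However, if your weights vary widely in magnitude, it is usually better to feed the layout algorithm the logarithm of the weights (plus some constant so that all weights are strictly positive).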
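A small base-R sketch of that transformation, using the off-diagonal co-occurrence counts from the `termMatrix2[5:10,5:10]` excerpt in the question. The constant 1 is an assumption; any value that keeps the results strictly positive would do. A raw count of 1 would otherwise become `log(1) = 0`, and layout weights must be positive.

```r
# Off-diagonal co-occurrence counts from the termMatrix2 excerpt in the question
w <- c(6, 53, 11, 4, 19, 98, 16, 8, 6, 135, 38, 29, 21, 5, 5)

# log() compresses the range; + 1 keeps every weight strictly positive
w_log <- log(w) + 1

range(w)                  # 4 135
max(w) / min(w)           # 33.75: raw weights span a factor of about 34
max(w_log) / min(w_log)   # much smaller spread after the log transform
```

Applied to the question's graph, this would read `layout1 <- layout.fruchterman.reingold(g2, weights = log(E(g2)$weight) + 1)`.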