Automatically getting nodes and links from a table for a Sankey diagram

Time: 2018-05-30 08:38:19

Tags: r sankey-diagram

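Drawing a Sankey diagram requires nodes and links. To get them from a data frame, one could, for example, use the count function from the plyr package on each pair of adjacent columns to count the links between neighbouring nodes. Is there a more elegant way?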

Example: given the table below, the goal is to obtain these nodes and links:

param1 | param2 | param3 |
a      | b      | d      |
w      | c      | d      |
a      | b      | d      |
z      | c      | e      |

# nodes (0-based index = row position - 1)
nodes = data.frame("name" = 
c(
"a", # node 0
"w", # node 1
"z", # node 2
"b", # node 3
"c", # node 4
"d", # node 5
"e"  # node 6
))

# links: one row per (from, to) pair with its frequency
links = as.data.frame(matrix(c(
0, 3, 2, # from node 0, to node 3, freq 2
1, 4, 1,
2, 4, 1,
3, 5, 2,
4, 5, 1,
4, 6, 1
),
byrow = TRUE, ncol = 3))

1 answer:

Answer 0: (score: 2)

Using the igraph package:

library(dplyr)
library(igraph)

# example data
df1 <- read.table(text="
                  param1 param2 param3
                  a b d
                  w c d
                  a b d
                  z c e", header = TRUE, stringsAsFactors = FALSE)

# make graph: stack the adjacent column pairs (1-2, 2-3) into one from/to edge list
g <- graph_from_data_frame(
  rbind(
    setNames(df1[, 1:2], c("from", "to")),
    setNames(df1[, 2:3], c("from", "to"))))


# node table (igraph vertex ids are 1-based)
nodes <- data.frame(id = as.numeric(V(g)),
                    name = V(g)$name)
nodes
#   id name
# 1  1    a
# 2  2    w
# 3  3    z
# 4  4    b
# 5  5    c
# 6  6    d
# 7  7    e

# link table: count how often each (from id, to id) pair occurs
links <- as.data.frame(get.edges(g, E(g))) %>%
  group_by(V1, V2) %>%
  summarise(freq = n()) %>% 
  data.frame()

links
#   V1 V2 freq
# 1  1  4    2
# 2  2  5    1
# 3  3  5    1
# 4  4  6    2
# 5  5  6    1
# 6  5  7    1
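
Note that the ids above are 1-based, while the question's target uses 0-based node numbers. As a rough sketch of the same idea without igraph, the base R code below stacks adjacent column pairs, counts duplicate edges, and maps names to 0-based indices. The names `edges`, `node_names`, `source` and `target` are chosen here for illustration and are not part of the original answer; the sketch assumes a node is identified by its name alone, regardless of which column it appears in.

```r
# Same example data as in the question
df1 <- read.table(text = "
param1 param2 param3
a b d
w c d
a b d
z c e", header = TRUE, stringsAsFactors = FALSE)

# Stack every pair of adjacent columns into one from/to edge list
edges <- do.call(rbind, lapply(seq_len(ncol(df1) - 1), function(i) {
  data.frame(from = df1[[i]], to = df1[[i + 1]], stringsAsFactors = FALSE)
}))

# Node names in order of first appearance, column by column
node_names <- unique(unlist(df1, use.names = FALSE))
nodes <- data.frame(name = node_names, stringsAsFactors = FALSE)

# Count duplicate edges by summing a helper column of ones
links <- aggregate(freq ~ from + to, data = cbind(edges, freq = 1), FUN = sum)

# Map names to 0-based node indices (row position in `nodes` minus 1)
links$source <- match(links$from, node_names) - 1
links$target <- match(links$to, node_names) - 1

# Keep only the index columns, sorted for readability
links <- links[order(links$source, links$target), c("source", "target", "freq")]
rownames(links) <- NULL
```

If the igraph result is kept instead, subtracting 1 from its V1 and V2 columns should give the same 0-based indices, which is the convention that, for instance, networkD3's sankeyNetwork expects.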