Plotting a social network graph from data in a data frame

Time: 2018-11-11 18:02:26

Tags: r dataframe igraph social-networking

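I have a data frame with 380 observations of 9 variables. The data represent collaborations between people working on similar projects. The first column holds the main node, and the other columns hold the people who collaborated with him/her on the project, one person per column. So if the researcher in row 1, column 1 collaborated with 5 people, their names fill 5 columns; if the researcher in row 2, column 1 collaborated with 3 people, their names fill three columns. Obviously many cells are blank, because not every researcher collaborates with the same number of people. Given this data, how can I plot it as a network graph?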

Example data frame:

data <- data.frame(
author_1 = c('John', 'Kerry', 'Michelle', 'Joan', 'Paul'),
author_2 = c('Joan', 'Rick', 'N/A', 'Terrence', 'Collin'),
author_3 = c('Terrence', 'Michelle', 'Michelle', 'Joan', 'Paul'),
author_4 = c('Michelle', 'Collin', 'N/A', 'N/A', 'Phillips'))

I tried using graph.data.frame, but that only gives me the connections between the first two columns.

1 answer:

Answer 0 (score: 2)

We can try the ggraph package, but first we have to tidy the data.

# this is your data
data <- data.frame(
author_1 = c('John', 'Kerry', 'Michelle', 'Joan', 'Paul'),
author_2 = c('Joan', 'Rick', 'N/A', 'Terrence', 'Collin'),
author_3 = c('Terrence', 'Michelle', 'Michelle', 'Joan', 'Paul'),
author_4 = c('Michelle', 'Collin', 'N/A', 'N/A', 'Phillips'))

# load the packages we need
library(tidyr)      # to tidy the data (expand, nesting)
library(dplyr)      # for group_by, summarise and filter
library(ggraph)     # to plot network data with ggplot semantics
library(tidygraph)  # to work with networks
library(ggrepel)    # to avoid overlapping labels

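First, you need to prepare the data. You have a parent column, author_1, and several child columns, but an edge list needs exactly two columns, so you have to build the pairs for every combination of author_1 and author_n. This works just as well if your dataset is not hierarchical. For each row you need all the parent-child pairs, and then you stack all of them with rbind() (it is easier to do than to explain).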

edges <- rbind(
  expand(data, nesting(author_1, author_2)) %>% `colnames<-`(c("a", "b")),  # columns 1 and 2: all observed pairs, renamed to a and b
  expand(data, nesting(author_1, author_3)) %>% `colnames<-`(c("a", "b")),  # columns 1 and 3: all observed pairs, renamed to a and b
  expand(data, nesting(author_1, author_4)) %>% `colnames<-`(c("a", "b"))   # columns 1 and 4: all observed pairs, renamed to a and b
)
edges
# A tibble: 15 x 2
   a        b       
   <fct>    <fct>   
 1 Joan     Terrence
 2 John     Joan    
 3 Kerry    Rick    
 4 Michelle N/A     
 5 Paul     Collin  
 6 Joan     Joan    
 7 John     Terrence
 8 Kerry    Michelle
 9 Michelle Michelle
10 Paul     Paul    
11 Joan     N/A     
12 John     Michelle
13 Kerry    Collin  
14 Michelle N/A     
15 Paul     Phillips
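With 9 columns, writing one expand() call per column gets repetitive. As a rough sketch, the same edge list can be built by looping over every column after the first. This version uses base R only, and the names `coauthor_cols` and `edges_all` are illustrative, not part of the original answer. Unlike expand(nesting()), it keeps repeated pairs instead of collapsing them, which makes no difference for the example data.

```r
# Example data from the question (strings kept as characters, not factors)
data <- data.frame(
  author_1 = c('John', 'Kerry', 'Michelle', 'Joan', 'Paul'),
  author_2 = c('Joan', 'Rick', 'N/A', 'Terrence', 'Collin'),
  author_3 = c('Terrence', 'Michelle', 'Michelle', 'Joan', 'Paul'),
  author_4 = c('Michelle', 'Collin', 'N/A', 'N/A', 'Phillips'),
  stringsAsFactors = FALSE)

# Every column except the first holds co-authors (hypothetical helper name)
coauthor_cols <- setdiff(names(data), "author_1")

# Build one (a, b) data frame per co-author column, then stack them
edges_all <- do.call(rbind, lapply(coauthor_cols, function(col) {
  data.frame(a = data$author_1, b = data[[col]], stringsAsFactors = FALSE)
}))

nrow(edges_all)  # 5 rows x 3 co-author columns
```

The result has the same a and b columns as `edges`, so the later steps should apply to it unchanged.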

Remember: if you want N/A to appear in the plot, leave the data as it is; otherwise, add %>% filter(b != 'N/A') to the pipeline.
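As a minimal sketch of that filter, here it is applied to a small hand-made edge list in the same shape as `edges` (the object names `edges_demo` and `edges_clean` are made up for illustration). It assumes the blanks are stored as the string "N/A", as in the example data; if they were real NA values, the condition would be !is.na(b) instead.

```r
library(dplyr)

# A few rows in the same shape as the edge list (hypothetical demo data)
edges_demo <- data.frame(
  a = c("John", "Michelle", "Joan", "Paul"),
  b = c("Joan", "N/A", "N/A", "Collin"),
  stringsAsFactors = FALSE)

# Drop the placeholder rows so that "N/A" does not become a node
edges_clean <- edges_demo %>% filter(b != "N/A")
edges_clean
```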

Now we reshape the data so it can go into the graph:

# create edges: count how many times each pair occurs
edges1 <- edges %>% group_by(a, b) %>% summarise(weight = n())

# create nodes
nodes <- rbind(data.frame(researcher = edges$a, n = 1),data.frame(researcher = edges$b, n = 1)) %>%
  group_by(researcher) %>%
  summarise(n = sum(n))

# edges must refer to nodes by row position, so replace names with their index in nodes
edges1$a <- match(edges1$a, nodes$researcher) 
edges1$b <- match(edges1$b, nodes$researcher)

# declare the data as graph data
tidy <- tbl_graph(nodes = nodes, edges = edges1, directed = TRUE)
# sort the edges by descending weight
tidy <- tidy %>%
  activate(edges) %>%
  arrange(desc(weight))

# now the plot: there are several options, here is a basic one
ggraph(tidy, layout = "gem") +
  geom_node_point(aes(size = n)) +                        # node size = how often the researcher appears
  geom_edge_link(aes(width = weight),                     # edge thickness = how often the pair occurs
                 arrow = arrow(length = unit(4, 'mm')),   # arrows, if you want them
                 end_cap = circle(3, 'mm'), alpha = 0.8) +
  scale_edge_width(range = c(0.2, 2)) +
  geom_text_repel(aes(x = x, y = y, label = researcher))  # non-overlapping labels
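Since the question mentions graph.data.frame: once the data are in a two-column edge list, igraph can also read them directly. The sketch below uses graph_from_data_frame(), the current name for that function, on a small hand-made edge list. The object names and the vertex-size formula are illustrative choices, not part of the original answer.

```r
library(igraph)

# Hypothetical edge list in the same a/b shape as above
edges_demo <- data.frame(
  a = c("John", "John", "John", "Kerry", "Kerry"),
  b = c("Joan", "Terrence", "Michelle", "Rick", "Michelle"),
  stringsAsFactors = FALSE)

# The first two columns are read as the "from" and "to" ends of each edge
g <- graph_from_data_frame(edges_demo, directed = TRUE)

vcount(g)  # number of distinct researchers
ecount(g)  # number of collaboration edges

# Basic plot: vertex size grows with the number of connections
plot(g, vertex.size = 10 + 3 * degree(g), edge.arrow.size = 0.4)
```

This loses the ggplot-style layering of ggraph, but it may be enough for a quick look at the network.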


With the example data, edges1 and nodes should look like this:

> edges1
# A tibble: 14 x 3
# Groups:   a [?]
       a     b weight
   <int> <int>  <int>
 1     1     1      1
 2     1     7      1
 3     1     9      1
 4     2     1      1
 5     2     9      1
 6     2     4      1
 7     3     6      1
 8     3     8      1
 9     3     4      1
10     4     7      2
11     4     4      1
12     5     6      1
13     5     5      1
14     5    10      1
> nodes
# A tibble: 10 x 2
   researcher     n
   <fct>      <dbl>
 1 Joan           5
 2 John           3
 3 Kerry          3
 4 Michelle       6
 5 Paul           4
 6 Collin         2
 7 N/A            3
 8 Rick           1
 9 Terrence       2
10 Phillips       1
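The integer columns a and b in edges1 are row positions in nodes, produced by the match() calls; for example, the first row (1, 1) is the Joan to Joan pair, because Joan is row 1 of nodes. A tiny sketch of that lookup, with made-up names `nodes_demo`, `edge_names` and `idx`:

```r
# A small node table (hypothetical demo data)
nodes_demo <- data.frame(
  researcher = c("Joan", "John", "Kerry"),
  stringsAsFactors = FALSE)

# Names as they would appear in one column of the edge list
edge_names <- c("John", "Joan", "Kerry", "John")

# match() returns, for each name, its row position in the node table
idx <- match(edge_names, nodes_demo$researcher)
idx
```

A name that is missing from the node table would come back as NA, which is why nodes is built from both edge columns before matching.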