Question

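I have a data frame with 380 observations of 9 variables. The data represent collaborations between people working on similar projects. The first column holds the main node, and the other columns hold the people who collaborated with him/her on the project, one person per column. So if the researcher in row 1, column 1 collaborated with 5 people, their names will be in 5 columns; if the researcher in row 2, column 1 collaborated with 3 people, their names will be in three other columns. Obviously there will be many blank cells, since not all researchers collaborate with the same number of people. With this data, how can I plot it as a network graph?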

Example data frame:

data <- data.frame(
author_1 = c('John', 'Kerry', 'Michelle', 'Joan', 'Paul'),
author_2 = c('Joan', 'Rick', 'N/A', 'Terrence', 'Collin'),
author_3 = c('Terrence', 'Michelle', 'Michelle', 'Joan', 'Paul'),
author_4 = c('Michelle', 'Collin', 'N/A', 'N/A', 'Phillips'))

I tried graph.data.frame, but it only gives me the connections between the first two columns.

Answer 1

We can try the ggraph package, but the data has to be tidied first.

# this is your data
data <- data.frame(
author_1 = c('John', 'Kerry', 'Michelle', 'Joan', 'Paul'),
author_2 = c('Joan', 'Rick', 'N/A', 'Terrence', 'Collin'),
author_3 = c('Terrence', 'Michelle', 'Michelle', 'Joan', 'Paul'),
author_4 = c('Michelle', 'Collin', 'N/A', 'N/A', 'Phillips'))

# load the packages we need
library(dplyr)      # group_by(), summarise(), filter() used below
library(tidyr)      # to tidy the data
library(ggraph)     # to plot network data with ggplot semantics
library(tidygraph)  # to work with networks
library(ggrepel)    # to avoid overlapping labels

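First, prepare the data. Because you have a parent column (author_1) and child columns (author_2 to author_4), you need every author_1/author_n pair stacked into a single two-column edge list. This also works if your dataset is not hierarchical. For each pair of columns, build all the parent-child pairs that occur, then combine them with rbind() (it is easier to do than to explain).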

edges <- rbind(
  expand(data, nesting(author_1, author_2)) %>% `colnames<-`(c("a", "b")),  # columns 1 and 2: all observed pairs, renamed a and b
  expand(data, nesting(author_1, author_3)) %>% `colnames<-`(c("a", "b")),  # columns 1 and 3: all observed pairs, renamed a and b
  expand(data, nesting(author_1, author_4)) %>% `colnames<-`(c("a", "b"))   # columns 1 and 4: all observed pairs, renamed a and b
)
edges
# A tibble: 15 x 2
   a        b       
   <fct>    <fct>   
 1 Joan     Terrence
 2 John     Joan    
 3 Kerry    Rick    
 4 Michelle N/A     
 5 Paul     Collin  
 6 Joan     Joan    
 7 John     Terrence
 8 Kerry    Michelle
 9 Michelle Michelle
10 Paul     Paul    
11 Joan     N/A     
12 John     Michelle
13 Kerry    Collin  
14 Michelle N/A     
15 Paul     Phillips
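
The question mentions 9 columns, so writing one expand() call per column may get tedious. As a minimal sketch (not part of the original answer), the same two-column edge list can be built for any number of collaborator columns with tidyr::pivot_longer(), assuming tidyr 1.0.0 or later. The names edges_long and column are made up for illustration. Unlike expand(nesting()), this keeps duplicate pairs within a column; the example data has none, so the 15 pairs are the same, only in a different order.

```r
library(dplyr)
library(tidyr)

data <- data.frame(
  author_1 = c('John', 'Kerry', 'Michelle', 'Joan', 'Paul'),
  author_2 = c('Joan', 'Rick', 'N/A', 'Terrence', 'Collin'),
  author_3 = c('Terrence', 'Michelle', 'Michelle', 'Joan', 'Paul'),
  author_4 = c('Michelle', 'Collin', 'N/A', 'N/A', 'Phillips'),
  stringsAsFactors = FALSE)

# Stack every collaborator column under the main author:
# one row per (author_1, collaborator) pair.
edges_long <- data %>%
  pivot_longer(cols = -author_1, names_to = "column", values_to = "b") %>%
  select(a = author_1, b)   # keep only the two endpoint columns

edges_long
```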

Keep in mind that if you want to plot the N/A values, leave the data as it is; otherwise, add %>% filter(b != 'N/A').
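
A small sketch of that filtering step, run on a hypothetical four-row edge list rather than the full data. Dropping self-loops (rows where a equals b, such as Joan/Joan) is an extra assumption that the answer does not make; skip that line if you want to keep them. If your real data stores missing collaborators as NA rather than the string 'N/A', filter with !is.na(b) instead.

```r
library(dplyr)

# Hypothetical excerpt of the edge list, for illustration only
edges <- data.frame(
  a = c('John', 'Michelle', 'Joan', 'Paul'),
  b = c('Joan', 'N/A', 'Joan', 'Collin'),
  stringsAsFactors = FALSE)

edges_clean <- edges %>%
  filter(b != 'N/A') %>%   # drop the placeholder collaborator
  filter(a != b)           # assumption: self-loops are not wanted

edges_clean
```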

Now we arrange the data so that it can go into a graph:

# create edges
edges1 <- edges %>% group_by(a, b) %>% summarise(weight = n())  # weight = number of times each pair occurs

# create nodes
nodes <- rbind(data.frame(researcher = edges$a, n = 1),data.frame(researcher = edges$b, n = 1)) %>%
  group_by(researcher) %>%
  summarise(n = sum(n))

# now we have to have the match between edges and nodes
edges1$a <- match(edges1$a, nodes$researcher) 
edges1$b <- match(edges1$b, nodes$researcher)

# declare the data as graph data
tidy <- tbl_graph(nodes = nodes, edges = edges1, directed = TRUE)
tidy <- tidy %>%
  activate(edges) %>%
  arrange(desc(weight))

# now the plot: there are several options, here is a basic one
ggraph(tidy, layout = "gem") +
  geom_node_point(aes(size = n)) +                          # node size reflects frequency
  geom_edge_link(aes(width = weight),                       # edge thickness reflects frequency
                 arrow = arrow(length = unit(4, 'mm')),     # arrows, if you want them
                 end_cap = circle(3, 'mm'), alpha = 0.8) +
  scale_edge_width(range = c(0.2, 2)) +
  geom_text_repel(aes(x = x, y = y, label = researcher))
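
The match() step in the code above is needed because tbl_graph() reads the first two edge columns as integer row positions in the node table. A toy illustration of what match() returns, using made-up vectors (researcher, from_names and from_index are hypothetical names):

```r
# Node table order: position 1 = Joan, 2 = John, 3 = Kerry
researcher <- c('Joan', 'John', 'Kerry')

# Edge endpoints given by name
from_names <- c('John', 'Kerry', 'Joan', 'John')

# match() returns, for each name, its position in the node vector
from_index <- match(from_names, researcher)
from_index
```

A name that is missing from the node vector gives NA, so it is worth checking anyNA() on the matched columns before building the graph.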

With the example data, edges1 and nodes should look like this:

> edges1
# A tibble: 14 x 3
# Groups:   a [?]
       a     b weight
   <int> <int>  <int>
 1     1     1      1
 2     1     7      1
 3     1     9      1
 4     2     1      1
 5     2     9      1
 6     2     4      1
 7     3     6      1
 8     3     8      1
 9     3     4      1
10     4     7      2
11     4     4      1
12     5     6      1
13     5     5      1
14     5    10      1
> nodes
# A tibble: 10 x 2
   researcher     n
   <fct>      <dbl>
 1 Joan           5
 2 John           3
 3 Kerry          3
 4 Michelle       6
 5 Paul           4
 6 Collin         2
 7 N/A            3
 8 Rick           1
 9 Terrence       2
10 Phillips       1
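
On the attempt described in the question: igraph's graph_from_data_frame() (the newer name for graph.data.frame()) treats the first two columns of the data frame as the edge list and any further columns as edge attributes, which is why only the first two columns were connected. Once the data is in two-column form, igraph can plot it directly. A minimal sketch on a hypothetical four-row edge list, assuming the igraph package is installed:

```r
library(igraph)

# Hypothetical two-column edge list, for illustration only
edges <- data.frame(
  from = c('John', 'John', 'Kerry', 'Paul'),
  to   = c('Joan', 'Terrence', 'Rick', 'Collin'),
  stringsAsFactors = FALSE)

# The first two columns are read as edge endpoints;
# vertices are created from the distinct names.
g <- graph_from_data_frame(edges, directed = TRUE)

plot(g)   # base igraph plot with default layout
```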

Plotting a social network graph from data in a data frame
