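I have a data frame with 380 observations of 9 variables. The data represent collaborations between people working on similar projects. The first column holds the main node, and the other columns hold the people who collaborated with him/her on the project, one person per column. So if the researcher in row 1, column 1 collaborated with 5 people, their names fill 5 columns; if the researcher in row 2, column 1 collaborated with 3 people, their names fill three columns. Naturally many cells are blank, because not every researcher collaborated with the same number of people. Given this data, how can I plot it as a network graph?
Sample data frame: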
data <- data.frame(
author_1 = c('John', 'Kerry', 'Michelle', 'Joan', 'Paul'),
author_2 = c('Joan', 'Rick', 'N/A', 'Terrence', 'Collin'),
author_3 = c('Terrence', 'Michelle', 'Michelle', 'Joan', 'Paul'),
author_4 = c('Michelle', 'Collin', 'N/A', 'N/A', 'Phillips'))
I tried graph.data.frame, but it only gives the connections between the first two columns.
Answer 0 (score: 2)
We can try the ggraph package, but first we have to tidy the data.
# this is your data
data <- data.frame(
author_1 = c('John', 'Kerry', 'Michelle', 'Joan', 'Paul'),
author_2 = c('Joan', 'Rick', 'N/A', 'Terrence', 'Collin'),
author_3 = c('Terrence', 'Michelle', 'Michelle', 'Joan', 'Paul'),
author_4 = c('Michelle', 'Collin', 'N/A', 'N/A', 'Phillips'))
# load the packages used below
library(tidyr) # to tidy the data
library(dplyr) # for %>%, filter(), group_by(), summarise(), arrange()
library(ggraph) # to plot network data with ggplot semantics
library(tidygraph) # to work with networks
library(ggrepel) # to avoid overlapping labels
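First, prepare the data. Since you have a parent column, author_1, and child columns, you need to build the pairs for every combination of author_1 and author_n, so that each pair ends up in the same two columns (an edge list). This also works if the dataset is not hierarchical. For each row you need every parent-child pair; then rbind() them all together (easier to do than to explain).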
edges <- rbind(
  expand(data, nesting(author_1, author_2)) %>% `colnames<-`(c("a", "b")), # columns 1 and 2: all observed combinations, renamed a and b
  expand(data, nesting(author_1, author_3)) %>% `colnames<-`(c("a", "b")), # columns 1 and 3: same
  expand(data, nesting(author_1, author_4)) %>% `colnames<-`(c("a", "b"))  # columns 1 and 4: same
)
edges
# A tibble: 15 x 2
a b
<fct> <fct>
1 Joan Terrence
2 John Joan
3 Kerry Rick
4 Michelle N/A
5 Paul Collin
6 Joan Joan
7 John Terrence
8 Kerry Michelle
9 Michelle Michelle
10 Paul Paul
11 Joan N/A
12 John Michelle
13 Kerry Collin
14 Michelle N/A
15 Paul Phillips
Keep in mind that if you want N/A plotted as a node, leave the edges as they are; otherwise, append %>% filter(b != 'N/A') to drop those rows.
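The real data has nine columns, so writing one expand() line per column gets repetitive. The pairing step above can be sketched for any number of co-author columns in base R. This is only a minimal sketch: make_edges() and its arguments missing and drop_loops are hypothetical names introduced here, not part of any package, and it assumes empty slots are coded as "N/A" or NA.

```r
# Example data from the question; "N/A" marks an empty co-author slot.
data <- data.frame(
  author_1 = c("John", "Kerry", "Michelle", "Joan", "Paul"),
  author_2 = c("Joan", "Rick", "N/A", "Terrence", "Collin"),
  author_3 = c("Terrence", "Michelle", "Michelle", "Joan", "Paul"),
  author_4 = c("Michelle", "Collin", "N/A", "N/A", "Phillips"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pair the first column with every other column.
make_edges <- function(df, missing = "N/A", drop_loops = TRUE) {
  main <- as.character(df[[1]])
  # One two-column data frame per co-author column, stacked with rbind().
  pairs <- lapply(df[-1], function(col) {
    data.frame(a = main, b = as.character(col), stringsAsFactors = FALSE)
  })
  edges <- do.call(rbind, pairs)
  rownames(edges) <- NULL
  # Drop empty slots, and optionally pairs that link a person to themselves.
  keep <- !is.na(edges$b) & edges$b != missing
  if (drop_loops) keep <- keep & edges$a != edges$b
  edges[keep, , drop = FALSE]
}

edges_all <- make_edges(data)
```

Unlike expand(nesting()), this sketch does not collapse repeated pairs within a column, and it drops self-pairs such as Joan/Joan by default; pass drop_loops = FALSE to keep them.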
Now we arrange the data so it can be put into a graph:
# create edges: one row per (a, b) pair, weighted by how often it occurs
edges1 <- edges %>% group_by(a, b) %>% summarise(weight = n())
# create nodes: count how often each name appears on either end of an edge
nodes <- rbind(data.frame(researcher = edges$a, n = 1), data.frame(researcher = edges$b, n = 1)) %>%
  group_by(researcher) %>%
  summarise(n = sum(n))
# replace the names in edges1 with the matching row index in nodes
edges1$a <- match(edges1$a, nodes$researcher)
edges1$b <- match(edges1$b, nodes$researcher)
# declare the data as graph data
tidy <- tbl_graph(nodes = nodes, edges = edges1, directed = TRUE)
tidy <- tidy %>%
  activate(edges) %>%
  arrange(desc(weight))
# now the plot: there are several options, this is a basic one
ggraph(tidy, layout = "gem") +
  geom_node_point(aes(size = n)) + # node size is the frequency
  geom_edge_link(aes(width = weight), # edge thickness is the frequency
                 arrow = arrow(length = unit(4, 'mm')), # arrows, if you want
                 end_cap = circle(3, 'mm'), alpha = 0.8) +
  scale_edge_width(range = c(0.2, 2)) +
  geom_text_repel(aes(x = x, y = y, label = researcher))
This should work with the example data, for which edges1 and nodes look like this:
> edges1
# A tibble: 14 x 3
# Groups: a [?]
a b weight
<int> <int> <int>
1 1 1 1
2 1 7 1
3 1 9 1
4 2 1 1
5 2 9 1
6 2 4 1
7 3 6 1
8 3 8 1
9 3 4 1
10 4 7 2
11 4 4 1
12 5 6 1
13 5 5 1
14 5 10 1
> nodes
# A tibble: 10 x 2
researcher n
<fct> <dbl>
1 Joan 5
2 John 3
3 Kerry 3
4 Michelle 6
5 Paul 4
6 Collin 2
7 N/A 3
8 Rick 1
9 Terrence 2
10 Phillips 1
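To cross-check the weights and counts in these tables without the tidyverse, the same two steps (counting pairs, then mapping names to node rows with match()) can be sketched in base R. The 15 pairs are typed out by hand from the edges printout; the column names from and to are chosen here for illustration only.

```r
# The 15 author_1 / co-author pairs from the example, typed out by hand.
edges <- data.frame(
  a = c("Joan", "John", "Kerry", "Michelle", "Paul",
        "Joan", "John", "Kerry", "Michelle", "Paul",
        "Joan", "John", "Kerry", "Michelle", "Paul"),
  b = c("Terrence", "Joan", "Rick", "N/A", "Collin",
        "Joan", "Terrence", "Michelle", "Michelle", "Paul",
        "N/A", "Michelle", "Collin", "N/A", "Phillips"),
  stringsAsFactors = FALSE
)

# Edge weight: how often each (a, b) pair occurs.
edges$weight <- 1L
weighted <- aggregate(weight ~ a + b, data = edges, FUN = sum)

# Node table: how often each name occurs on either end of an edge.
counts <- table(c(edges$a, edges$b))
nodes <- data.frame(researcher = names(counts), n = as.integer(counts),
                    stringsAsFactors = FALSE)

# Replace names with row positions in `nodes`, mirroring the match() step.
weighted$from <- match(weighted$a, nodes$researcher)
weighted$to <- match(weighted$b, nodes$researcher)
```

This gives 14 weighted edges and 10 nodes, with the Michelle to N/A pair as the only edge of weight 2, in line with the tables above. The node order here is alphabetical, so the integer indices differ from the printed ones even though they refer to the same names.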