Question

Given a tree graph like this:

library(igraph)
g <- sample_pa(1000, power=1, directed=FALSE)
nodes <- V(g)[-1] # exclude root node since it has no parent.

What is the fastest way to get the parent of each node?

I currently use this:

parents <- unlist(adjacent_vertices(g, nodes, mode = c("out")))

But it is actually one of the bottlenecks in my code, because I need to do this for thousands of graphs (each with about 50 vertices).

Answer 1

First, let's try this on a smaller graph so we can see what is going on:

library(igraph)
set.seed(144)
g <- sample_pa(20, power=1, directed=FALSE)
plot(g)

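In your graph, every node except the root has exactly one parent, so I would expect a vector of length n-1 for a graph with n nodes (19 in this case, 999 in the example you provided). You can get this efficiently from the edge list by selecting the first column: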

get.edgelist(g)[,1]
# [1] 1 1 2 3 3 2 4 1 6 1 9 6 2 6 2 1 1 8 7

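Checking against the plot, we can confirm that the parent of node 2 is node 1, the parent of node 3 is node 1, the parent of node 4 is node 2, the parent of node 5 is node 3, and so on.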
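If a vector indexed by node ID is more convenient than a bare list of parents, the edge list can be turned into one. The following is only a sketch: it uses as_edgelist(), the name recent igraph releases use for get.edgelist(), and it assumes (as in the output above) that the first column holds the parent and the second column the child, and that the root is vertex 1. The name parent is just an illustrative variable.

```r
library(igraph)

set.seed(144)
g <- sample_pa(20, power = 1, directed = FALSE)

# Two-column matrix of numeric vertex IDs, one row per edge.
el <- as_edgelist(g, names = FALSE)

# parent[i] is the parent of node i; the root (assumed to be node 1) stays NA.
parent <- rep(NA_integer_, vcount(g))
parent[el[, 2]] <- as.integer(el[, 1])  # assumption: column 1 = parent, column 2 = child

parent
```

With this layout, parent[5] looks up the parent of node 5 directly, without having to remember the offset of one that the plain first column has.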

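This is much more efficient than the adjacent_vertices approach on large graphs. For example, on a graph with 1,000 nodes it is roughly 1,700 times faster (comparing mean times):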

set.seed(144)
g <- sample_pa(1000, power=1, directed=FALSE)
nodes <- V(g)[-1] # exclude root node since it has no parent.
library(microbenchmark)
microbenchmark(get.edgelist(g)[,1], unlist(adjacent_vertices(g, nodes, mode = c("out"))))
# Unit: microseconds
#                                                  expr        min         lq        mean     median         uq        max neval
#                                  get.edgelist(g)[, 1]     84.558    110.891    262.4235    125.497    169.947   9673.282   100
#  unlist(adjacent_vertices(g, nodes, mode = c("out"))) 303523.390 350459.141 455860.3464 444960.802 528314.593 754882.895   100

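Also, in this example your code returns a vector of length 1,965, even though the graph has only 999 edges. This is because most edges are returned twice, once for each endpoint; only the edges touching the root are returned once, since the root is excluded from nodes.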
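The length of that vector can be checked against the degrees of the graph. As a rough sketch: since adjacent_vertices returns every neighbour of each queried node, the number of values returned should equal the sum of the degrees of the non-root nodes, which is twice the edge count minus the degree of the root. The variable names n_returned and n_expected are illustrative only.

```r
library(igraph)

set.seed(144)
g <- sample_pa(1000, power = 1, directed = FALSE)
nodes <- V(g)[-1]  # exclude the root node

# Number of values the code from the question returns.
n_returned <- length(unlist(adjacent_vertices(g, nodes, mode = "out")))

# Each edge is counted once per non-root endpoint.
n_expected <- 2 * ecount(g) - degree(g, 1)

n_returned == n_expected
n_returned == sum(degree(g)[-1])
```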

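If you really do want all 1,965 values returned, exactly as the code in your question does, you can still speed the operation up considerably (about 750 times) with get.edgelist: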

match.op.output <- function(g) {
  el <- get.edgelist(g)
  el <- rbind(el, el[,2:1])
  el <- el[order(el[,1], el[,2]),]
  el[el[,1] != 1,2]
}
all.equal(match.op.output(g), unlist(adjacent_vertices(g, nodes, mode = c("out"))))
# [1] TRUE
microbenchmark(match.op.output(g), unlist(adjacent_vertices(g, nodes, mode = c("out"))))
# Unit: microseconds
#                                                  expr        min          lq       mean    median          uq        max neval
#                                    match.op.output(g)    541.416    585.5115    692.889    652.18    744.0785   1437.427   100
#  unlist(adjacent_vertices(g, nodes, mode = c("out"))) 382952.446 429673.4950 507641.095 486633.23 554715.5570 749883.994   100
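For the workload described in the question (thousands of graphs with about 50 vertices each), the edge-list approach can be applied to a whole list of graphs. This is a minimal sketch under assumptions: the graphs are generated here with sample_pa purely for illustration, and graphs and parents are hypothetical variable names; in real code the list would come from wherever the graphs are built.

```r
library(igraph)

set.seed(144)

# Illustrative input: 1,000 random trees with 50 vertices each.
graphs <- replicate(1000,
                    sample_pa(50, power = 1, directed = FALSE),
                    simplify = FALSE)

# One parent vector per graph, taken from the first edge-list column.
parents <- lapply(graphs, function(g) as_edgelist(g, names = FALSE)[, 1])

length(parents)
```

Each element of parents has one entry per non-root node, so it lines up with V(g)[-1] for the corresponding graph.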
