What is an efficient algorithm for counting the number of triangles in a graph?

Time: 2011-09-13 03:46:21

Tags: algorithm graph-theory

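What is an efficient algorithm for counting the number of triangles in an undirected graph (where a graph is a set of vertices and edges)? I've been searching Google and reading through my shelf of textbooks for a few hours each day for three days in a row.

This is for a homework assignment where I need such an algorithm, but developing it doesn't count toward the assignment. We're allowed to take such an algorithm from outside resources, but I'm at the end of my rope.

To clarify, a triangle in a graph is a cycle of length three. The trick is that it needs to work on vertex sets of up to 10,000 nodes.

I'm currently working in C#, but I care more about the general approach to solving this problem than about code to copy and paste.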

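At the highest level, my attempts so far have included:

  • A breadth-first search that tracks all unique cycles of length three. This seemed like a good idea to me, but I couldn't get it working.
  • Looping over all the nodes in the graph to see whether three vertices share edges. This runs far too slowly on the larger data sets: it is O(n^3).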
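For comparison, the second attempt can be written as a plain triple loop. This is a minimal Python sketch, assuming vertices are numbered 0..n-1 and edges are given as (u, v) pairs; the function name is hypothetical. At n = 10,000 it tests about 1.7 × 10^11 vertex triples, which is why it is too slow.

```python
from itertools import combinations


def count_triangles_bruteforce(n, edges):
    """Test every unordered triple of vertices 0..n-1: O(n^3) edge lookups."""
    # Neighbour sets make each "is there an edge?" test O(1) on average.
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    count = 0
    for a, b, c in combinations(range(n), 3):
        if b in adj[a] and c in adj[b] and c in adj[a]:
            count += 1
    return count


# A square 0-1-2-3 plus the diagonal 0-2 contains the triangles {0,1,2} and {0,2,3}.
print(count_triangles_bruteforce(4, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]))  # 2
```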

The algorithm itself is part of computing the clustering coefficient.
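Since the count feeds a clustering coefficient, here is how per-vertex triangle counts T(v) become the local clustering coefficient C(v) = 2T(v) / (d(v)(d(v) - 1)), where d(v) is the degree. A minimal Python sketch, assuming the graph is a dict mapping integer vertices to neighbour sets; reporting 0 for vertices of degree below 2 is an assumed convention, since libraries differ there.

```python
def local_clustering(adj):
    """Return {v: C(v)} with C(v) = 2*T(v) / (d(v)*(d(v)-1)).

    adj maps each vertex (integer label) to the set of its neighbours.
    T(v), the number of triangles through v, equals the number of edges
    among v's neighbours.
    """
    coeff = {}
    for v, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            coeff[v] = 0.0  # assumed convention: C(v) is undefined here; report 0
            continue
        t = sum(1 for u in nbrs for w in nbrs if u < w and w in adj[u])
        coeff[v] = 2.0 * t / (d * (d - 1))
    return coeff


# Triangle 0-1-2 with an extra pendant vertex 3 attached to 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(local_clustering(adj))
```

For this example, vertices 1 and 2 get 1.0, vertex 0 gets 1/3 (one edge among its three neighbours), and the pendant vertex 3 gets 0.0.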

4 answers

Answer 0 (score: 4)

You need a depth-first search. The algorithm would be:

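1) For the current node, ask for all unvisited adjacent nodes

2) For each of those nodes, run a depth-two search and check whether a node at depth 2 is the current node from step 1

3) Mark the current node as visited

4) Make each unvisited adjacent node the current node (one by one) and run the same algorithm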
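Here is one reading of those four steps as a Python sketch. Assumptions not spelled out in the answer: the graph is a dict of neighbour sets with no self-loops, and visited vertices are skipped in both step 1 and the depth-two check, so each triangle is found exactly twice (once per direction) from whichever of its vertices is visited first. The function name is hypothetical.

```python
def count_triangles_visited(adj):
    """Count triangles following the four steps above.

    adj maps each vertex to the set of its neighbours (undirected, no self-loops).
    """
    visited = set()
    found = 0
    for start in adj:                      # restart so every component is covered
        if start in visited:
            continue
        stack = [start]
        while stack:
            v = stack.pop()
            if v in visited:               # a vertex may be pushed more than once
                continue
            # Step 1: the unvisited neighbours of the current node.
            fresh = [u for u in adj[v] if u not in visited]
            # Step 2: from each such u, go one step further to w and check whether
            # w leads straight back to v (a cycle v-u-w-v of length three).
            for u in fresh:
                for w in adj[u]:
                    if w != v and w not in visited and v in adj[w]:
                        found += 1
            # Step 3: mark the current node as visited.
            visited.add(v)
            # Step 4: continue depth-first from the unvisited neighbours.
            stack.extend(fresh)
    # Each triangle is found twice (v-u-w and v-w-u) from its first-visited vertex.
    return found // 2


k4 = {v: {u for u in range(4) if u != v} for v in range(4)}
print(count_triangles_visited(k4))  # 4
```

The depth-first order only decides which vertex of each triangle is visited first; any order that processes every vertex once gives the same count. The work is bounded by roughly the sum of squared degrees, which is far below n^3 on sparse graphs.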

Answer 1 (score: 2)

Triangle counting is indeed hard and computationally expensive. This may be a good starting point for understanding why: Efficient Triangle Counting in Large Graphs via Degree-based Vertex Partitioning
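The paper's partitioning scheme is not reproduced here. As a smaller illustration of why degrees matter, this Python sketch uses a different, well-known trick: rank vertices by degree, orient every edge from the lower-ranked to the higher-ranked endpoint, and intersect the resulting out-neighbour sets. Each triangle is then counted exactly once, and no vertex keeps more than about √(2m) out-neighbours (m = number of edges), so the total work is O(m^1.5). Integer vertex labels are assumed so that ties can be broken by label; the function name is hypothetical.

```python
def count_triangles_degree_ordered(adj):
    """Count triangles by orienting each edge toward the higher-ranked endpoint.

    adj maps each vertex (integer label) to the set of its neighbours.
    """
    # Rank by (degree, label): a strict total order, ties broken by label.
    rank = {v: (len(nbrs), v) for v, nbrs in adj.items()}
    # Keep only the neighbours that rank higher than v.
    out = {v: {u for u in nbrs if rank[u] > rank[v]} for v, nbrs in adj.items()}
    count = 0
    for v in adj:
        for u in out[v]:
            # A triangle with ranks v < u < w is counted exactly once, here.
            count += len(out[v] & out[u])
    return count


square_with_diagonal = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
print(count_triangles_degree_ordered(square_with_diagonal))  # 2
```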

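The straightforward loop checks each of the n nodes against each of its neighbours (n*(n-1)) and then keeps going to see whether the neighbours of those neighbours' neighbours are n again: (n*(n-1))(n-1)(n-1), which is almost 10^16 for n = 10,000. With a million nodes these loops get silly, but for your 10,000 you shouldn't have any problem if you want to brute-force it :)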
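To put the estimate in numbers: the nested neighbour loop n*(n-1)*(n-1)*(n-1) is about 10^16 at n = 10,000, while testing every unordered vertex triple once (the O(n^3) approach from the question) is C(n, 3), roughly 1.7 × 10^11 tests. A quick Python check of both figures (`math.comb` needs Python 3.8 or later):

```python
from math import comb

n = 10_000
nested_loop_steps = n * (n - 1) ** 3   # the n*(n-1)*(n-1)*(n-1) estimate above
vertex_triples = comb(n, 3)            # distinct triples a brute-force check must test

print(f"{nested_loop_steps:.3e}")  # 9.997e+15
print(f"{vertex_triples:,}")       # 166,616,670,000
```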

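You mentioned that you code in C#, and igraph (which is available for C) has a great triangle-counting algorithm written by Gabor Csardi. It counted 1.3 million triangles in my random graph of 1000 nodes and one million edges in 1.3 seconds on a five-year-old laptop :) Gabor Csardi would be the person to ask :)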

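As for different programming approaches, you should probably look at the data structure your network is stored in. If it is stored in an adjacency matrix, the number of loop iterations is fixed by the number of nodes, but in an edge list of a network with three edges, the number of iterations is a multiple of three, independent of the number of nodes. You can ask the edge list for a node's neighbours without having to test every i->j combination.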
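A minimal Python sketch of that edge-list idea (the usual "edge iterator" formulation, not code from this answer), assuming edges arrive as (u, v) pairs of comparable labels and that duplicates and self-loops should be ignored: build the neighbour sets once, then for every edge count the common neighbours of its two endpoints.

```python
from collections import defaultdict


def count_triangles_edge_list(edges):
    """Count triangles from an undirected edge list via common neighbours."""
    # Normalise to unique undirected edges; drop self-loops.
    unique_edges = {(min(u, v), max(u, v)) for u, v in edges if u != v}
    # Build neighbour sets once, in O(m).
    nbrs = defaultdict(set)
    for u, v in unique_edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    # Each triangle {a, b, c} shows up once per edge (ab, bc, ca): three times in total.
    return sum(len(nbrs[u] & nbrs[v]) for u, v in unique_edges) // 3


print(count_triangles_edge_list([(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]))  # 2
```

Only actual neighbours are ever visited, so isolated or low-degree nodes cost almost nothing, unlike a scan over all i->j pairs of a matrix.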

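I wrote a teaching script in R to illustrate the approaches and to measure the speed of the different algorithms in a very basic way. Many of the speed problems here are inherent to using R (the edge-list version is completely swamped by too many edges), but I think the code example should give some ideas about how to think about the speed of brute-force triangle counting. It's in R and not very tidy, but it is well commented. I hope you can get past the language barrier.

All the best.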

# Counting triangles in a random graph using igraph and two different
# and almost equally stupid approaches looping through the 1) adjacency
# matrix and 2) the edge-list in R.

# Use igraph and these configs
library(igraph)
V <- 100
E <-  1700

# This is the random graph that we will use
g <- erdos.renyi.game(n=V, p.or.m=E, type="gnm", directed=FALSE, loops=FALSE)

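# igraph has such a fast algorithm. Long live Gabor Csardi!
# count_triangles() gives the triangles each vertex is part of,
# so every triangle is counted three times.
benchmark <- proc.time()["elapsed"]
triangle.count <- sum(count_triangles(g)) / 3
gabor.Csardi.benchmark <- proc.time()["elapsed"] - benchmark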

# For networks that aren't too big we can plot them to get a basic feel
if(length(V(g)) < 100){
       V(g)$size <- 5
       plot(g)
}

# The adjacency matrix approach will have to deal with a crazy
# amount of looping through pairs of matrix combinations to
# find links:

# We'll loop through each node to check its participation in triangles.
# Notice that a triangle ijk will be participated in by nodes
# i, j, and k, and that each of those nodes has two triangle counts.
# All in all, the structures ijk, ikj, jik, jki, kij, kji are each counted
# but shall be returned as 1 triangle. We will therefore divide our
# search result by 6 later on.

# Store our progress in this matrix to look at how we did later on
progress <- matrix(0, nrow=length(V(g)), ncol=8)

# Loop through all nodes to find triangles in an adjacency matrix
benchmark <- proc.time()["elapsed"] # Measure time for this loop
for(i in 1:length(V(g))){
       # Node i has connections to these nodes:
       i.neighbors <- as.vector( neighborhood(g, 1, nodes=i)[[1]] )
       i.neighbors <- setdiff(i.neighbors, c(i)) # i should not be part of its own neighborhood

       # for each i, tri is the number of triangles that i is involved in
       # using any j or any k. For a triangle {1,2,3}, tri will be 2 for
       # i==1, since i is part of both triangles {1,2,3} and {1,3,2}:
       tri <- 0

       for(j in i.neighbors)
       {
              # Node j has connections to these nodes:
              j.neighbors <- as.vector( neighborhood(g, 1, nodes=j)[[1]] )
              j.neighbors <- setdiff(j.neighbors, c(j)) # j should not be part of its own neighborhood

              # Were any of j's neighbors also a neighbor of i?
              k <- intersect(i.neighbors, j.neighbors)

              tri <- tri + length(k)
       }

       # Save our findings to the progress matrix
       progress[i,1] <- tri
       progress[i,7] <- proc.time()["elapsed"] - benchmark
}
progress[,2] <- sapply(1:length(progress[,1]), function(x) sum(progress[,1][1:x]))
progress[,3] <- round(progress[,2] / 6, digits=2)

# The edge-list approach instead loops through a list of all edges in the network.
# Here, I suppose, a lot of the extra speed could arise from R being better at looping
# with lapply() and at finding data in a data.frame than the brute-force loop above is.
el <- as.data.frame(as.matrix(get.edgelist(g)))

# This is ugly. Make the edgelist contain all edges as both i->j and j->i. In
# the igraph object, they are only given as low i to high j by get.edgelist()
  el.rev <- data.frame(el[,2], el[,1])
  names(el) <- names(el.rev) <- c("i","j")
  el <- rbind(el, el.rev)

# These nodes are connected (we'd only need to bother about non-isolates)
nodes <- sort(unique(c(el$i, el$j)))
tri <- 0

# Loop through connected nodes to find triangles in edge-list
benchmark <- proc.time()["elapsed"] # Measure time for this loop
for(i in nodes){
       i.neighbors <- el[el$i==i,]$j
       # i's neighbors are the $j:s of the edgelist where $i:s are i. 

       k.list <- unlist(lapply(i.neighbors, function(x) intersect(i.neighbors,el[el$i==x, ]$j)))
       # lists nodes that can be a k in an ijk-triangle for each of i's neighboring j:s
       # If 1 has neighbors 2 and 3, el[el$i==x, ]$j will first be the neighbors of 2 and then
       # the neighbors of 3. When intersected with the neighbors of i, k:s will be found. If
       # {1,2,3} is a triangle one k will be 3 for {i=1, j=2}, and another k will be 2 for {i=1, j=3}

       # k.list might be NULL (length 0, so no triangles for this i)
       tri.for.this.i <- (as.numeric(length(k.list)) / 2)
       # Here we divide by two since i can be in a triangle with j and k like {ijk} and {ikj}
       # We will later have to divide by 3 more, since each triangle will be counted for
       # each node i that we loop through

       # Save the counting to the progress
       tri <- tri.for.this.i + tri
       progress[i,4] <- as.numeric(tri.for.this.i)
       progress[i,8] <- proc.time()["elapsed"] - benchmark
}
progress[,5] <- sapply(1:length(progress[,4]), function(x) sum(progress[,4][1:x]))
progress[,6] <- round(progress[,5] / 3, digits=2)

# Fix the results into a usable format
results <- data.frame(c("igraph", "adjacency-loop", "edge-loop"),
                      c(triangle.count, max(progress[,3]), max(progress[,6])),
                      c(gabor.Csardi.benchmark, (max(progress[,7]) - min(progress[,7])), (max(progress[,8]) - min(progress[,8]))))
row.names(results) <- c("igraph", "Adjacency-loop", "Edge-loop")
names(results) <- c("Routine", "Triangle count", "Execution-time")

# Now we have run two approaches of more or less the same thing.
# Not only should the igraph triangle.count, and the two loops
# be identical, but the progress of the two methods should too.
progress[,3] == progress[,6]
plot(progress[,6], type="l", col="blue")
lines(progress[,7], col="green")

# Look at the result:
View(results)

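Answer 2 (score: 1)

It depends on how the graph is represented.

If you have an adjacency matrix A, the number of triangles should be tr(A^3)/6, in other words 1/6 of the sum of the diagonal elements of A^3 (the division takes care of direction and rotation).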
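A small Python check of the formula, assuming a symmetric 0/1 adjacency matrix with zeros on the diagonal (no self-loops). The entry (A^3)[i][i] counts closed walks of length three from i, and each triangle yields six of them: three starting vertices times two directions. Plain lists are used so the sketch has no dependencies; the function name is hypothetical.

```python
def triangles_from_trace(A):
    """Return tr(A^3) / 6 for a symmetric 0/1 adjacency matrix with a zero diagonal."""
    n = len(A)
    # A2 = A * A, computed with plain lists (O(n^3) arithmetic).
    A2 = [[sum(A[i][k] * A[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    # tr(A^3) = sum_i (A2 A)[i][i] = sum over i, j of A2[i][j] * A[j][i]
    trace_a3 = sum(A2[i][j] * A[j][i] for i in range(n) for j in range(n))
    return trace_a3 // 6


# Complete graph on 4 vertices: every vertex is adjacent to every other one.
K4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
print(triangles_from_trace(K4))  # 4
```

In practice the multiplication would go to a linear-algebra library (with NumPy, `np.trace(A @ A @ A) // 6`); note that for 10,000 vertices the dense matrix alone has 10^8 entries.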

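If you have adjacency lists, just start at every node and perform a depth-3 search. Count how often you get back to the starting node, then divide by 6 again.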
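The adjacency-list version of the same count, as a minimal Python sketch assuming a dict of neighbour sets with no self-loops (with a self-loop, the walk s -> a -> s -> s would also be counted). Every triangle is reached six times, once per start vertex and direction, which is where the 6 comes from. The function name is hypothetical.

```python
def triangles_by_closed_walks(adj):
    """Count closed walks s -> a -> b -> s of length three and divide by 6.

    adj maps each vertex to the set of its neighbours (undirected, no self-loops).
    """
    closed = 0
    for s in adj:
        for a in adj[s]:          # depth 1
            for b in adj[a]:      # depth 2
                if s in adj[b]:   # depth 3: back at the start node
                    closed += 1
    return closed // 6


triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(triangles_by_closed_walks(triangle))  # 1
```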

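Answer 3 (score: 0)

If you don't care about the exact number of triangles, there is a very simple streaming algorithm that gives an unbiased estimator. See, for example, here for an explanation.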
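The linked explanation isn't reproduced here. As one simple example of an unbiased estimator (the coin-flip edge sampling idea behind DOULION, not necessarily the method at that link): keep each edge independently with probability p in a single pass, count triangles exactly in the sample, and divide by p^3. A triangle survives only if all three of its edges are kept, which happens with probability p^3, so the expected rescaled count equals the true count. This Python sketch assumes an edge stream without duplicates or self-loops; the function name and `seed` parameter are hypothetical.

```python
import random
from collections import defaultdict


def estimate_triangles(edges, p, seed=None):
    """Unbiased triangle estimate from an edge stream sampled with probability p."""
    rng = random.Random(seed)
    nbrs = defaultdict(set)
    kept = []
    for u, v in edges:  # one pass; assumes no duplicate edges and no self-loops
        if rng.random() < p:
            kept.append((u, v))
            nbrs[u].add(v)
            nbrs[v].add(u)
    # Exact count on the sample: each sampled triangle is seen from its three edges.
    sampled = sum(len(nbrs[u] & nbrs[v]) for u, v in kept) // 3
    # Rescale: each triangle survived with probability p**3.
    return sampled / p ** 3


k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(estimate_triangles(k4_edges, 1.0))  # 4.0 (p = 1 keeps every edge)
```

A single estimate can be far off when p is small; averaging several runs, or choosing a larger p, reduces the variance at the cost of a bigger sample.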