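What is an efficient algorithm for counting the number of triangles in an undirected graph (where a graph is a set of vertices and edges)? I have been searching Google and reading through my shelf of textbooks for a few hours each day for three days in a row.
This is for a homework assignment where I need such an algorithm, but developing it does not count for anything on the assignment. We are expected to find such an algorithm from outside resources, but I am at the end of my rope.
For clarification, a triangle in a graph is a cycle of length 3. The trick is that it needs to work on vertex sets with up to 10,000 nodes.
I am currently working in C#, but I care more about the general approach to solving this problem than about code to copy and paste.
At the highest level, my attempts so far have included:
The algorithm itself is part of calculating the clustering coefficient.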
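Answer 0 (score: 4)
You need a depth-first search. The algorithm would be:
1) For the current node, ask for all unvisited adjacent nodes.
2) For each of those nodes, run a depth-2 search and check whether a node at depth 2 is the current node from step 1.
3) Mark the current node as visited.
4) Make each unvisited adjacent node the current node (one by one) and run the same algorithm.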
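The steps above could be sketched roughly as follows. This is a minimal illustration in Python, not the answerer's own code: the function name `count_triangles_dfs` and the `{node: set_of_neighbours}` input format are assumptions made for the example. Instead of hopping from neighbour to neighbour as in step 4, it simply iterates over every node, which also covers graphs that are not connected.

```python
def count_triangles_dfs(adj):
    """Count triangles in an undirected graph.

    `adj` is assumed to map each node to the set of its neighbours
    (hypothetical input format chosen for this sketch).
    """
    visited = set()
    triangles = 0
    for current in adj:
        closed_paths = 0
        # Step 1: look at every unvisited neighbour of the current node.
        for first in adj[current]:
            if first in visited:
                continue
            # Step 2: go one level deeper and test whether that node
            # links back to the current node.
            for second in adj[first]:
                if second == current or second in visited:
                    continue
                if current in adj[second]:
                    closed_paths += 1
        # Every triangle through `current` is found twice
        # (current-first-second and current-second-first).
        triangles += closed_paths // 2
        # Step 3: mark the node as visited so that its triangles
        # are never counted again from another corner.
        visited.add(current)
    return triangles


# Complete graph on four nodes: every triple of nodes forms a triangle.
k4 = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
print(count_triangles_dfs(k4))
```

Because a node is marked as visited once it has been processed, each triangle is counted only at the first of its three corners that the loop reaches.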
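Answer 1 (score: 2)
Triangle counting is genuinely difficult and computationally costly. Maybe this is a good starting point for understanding why: Efficient Triangle Counting in Large Graphs via Degree-based Vertex Partitioning.
A naive loop checks each of the n nodes against each of its neighbours (n*(n-1) pairs in the worst case) and then continues the cycle to see whether a neighbour of n's neighbour leads back to n: (n*(n-1))*(n-1)*(n-1), which is almost 10^16 for n = 10,000. That bound is only reached when every node is connected to every other node; in a sparse graph the neighbour lists are much shorter. With a million nodes these loops get silly, but for your 10,000 you should have no problem at all if you want to brute-force it :)
You mention that you code in C#, and igraph (which is available for C) has a nice triangle-counting algorithm written by Gabor Csardi. It counted 1.3 million triangles in my random graph of 10,000 nodes and one million edges in 1.3 seconds on a five-year-old laptop :) Gabor Csardi would be the person to ask :)
As for different programming approaches, you should probably look at how the network data is stored. If it is stored in an adjacency matrix, the number of loop iterations is fixed by the number of nodes, but in an edge list of a network with three edges, the number of iterations is a multiple of three, independent of the number of nodes. You can ask the edge list for the neighbours of a node without having to test every combination of i->j.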
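To make the storage point concrete, here is a small Python sketch that builds neighbour sets directly from an edge list and counts triangles by intersecting them. The function names are hypothetical, and the sketch assumes node labels can be compared with `min` and `max` (for example integers); it is meant as an illustration of the idea, not as the igraph algorithm.

```python
from collections import defaultdict


def neighbours_from_edges(edges):
    """Build {node: set_of_neighbours} from an undirected edge list."""
    adj = defaultdict(set)
    for i, j in edges:
        if i == j:
            continue  # ignore self-loops, they cannot be part of a triangle
        adj[i].add(j)
        adj[j].add(i)
    return adj


def count_triangles_from_edges(edges):
    """Count triangles by intersecting the neighbour sets of each edge."""
    adj = neighbours_from_edges(edges)
    # Normalise every edge to (low, high) so that i->j and j->i,
    # or repeated edges, are only processed once.
    unique_edges = {(min(a, b), max(a, b)) for a, b in edges if a != b}
    total = 0
    for i, j in unique_edges:
        # Every common neighbour k of i and j closes a triangle i-j-k.
        total += len(adj[i] & adj[j])
    # Each triangle is seen once from each of its three edges.
    return total // 3


edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
print(count_triangles_from_edges(edges))
```

The work done here depends on the number of edges and the sizes of the neighbour sets, not on testing every possible pair of nodes.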
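I wrote a teaching script in R to illustrate the approaches and to measure the speed of the different algorithms in a very basic way. Using R here brings plenty of inherent speed issues (the edge-list version gets completely swamped by too many edges), but I think the code example should give some ideas about how to think about the speed of brute-force triangle counting. It is in R and not very tidy, but it is well commented. I hope you can get past the language barrier.
All the best.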
# Counting triangles in a random graph using igraph and two different
# and almost equally stupid approaches looping through the 1) adjacency
# matrix and 2) the edge-list in R.
# Use igraph and these configs
library(igraph)
V <- 100
E <- 1700
# This is the random graph that we will use
g <- erdos.renyi.game(type="gnm", n=V, p.or.m=E, directed=FALSE, loops=FALSE)
# igraph has such a fast algorithm. Long live Gabor Csardi!
benchmark <- proc.time()["elapsed"]
triangle.count <- sum(count_triangles(g)/3)
gabor.Csardi.benchmark <- proc.time()["elapsed"] - benchmark
# For networks that are not too big we can plot them to get a basic feel
if(length(V(g)) < 100){
V(g)$size <- 5
plot(g)
}
# The adjacency matrix approach will have to deal with a crazy
# amount of looping through pairs of matrix combinations to
# find links:
# We'll loop through each node to check its participation in triangles
# notice that a triangle ijk will be participated in by nodes
# i, j, and k, and that each of those nodes have two triangular counts.
# All in all the structures ijk, ikj, jik, jki, kij, kji are each counted
# but shall be returned as 1 triangle. We will therefore divide our
# search-result by 6 later on.
# Store our progress in this matrix to look at how we did later on
progress <- matrix(0, nrow=length(V(g)), ncol=8)
# Loop through all nodes to find triangles in an adjacency matrix
benchmark <- proc.time()["elapsed"] # Measure time for this loop
for(i in 1:length(V(g))){
  # Node i has connections to these nodes:
  i.neighbors <- as.vector( neighborhood(g, 1, nodes=i)[[1]] )
  i.neighbors <- setdiff(i.neighbors, c(i)) # i should not be part of its own neighborhood
  # for each i, tri is the number of triangles that i is involved in
  # using any j or any k. For a triangle {1,2,3}, tri will be 2 for
  # i==1, since i is part of both triangles {1,2,3} and {1,3,2}:
  tri <- 0
  for(j in i.neighbors)
  {
    # Node j has connections to these nodes:
    j.neighbors <- as.vector( neighborhood(g, 1, nodes=j)[[1]] )
    j.neighbors <- setdiff(j.neighbors, c(j)) # j should not be part of its own neighborhood
    # Were any of j's neighbors also a neighbor of i?
    k <- intersect(i.neighbors, j.neighbors)
    tri <- tri + length(k)
  }
  # Save our findings to the progress matrix
  progress[i,1] <- tri
  progress[i,7] <- proc.time()["elapsed"] - benchmark
}
progress[,2] <- sapply(1:length(progress[,1]), function(x) sum(progress[,1][1:x]))
progress[,3] <- round(progress[,2] / 6, digits=2)
# The edge-list approach uses a list of all edges in the network to loop through instead
# Here, I suppose, a lot of the extra speed could arise from R being better at looping
# with lapply() and at finding data in a data.frame than the brute-force loop above is.
el <- as.data.frame(as.matrix(get.edgelist(g)))
# This is ugly. Make the edgelist contain all edges as both i->j and j->i. In
# the igraph object, they are only given as low i to high j by get.edgelist()
el.rev <- data.frame(el[,2], el[,1])
names(el) <- names(el.rev) <- c("i","j")
el <- rbind(el, el.rev)
# these nodes are connected (we only need to bother about non-isolates)
nodes <- sort(unique(c(el$i, el$j)))
tri <- 0
# Loop through connected nodes to find triangles in edge-list
benchmark <- proc.time()["elapsed"] # Measure time for this loop
for(i in nodes){
  i.neighbors <- el[el$i==i,]$j
  # i's neighbors are the $j:s of the edgelist where $i:s are i.
  k.list <- unlist(lapply(i.neighbors, function(x) intersect(i.neighbors,el[el$i==x, ]$j)))
  # lists nodes that can be a k in an ijk-triangle for each of i's neighboring j:s
  # If 1 has neighbors 2 and 3, el[el$i==x, ]$j will first be the neighbors of 2 and then
  # the neighbors of 3. When intersected with the neighbors of i, k:s will be found. If
  # {1,2,3} is a triangle one k will be 3 for {i=1, j=2}, and another k will be 2 for {i=1, j=3}
  # k.list might be NULL
  tri.for.this.i <- (as.numeric(length(k.list)) / 2)
  # Here we divide by two since i can be in a triangle with j and k like {ijk} and {ikj}
  # We will later have to divide by 3 more, since each triangle will be counted for
  # each node i that we loop through
  # Save the counting to the progress
  tri <- tri.for.this.i + tri
  progress[i,4] <- as.numeric(tri.for.this.i)
  progress[i,8] <- proc.time()["elapsed"] - benchmark
}
progress[,5] <- sapply(1:length(progress[,4]), function(x) sum(progress[,4][1:x]))
progress[,6] <- round(progress[,5] / 3, digits=2)
# Fix the results into a usable format
results <- data.frame(c("igraph", "adjacency-loop", "edge-loop"),
c(triangle.count, max(progress[,3]), max(progress[,6])),
c(gabor.Csardi.benchmark, (max(progress[,7]) - min(progress[,7])), (max(progress[,8]) - min(progress[,8]))))
row.names(results) <- c("igraph", "Adjacency-loop", "Edge-loop")
names(results) <- c("Routine", "Triangle count", "Execution-time")
# Now we have run two approaches of more or less the same thing.
# Not only should the igraph triangle.count, and the two loops
# be identical, but the progress of the two methods should too.
progress[,3] == progress[,6]
plot(progress[,6], type="l", col="blue")
lines(progress[,7], col="green")
# Look at the result:
View(results)
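Answer 2 (score: 1)
It depends on how your graph is represented.
If you have an adjacency matrix A, the number of triangles should be tr(A^3)/6, in other words 1/6 times the sum of the diagonal elements of A^3 (the division takes care of the orientation and rotation).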
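As a rough illustration of the trace formula, the following pure-Python sketch computes tr(A^3)/6 for a small 0/1 adjacency matrix given as a list of lists. The function name is hypothetical, and the matrix is assumed to be symmetric with zeros on the diagonal. The naive multiplication takes time cubic in the number of nodes, so for large graphs a numerical library or a sparse representation would be the more realistic choice.

```python
def triangles_from_adjacency_matrix(a):
    """Return tr(A^3) / 6 for a symmetric 0/1 matrix with a zero diagonal."""
    n = len(a)
    # a2[i][j] is the number of walks of length 2 from i to j.
    a2 = [[sum(a[i][k] * a[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    # The i-th diagonal entry of A^3 is sum_j a2[i][j] * a[j][i],
    # i.e. the number of closed walks of length 3 starting at i.
    trace = sum(a2[i][j] * a[j][i] for i in range(n) for j in range(n))
    # Each triangle yields 6 closed walks: 3 starting corners x 2 directions.
    return trace // 6


# Triangle 0-1-2 with an extra node 3 attached to node 2.
a = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
print(triangles_from_adjacency_matrix(a))
```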
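If you have adjacency lists, just start at every node and perform a depth-3 search. Count how often you arrive back at the starting node, then divide by 6 again.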
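The adjacency-list variant might look like the sketch below, again with a hypothetical function name and assuming the graph is given as `{node: set_of_neighbours}` without self-loops. It counts every closed walk of length 3 and divides by 6, for the same reason as in the matrix formula.

```python
def triangles_by_depth3_walks(adj):
    """Count closed walks start -> second -> third -> start, divided by 6."""
    closed_walks = 0
    for start in adj:
        for second in adj[start]:
            for third in adj[second]:
                # The third step must lead back to the starting node.
                if third != start and start in adj[third]:
                    closed_walks += 1
    # Each triangle is walked from 3 starting corners in 2 directions.
    return closed_walks // 6


adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(triangles_by_depth3_walks(adj))
```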
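Answer 3 (score: 0)
If you don't care about the exact number of triangles, there is a very simple streaming algorithm that provides an unbiased estimate. For an explanation, see for example here.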
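The linked explanation is not reproduced here, so the sketch below is not necessarily the algorithm this answer had in mind. It shows one simple estimator that fits the description: keep each edge of the stream independently with probability p, count the triangles of the sampled graph exactly, and scale the result by 1/p^3. A triangle survives only if all three of its edges are kept, which happens with probability p^3, so the scaled count is unbiased. The function name and parameters are hypothetical, each edge is assumed to appear exactly once in the stream, and node labels are assumed to be comparable.

```python
import random


def estimate_triangles(edge_stream, keep_probability, seed=None):
    """Estimate the triangle count from one pass over an edge stream."""
    rng = random.Random(seed)
    adj = {}
    for u, v in edge_stream:
        if u == v:
            continue  # self-loops cannot be part of a triangle
        # Keep the edge with probability p; only kept edges use memory.
        if rng.random() < keep_probability:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    # Count triangles u < v < w in the sampled graph exactly once each.
    sampled = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                sampled += sum(1 for w in adj[u] & adj[v] if v < w)
    # A triangle survives sampling with probability p^3, so rescale.
    return sampled / keep_probability ** 3


k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(estimate_triangles(k4_edges, keep_probability=1.0))
```

With `keep_probability=1.0` every edge is kept and the result equals the exact count; smaller values trade accuracy for memory, and the estimate then varies from run to run unless a seed is fixed.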