Simple path queries on a large graph

Time: 2015-01-30 16:29:10

Tags: data-mining networkx jung spark-graphx large-data

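I have a question about large graph data. Suppose we have a large graph with nearly 100 million edges and about 5 million nodes. Which graph mining platform do you know of that can best return all simple paths of length <= k (for k = 3, 4, 5) between any two given nodes? My main concern is how fast these paths can be obtained. Another point: the graph is directed, but we want the program to ignore edge direction while computing paths. Once it has found those paths, however, it should still return the actual directed edges.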

For example:

a -> c <- d -> b is a valid path of length 3 between nodes 'a' and 'b'.

Thanks in advance.
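A minimal sketch of this requirement in networkx, assuming networkx 2.x or later: search on an undirected view of the graph with the built-in nx.all_simple_paths (which accepts a cutoff on path length), then map each node sequence back to the directed edges that realize it. The helper name directed_simple_paths is hypothetical. all_simple_paths enumerates paths exhaustively, so this is not tuned for a graph of 100 million edges; it only pins down the expected output.

```python
import itertools
import networkx as nx

def directed_simple_paths(G, source, target, k):
    """Hypothetical helper: simple paths of at most k edges between source and
    target, found while ignoring direction, reported as directed edges."""
    U = G.to_undirected(as_view=True)  # direction-ignoring view, no copy made
    results = []
    for nodes in nx.all_simple_paths(U, source, target, cutoff=k):
        # For each hop u-v, collect the directed edge(s) that realize it.
        choices = []
        for u, v in zip(nodes, nodes[1:]):
            hop = []
            if G.has_edge(u, v):
                hop.append((u, v))
            if G.has_edge(v, u):
                hop.append((v, u))
            choices.append(hop)
        # If both u->v and v->u exist, each orientation counts as its own path.
        for combo in itertools.product(*choices):
            results.append(list(combo))
    return results

G = nx.DiGraph()
G.add_edges_from([("a", "c"), ("d", "c"), ("d", "b")])
print(directed_simple_paths(G, "a", "b", 3))
# prints [[('a', 'c'), ('d', 'c'), ('d', 'b')]]
```

With k = 2 the same call returns an empty list, because the only route from 'a' to 'b' needs three edges.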

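2 answers:

Answer 0 (score: 1)

So here is one way to do it in networkx. It is roughly based on the solution I gave here. I assume that a->b and a<-b are two different paths that you want. I will return the result as a list of lists, where each sublist is the (ordered) list of edges of one path.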

import networkx as nx  # requires networkx 2.x or later (successors/predecessors replace the 1.x *_iter methods)
import itertools

def getPaths(G, source, target, maxLength, excludeSet=None):
    """Return all simple paths from source to target with at most maxLength edges,
    ignoring edge direction while searching. Each path is a list of the original
    directed edges, in path order."""
    if excludeSet is None:
        excludeSet = set([source])
    else:
        excludeSet.add(source)  # won't allow a path starting at source to go through source again.
    if maxLength == 0:
        # No edges left to spend, so no path to target can be completed from here.
        excludeSet.remove(source)
        return []
    else:
        # A direct edge in either direction is a path of length 1.
        if G.has_edge(source, target):
            paths = [[(source, target)]]
        else:
            paths = []
        if G.has_edge(target, source):
            paths.append([(target, source)])
        # neighbors_iter is a big iterator that gives (neighbor, edge) for each successor of source
        # and then for each predecessor of source.
        neighbors_iter = itertools.chain(
            ((neighbor, (source, neighbor)) for neighbor in G.successors(source) if neighbor != target),
            ((neighbor, (neighbor, source)) for neighbor in G.predecessors(source) if neighbor != target))

        # Note that if a neighbor is both a predecessor and a successor, it shows up twice in this iteration.

        paths.extend([[edge] + path
                      for (neighbor, edge) in neighbors_iter if neighbor not in excludeSet
                      for path in getPaths(G, neighbor, target, maxLength - 1, excludeSet)])

        excludeSet.remove(source)  # when we move back up the recursion, don't exclude this source any more

        return paths

G = nx.DiGraph()
G.add_edges_from([(1, 2), (2, 3), (1, 3), (1, 4), (3, 4), (4, 3)])

print(getPaths(G, 1, 3, 2))

# Output:
# [[(1, 3)], [(1, 2), (2, 3)], [(1, 4), (4, 3)], [(1, 4), (3, 4)]]

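I expect you would get a more efficient algorithm by modifying the Dijkstra algorithm in networkx. (Note that the Dijkstra functions accept a cutoff, but by default they return only shortest paths, and they follow edge direction.)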
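The cutoff idea can also be used for pruning. The sketch below is a swapped-in technique, not the modified Dijkstra the answer has in mind: it runs one breadth-first search from target on an undirected view, capped at k hops via nx.single_source_shortest_path_length(..., cutoff=k), then does a depth-first search that skips any neighbor too far from target to finish within the remaining edge budget. It assumes networkx 2.x. The function name pruned_paths is hypothetical, and whether the pruning pays off on a graph of the asker's size depends on its degree distribution; that has not been measured here.

```python
import networkx as nx

def pruned_paths(G, source, target, k):
    """Hypothetical sketch: all simple paths of at most k edges between source
    and target, ignoring direction, returned as lists of directed edges."""
    U = G.to_undirected(as_view=True)
    # Undirected hop distance from target, computed only up to k hops.
    dist = nx.single_source_shortest_path_length(U, target, cutoff=k)
    if source not in dist:
        return []  # target is more than k hops away, so no path qualifies
    results, path_edges, on_path = [], [], {source}

    def extend(node, budget):
        # Candidate steps: follow out-edges forward and in-edges backward.
        steps = [(nbr, (node, nbr)) for nbr in G.successors(node)]
        steps += [(nbr, (nbr, node)) for nbr in G.predecessors(node)]
        for nbr, edge in steps:
            if nbr in on_path:
                continue  # keep the path simple
            # Prune: after this step, nbr must reach target in budget - 1 hops.
            if dist.get(nbr, k + 1) > budget - 1:
                continue
            path_edges.append(edge)
            if nbr == target:
                results.append(list(path_edges))
            else:
                on_path.add(nbr)
                extend(nbr, budget - 1)
                on_path.remove(nbr)
            path_edges.pop()

    extend(source, k)
    return results

G = nx.DiGraph()
G.add_edges_from([(1, 2), (2, 3), (1, 3), (1, 4), (3, 4), (4, 3)])
print(pruned_paths(G, 1, 3, 2))
```

On the answer's example graph this should list the same four paths as getPaths(G, 1, 3, 2), possibly in a different order. The BFS is done once per query, so its cost is shared by every branch of the depth-first search.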

Here is an alternative version of the whole paths.extend statement:

paths.extend([[edge] + path for (neighbor, edge) in neighbors_iter if neighbor not in excludeSet for path in getPaths(G, neighbor, target, maxLength - 1, excludeSet) if len(path) > 0])

Answer 1 (score: 0)

I'd suggest Gephi; it is easy to use and to learn.

If you are up for it, Neo4j can meet your requirements with some coding.