Creating a "minimally connected" directed acyclic graph

Date: 2015-08-21 03:32:35

Tags: python python-3.x networkx directed-acyclic-graphs

I have a simple directed acyclic graph in NetworkX.

Now, each edge has a "source" and a "target". If there is a path from the "source" to the "target" other than this edge itself, I want to delete the edge.
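To make the condition concrete, here is a minimal sketch of the test for a single edge. It assumes the graph is stored as a plain dict mapping every node to the set of its direct successors (a hypothetical representation, not NetworkX's), and is_redundant is a made-up name: the edge (u, v) is redundant if v is still reachable from u when that one edge is ignored.

```python
def is_redundant(succ, u, v):
    """Return True if v is reachable from u without using the edge (u, v).

    `succ` is assumed to map every node to a set of its direct successors.
    """
    # Start the search from u's *other* children, i.e. skip the edge u -> v.
    stack = [w for w in succ[u] if w != v]
    seen = set(stack)
    while stack:
        node = stack.pop()
        if node == v:
            return True
        for nxt in succ.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


# A small DAG: 0->1, 1->2, 1->3, 0->3, 2->3
succ = {0: {1, 3}, 1: {2, 3}, 2: {3}, 3: set()}
print(is_redundant(succ, 0, 3))  # True: 0 -> 1 -> 3 also exists
print(is_redundant(succ, 0, 1))  # False: the direct edge is the only path
```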

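  1. Does NetworkX have a built-in function to do this? I really don't want to reinvent the wheel.
  2. [Optional] Only if the answer to 1 is "no": what is the most efficient algorithm for doing this (for a fairly dense graph)?

Here is an example of a DAG that needs to be cleaned up: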

      • The nodes are:

        ['termsequence', 'maximumdegree', 'emptymultigraph', 'minimum', 'multiset', 'walk', 'nonemptymultigraph', 'euleriantrail', 'nonnullmultigraph', 'cycle', 'loop', 'abwalk', 'endvertices', 'simplegraph', 'vertex', 'multipletrails', 'edge', 'set', 'stroll', 'union', 'trailcondition', 'nullmultigraph', 'trivialmultigraph', 'sequence', 'multiplepaths', 'path', 'degreevertex', 'onedgesonvertices', 'nontrivialmultigraph', 'adjacentedges', 'adjacentvertices', 'simpleedge', 'maximum', 'multipleloops', 'length', 'circuit', 'class', 'euleriangraph', 'incident', 'minimumdegree', 'orderedpair', 'unique', 'closedwalk', 'multipleedges', 'pathcondition', 'multigraph', 'trail']
        
      • The edges are:

        [('termsequence', 'endvertices'), ('emptymultigraph', 'nonemptymultigraph'), ('minimum', 'minimumdegree'), ('multiset', 'trailcondition'), ('multiset', 'pathcondition'), ('multiset', 'multigraph'), ('walk', 'length'), ('walk', 'closedwalk'), ('walk', 'abwalk'), ('walk', 'trail'), ('walk', 'endvertices'), ('euleriantrail', 'euleriangraph'), ('loop', 'simplegraph'), ('loop', 'degreevertex'), ('loop', 'simpleedge'), ('loop', 'multipleloops'), ('endvertices', 'abwalk'), ('vertex', 'adjacentvertices'), ('vertex', 'onedgesonvertices'), ('vertex', 'walk'), ('vertex', 'adjacentedges'), ('vertex', 'multipleedges'), ('vertex', 'edge'), ('vertex', 'multipleloops'), ('vertex', 'degreevertex'), ('vertex', 'incident'), ('edge', 'adjacentvertices'), ('edge', 'onedgesonvertices'), ('edge', 'multipleedges'), ('edge', 'simpleedge'), ('edge', 'adjacentedges'), ('edge', 'loop'), ('edge', 'trailcondition'), ('edge', 'pathcondition'), ('edge', 'walk'), ('edge', 'incident'), ('set', 'onedgesonvertices'), ('set', 'edge'), ('union', 'multiplepaths'), ('union', 'multipletrails'), ('trailcondition', 'trail'), ('nullmultigraph', 'nonnullmultigraph'), ('sequence', 'walk'), ('sequence', 'endvertices'), ('path', 'cycle'), ('path', 'multiplepaths'), ('degreevertex', 'maximumdegree'), ('degreevertex', 'minimumdegree'), ('onedgesonvertices', 'multigraph'), ('maximum', 'maximumdegree'), ('circuit', 'euleriangraph'), ('class', 'multiplepaths'), ('class', 'multipletrails'), ('incident', 'adjacentedges'), ('incident', 'degreevertex'), ('incident', 'onedgesonvertices'), ('orderedpair', 'multigraph'), ('closedwalk', 'circuit'), ('closedwalk', 'cycle'), ('closedwalk', 'stroll'), ('pathcondition', 'path'), ('multigraph', 'euleriangraph'), ('multigraph', 'nullmultigraph'), ('multigraph', 'trivialmultigraph'), ('multigraph', 'nontrivialmultigraph'), ('multigraph', 'emptymultigraph'), ('multigraph', 'euleriantrail'), ('multigraph', 'simplegraph'), ('trail', 'path'), ('trail', 'circuit'), ('trail', 'multipletrails')]
        

3 Answers:

Answer 0 (score: 4)

My previous answer dealt specifically with the direct question of whether there is a good way to test if a single edge is redundant.

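It looks like you really want a way to efficiently remove all of the redundant edges, which means handling them all in one pass. That's a different question, but here is an answer. I don't believe networkx has anything built in for this, but it isn't hard to find a workable algorithm.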
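Newer NetworkX releases, which postdate this answer, do ship a built-in for exactly this: networkx.transitive_reduction, which returns a new DiGraph keeping only the non-redundant edges of a DAG (it raises an error if the graph has a cycle, and it does not copy node or edge attributes). A minimal sketch, assuming a NetworkX version that provides it:

```python
import networkx as nx

G = nx.DiGraph([(0, 1), (1, 2), (1, 3), (0, 3), (2, 3)])

# transitive_reduction builds a *new* graph; G itself is not modified.
TR = nx.transitive_reduction(G)
print(sorted(TR.edges()))  # [(0, 1), (1, 2), (2, 3)]

# To prune G in place instead (keeping G's own attributes),
# remove every edge that the reduction dropped.
G.remove_edges_from([e for e in G.edges() if not TR.has_edge(*e)])
print(sorted(G.edges()))  # [(0, 1), (1, 2), (2, 3)]
```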

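The idea is that, because it is a DAG, some nodes have no out-edges. Start with those and process them. Once they are processed, some of their parents have no unprocessed children left. Process those parents. Repeat. At every stage the set of unprocessed nodes is itself a DAG, and we are processing the "end nodes" of that DAG. This is guaranteed to finish (if the original network is finite).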
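The processing order just described is essentially Kahn's topological sort run on the reversed graph: a node becomes ready once every one of its children is done. A minimal sketch of only that ordering, assuming the DAG is a dict mapping every node to the set of its children (sinks_first_order is a hypothetical name):

```python
from collections import defaultdict


def sinks_first_order(succ):
    """Yield nodes so that every node comes after all of its children.

    `succ` must contain every node as a key, mapped to its set of children.
    """
    preds = defaultdict(set)
    for node, children in succ.items():
        for child in children:
            preds[child].add(node)
    remaining = {node: len(children) for node, children in succ.items()}
    ready = [node for node, count in remaining.items() if count == 0]  # the sinks
    while ready:
        next_ready = []
        for node in ready:
            yield node
            for parent in preds[node]:
                remaining[parent] -= 1  # one more child of this parent is done
                if remaining[parent] == 0:
                    next_ready.append(parent)
        ready = next_ready


print(list(sinks_first_order({0: {1, 3}, 1: {2, 3}, 2: {3}, 3: set()})))  # [3, 2, 1, 0]
```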

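In the implementation, whenever we process a node, we first check whether any of its children are also indirect descendants. If so, delete that edge; if not, keep it. Once all of its children have been handled, we update its parents by adding all of its descendants to each parent's set of indirect descendants. If every child of a parent has now been processed, we add that parent to the list for the next iteration.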

import networkx as nx
from collections import defaultdict

def remove_redundant_edges(G):
    processed_child_count = defaultdict(int)  # when all of a node's children are processed, we'll add it to nodes_to_process
    descendants = defaultdict(set)            # all descendants of a node (including children)
    out_degree = dict(G.out_degree())         # original out-degrees, taken before any edge is removed
    nodes_to_process = [node for node in G if out_degree[node] == 0]  # initially it's all nodes without children
    while nodes_to_process:
        next_nodes = []
        for node in nodes_to_process:
            '''When we enter this loop, the descendants of a node are known, except for its direct children.'''
            for child in list(G.successors(node)):  # copy: edges are removed inside this loop
                if child in descendants[node]:      # if the child is already an indirect descendant, delete the edge
                    G.remove_edge(node, child)
                else:                               # otherwise add it to the descendants
                    descendants[node].add(child)
            for predecessor in G.predecessors(node):  # update all parents' indirect descendants
                descendants[predecessor].update(descendants[node])
                processed_child_count[predecessor] += 1  # we have processed one more child of this parent
                if processed_child_count[predecessor] == out_degree[predecessor]:  # if all children processed, add to list for next iteration
                    next_nodes.append(predecessor)
        nodes_to_process = next_nodes

Testing it:

G=nx.DiGraph()
G.add_nodes_from(['termsequence', 'maximumdegree', 'emptymultigraph', 'minimum', 'multiset', 'walk', 'nonemptymultigraph', 'euleriantrail', 'nonnullmultigraph', 'cycle', 'loop', 'abwalk', 'endvertices', 'simplegraph', 'vertex', 'multipletrails', 'edge', 'set', 'stroll', 'union', 'trailcondition', 'nullmultigraph', 'trivialmultigraph', 'sequence', 'multiplepaths', 'path', 'degreevertex', 'onedgesonvertices', 'nontrivialmultigraph', 'adjacentedges', 'adjacentvertices', 'simpleedge', 'maximum', 'multipleloops', 'length', 'circuit', 'class', 'euleriangraph', 'incident', 'minimumdegree', 'orderedpair', 'unique', 'closedwalk', 'multipleedges', 'pathcondition', 'multigraph', 'trail'])
G.add_edges_from([('termsequence', 'endvertices'), ('emptymultigraph', 'nonemptymultigraph'), ('minimum', 'minimumdegree'), ('multiset', 'trailcondition'), ('multiset', 'pathcondition'), ('multiset', 'multigraph'), ('walk', 'length'), ('walk', 'closedwalk'), ('walk', 'abwalk'), ('walk', 'trail'), ('walk', 'endvertices'), ('euleriantrail', 'euleriangraph'), ('loop', 'simplegraph'), ('loop', 'degreevertex'), ('loop', 'simpleedge'), ('loop', 'multipleloops'), ('endvertices', 'abwalk'), ('vertex', 'adjacentvertices'), ('vertex', 'onedgesonvertices'), ('vertex', 'walk'), ('vertex', 'adjacentedges'), ('vertex', 'multipleedges'), ('vertex', 'edge'), ('vertex', 'multipleloops'), ('vertex', 'degreevertex'), ('vertex', 'incident'), ('edge', 'adjacentvertices'), ('edge', 'onedgesonvertices'), ('edge', 'multipleedges'), ('edge', 'simpleedge'), ('edge', 'adjacentedges'), ('edge', 'loop'), ('edge', 'trailcondition'), ('edge', 'pathcondition'), ('edge', 'walk'), ('edge', 'incident'), ('set', 'onedgesonvertices'), ('set', 'edge'), ('union', 'multiplepaths'), ('union', 'multipletrails'), ('trailcondition', 'trail'), ('nullmultigraph', 'nonnullmultigraph'), ('sequence', 'walk'), ('sequence', 'endvertices'), ('path', 'cycle'), ('path', 'multiplepaths'), ('degreevertex', 'maximumdegree'), ('degreevertex', 'minimumdegree'), ('onedgesonvertices', 'multigraph'), ('maximum', 'maximumdegree'), ('circuit', 'euleriangraph'), ('class', 'multiplepaths'), ('class', 'multipletrails'), ('incident', 'adjacentedges'), ('incident', 'degreevertex'), ('incident', 'onedgesonvertices'), ('orderedpair', 'multigraph'), ('closedwalk', 'circuit'), ('closedwalk', 'cycle'), ('closedwalk', 'stroll'), ('pathcondition', 'path'), ('multigraph', 'euleriangraph'), ('multigraph', 'nullmultigraph'), ('multigraph', 'trivialmultigraph'), ('multigraph', 'nontrivialmultigraph'), ('multigraph', 'emptymultigraph'), ('multigraph', 'euleriantrail'), ('multigraph', 'simplegraph'), ('trail', 'path'), ('trail', 'circuit'), ('trail', 'multipletrails')])

print(G.size())   # 71
print(G.order())  # 47
descendants = {}  #for testing below
for node in G.nodes():
    descendants[node] = nx.descendants(G,node)

remove_redundant_edges(G)  #this removes the edges

print(G.size())   # lots of edges gone: 56
print(G.order())  # no nodes changed: 47
newdescendants = {}  #for comparison with above
for node in G.nodes():
    newdescendants[node] = nx.descendants(G,node)

for node in G.nodes():
    if descendants[node] != newdescendants[node]:
        print('descendants changed!!')  # never printed: all nodes keep the same descendants
    for child in G.neighbors(node):
        if len(list(nx.all_simple_paths(G, node, child))) > 1:
            print('bad edge')  # never printed: no alternate path exists from a node to its child

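This will be efficient: it has to look at every node once at the start to see whether it is an "end" node. Then it processes each edge reaching those nodes and checks whether all of that parent's children have been processed. Then it looks at those parents and repeats.

So it handles each edge once (including the glance at the parent node), and each vertex is handled once at the start and then once more when it is processed. This count ignores the cost of the set unions, each of which can take up to O(V) time on a dense graph.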
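For experimenting without NetworkX, here is a dependency-free sketch of the same backward pass, assuming the DAG is a dict mapping every node to the set of its children (a hypothetical representation; remove_redundant_edges_dict is also a made-up name). Unlike the NetworkX version above, it leaves its input alone and returns a new dict.

```python
from collections import defaultdict


def remove_redundant_edges_dict(succ):
    """Backward pass over a DAG given as {node: set(children)}.

    Every node must be a key. Returns a new {node: set(children)} dict
    with the redundant edges removed.
    """
    preds = defaultdict(set)
    for node, children in succ.items():
        for child in children:
            preds[child].add(node)
    result = {node: set(children) for node, children in succ.items()}
    descendants = defaultdict(set)  # filled in by already-processed children
    remaining = {node: len(children) for node, children in succ.items()}
    to_process = [node for node, count in remaining.items() if count == 0]
    while to_process:
        next_nodes = []
        for node in to_process:
            # At this point descendants[node] holds exactly the *indirect* ones.
            indirect = set(descendants[node])
            for child in succ[node]:
                if child in indirect:
                    result[node].discard(child)  # reachable another way
                descendants[node].add(child)
            for parent in preds[node]:
                descendants[parent] |= descendants[node]
                remaining[parent] -= 1
                if remaining[parent] == 0:
                    next_nodes.append(parent)
        to_process = next_nodes
    return result


small = {0: {1, 3}, 1: {2, 3}, 2: {3}, 3: set()}
print(remove_redundant_edges_dict(small))  # {0: {1}, 1: {2}, 2: {3}, 3: set()}
```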

Answer 1 (score: 3)

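Here is a simple general algorithm. It can run either forwards or backwards; this answer and Joel's are essentially duals of each other: his runs backwards, and this one runs forwards: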

import collections
import networkx as nx

def remove_redundant_edges(G):
    """
    Remove redundant edges from a DAG using networkx (nx).
    An edge is redundant if there is an alternate path
    from its start node to its destination node.

    This algorithm could work front to back, or back to front.
    We choose to work front to back.

    The main persistent variable (in addition to the graph
    itself) is indirect_pred_dict, which is a dictionary with
    one entry per graph node.  Each entry is a set of indirect
    predecessors of this node.

    The algorithmic complexity of the code on a worst-case
    fully-connected graph is O(V**3), where V is the number
    of nodes.
    """

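    indirect_pred_dict = collections.defaultdict(set)
    for node in list(nx.topological_sort(G)):  # list(): the graph is modified during the loop
        indirect_pred = indirect_pred_dict[node]
        direct_pred = list(G.predecessors(node))  # copy, so edges can be removed while looping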
        for pred in direct_pred:
            if pred in indirect_pred:
                G.remove_edge(pred, node)
        indirect_pred.update(direct_pred)
        for succ in G.successors(node):
            indirect_pred_dict[succ] |= indirect_pred
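The forward version can also be tried without NetworkX. A minimal sketch, assuming Python 3.9+ for graphlib.TopologicalSorter and a dict mapping each node to the set of its direct predecessors (the input format TopologicalSorter expects); remove_redundant_edges_forward is a hypothetical name:

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def remove_redundant_edges_forward(pred):
    """Forward pass over a DAG given as {node: set(direct_predecessors)}.

    Returns a new {node: set(predecessors)} dict with redundant edges removed.
    """
    succ = defaultdict(set)
    for node, parents in pred.items():
        for parent in parents:
            succ[parent].add(node)
    result = {}
    indirect_pred = defaultdict(set)  # node -> ancestors reachable via 2+ edges
    for node in TopologicalSorter(pred).static_order():
        indirect = indirect_pred[node]
        direct = pred.get(node, set())
        # Keep only the direct predecessors that are not also indirect ones.
        result[node] = {p for p in direct if p not in indirect}
        indirect |= direct  # now: every ancestor of node
        for child in succ[node]:
            indirect_pred[child] |= indirect
    return result


pred = {1: {0}, 2: {1}, 3: {0, 1, 2}}  # 0->1, 1->2, 1->3, 0->3, 2->3
print(remove_redundant_edges_forward(pred))  # {0: set(), 1: {0}, 2: {1}, 3: {2}}
```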

Complexity analysis and Big-O optimizations

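For a minimally connected graph, where each node is connected by only a single edge, the complexity is O(V+E). However, even for a simple linear graph (where each node has one incoming edge and one outgoing edge) the complexity is O(V*E), and for a maximally connected graph (the worst case, where each node is connected to every downstream node in the graph) the complexity is O(V**3). In that case the number of ops follows sequence A000292, which is n * (n+1) * (n+2) / 6, where n is the number of nodes (V) minus 2; equivalently, V * (V-1) * (V-2) / 6.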
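The worst-case count can be checked numerically. This dependency-free sketch (count_ops_complete_dag is a hypothetical helper, not part of the answer's code) replays the forward pass on a fully connected DAG with nodes 0..V-1, adds len(indirect_pred) for every successor update just as the test bench in this answer does, and prints the totals so they can be compared with the closed form V(V-1)(V-2)/6, the binomial coefficient C(V, 3).

```python
from collections import defaultdict


def count_ops_complete_dag(num_nodes):
    """Replay the forward pass on a fully connected DAG and count set work.

    Nodes are 0..num_nodes-1 with an edge i -> j for every i < j, so the
    natural order 0, 1, 2, ... is already a topological order.
    """
    preds = {j: set(range(j)) for j in range(num_nodes)}
    succs = {i: set(range(i + 1, num_nodes)) for i in range(num_nodes)}
    indirect_pred_dict = defaultdict(set)
    ops = 0
    for node in range(num_nodes):
        indirect_pred = indirect_pred_dict[node]
        direct_pred = list(preds[node])
        for pred in direct_pred:
            if pred in indirect_pred:       # redundant edge pred -> node
                preds[node].discard(pred)
                succs[pred].discard(node)
        indirect_pred.update(direct_pred)
        for succ in succs[node]:
            indirect_pred_dict[succ] |= indirect_pred
            ops += len(indirect_pred)       # same metric as the test bench
    return ops


print([count_ops_complete_dag(v) for v in range(8)])  # [0, 0, 0, 1, 4, 10, 20, 35]
```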

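Depending on the shape of your graph, there are other optimizations you can apply. Here is a version with several different optimizers that can significantly reduce the complexity and running time for certain kinds of graphs: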

import networkx as nx

def remove_redundant_edges(G, optimize_dense=True, optimize_chains=True,
                           optimize_tree=False, optimize_funnel=False):
    """
    Remove redundant edges from a DAG using networkx (nx).
    An edge is redundant if there is an alternate path
    from its start node to its destination node.

    This algorithm could work equally well front to back,
    or back to front. We choose to work front to back.

    The main persistent variable (in addition to the graph
    itself) is indirect_pred_dict, which is a dictionary with
    one entry per graph node.  Each entry is a set of indirect
    predecessors of this node.

    The main processing algorithm uses this dictionary to
    iteratively calculate indirect predecessors and direct
    predecessors for every node, and prune the direct
    predecessors edges if they are also accessible indirectly.
    The algorithmic complexity is O(V**3), where V is the
    number of nodes in the graph.

    There are also several graph shape-specific optimizations
    provided.  These optimizations could actually increase
    run-times, especially for small graphs that are not amenable
    to the optimizations, so if your execution time is slow,
    you should test different optimization combinations.

    But for the right graph shape, these optimizations can
    provide dramatic improvements.  For the fully connected
    graph (which is worst-case), optimize_dense reduces the
    algorithmic complexity from O(V**3) to O(V**2).

    For a completely linear graph, any of the optimize_tree,
    optimize_chains, or optimize_funnel options would decrease
    complexity from O(V**2) to O(V).

    If the optimize_dense option is set to True, then an
    optimization phase is run before the main algorithm.  This
    optimization phase works by looking for matches between
    each node's successors and that same node's successor's
    successors (by only looking one level ahead at a time).

    If the optimize_tree option is set true, then a phase is
    run that will optimize trees by working right-to-left and
    recursively removing leaf nodes with a single predecessor.
    This will also optimize linear graphs, which are degenerate
    trees.

    If the optimize_funnel option is set true, then funnels
    (inverted trees) will be optimized.

    If the optimize_chains option is set true, then chains
    (linear sections) will be optimized by sharing the
    indirect_pred_dict sets.  This works because Python
    checks to see if two sets are the same instance before
    combining them.

    For a completely linear graph, optimize_funnel or optimize_tree
    execute more quickly than optimize_chains.  Nonetheless,
    optimize_chains option is enabled by default, because
    it is a balanced algorithm that works in more cases than
    the other two.
    """

    ordered = list(nx.topological_sort(G))  # a list: it is iterated several times and reversed

    if optimize_dense:
        succs= dict((node, set(G.successors(node))) for node in ordered)
        for node in ordered:
            my_succs = succs.pop(node)
            kill = set()
            while my_succs:
                succ = my_succs.pop()
                if succ not in kill:
                    check = succs[succ]
                    kill.update(x for x in my_succs if x in check)
            for succ in kill:
                G.remove_edge(node, succ)

    indirect_pred_dict = dict((node, set()) for node in ordered)

    if optimize_tree:
        remaining_nodes = set(ordered)
        for node in reversed(ordered):
            if G.in_degree(node) == 1:
                if not (set(G.successors(node)) & remaining_nodes):
                    remaining_nodes.remove(node)
        ordered = [node for node in ordered if node in remaining_nodes]

    if optimize_funnel:
        remaining_nodes = set(ordered)
        for node in ordered:
            if G.out_degree(node) == 1:
                if not (set(G.predecessors(node)) & remaining_nodes):
                    remaining_nodes.remove(node)
        ordered = [node for node in ordered if node in remaining_nodes]

    if optimize_chains:
        # This relies on Python optimizing the set |= operation
        # by seeing if the objects are identical.
        for node in ordered:
            succs = list(G.successors(node))
            if len(succs) == 1 and G.in_degree(succs[0]) == 1:
                indirect_pred_dict[succs[0]] = indirect_pred_dict[node]

    for node in ordered:
        indirect_pred = indirect_pred_dict.pop(node)
        direct_pred = list(G.predecessors(node))  # copy, so edges can be removed while looping
        for pred in direct_pred:
            if pred in indirect_pred:
                G.remove_edge(pred, node)
        indirect_pred.update(direct_pred)
        for succ in G.successors(node):
            indirect_pred_dict[succ] |= indirect_pred
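To see what the optimize_dense phase does by itself, here is a dependency-free sketch of just that pre-pass, assuming the graph is a dict of successor sets plus a precomputed topological order (dense_prepass and both inputs are hypothetical). It drops an edge node -> x only after finding a two-step path node -> s -> x, so it never removes a needed edge; but because set.pop() returns elements in an arbitrary order, how many redundant edges it catches can vary, and the main pass is still needed to finish the job.

```python
def dense_prepass(succ, order):
    """One-level-lookahead pruning, mirroring the optimize_dense phase.

    `succ` maps node -> set of direct successors; `order` is a topological
    order of all nodes. Returns a new successor dict; `succ` is untouched.
    """
    original = {node: set(children) for node, children in succ.items()}
    result = {node: set(children) for node, children in succ.items()}
    for node in order:
        my_succs = set(original[node])
        kill = set()
        while my_succs:
            s = my_succs.pop()
            if s not in kill:
                # Any remaining successor that s also points to is reachable
                # via node -> s -> x, so the edge node -> x is redundant.
                kill.update(x for x in my_succs if x in original[s])
        result[node] -= kill
    return result


# "Grandchildren" shape: a -> b -> c -> d plus a -> d. No successor of a
# points directly at d, so the pre-pass removes nothing here.
grand = {"a": {"b", "d"}, "b": {"c"}, "c": {"d"}, "d": set()}
print(dense_prepass(grand, ["a", "b", "c", "d"]) == grand)  # True
```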

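I haven't analyzed whether it is possible to construct a dense but not maximally connected graph for which the complexity with the optimize_dense option enabled is greater than O(V**2), but I have no reason, a priori, to believe it is impossible. The optimization works best on a maximally connected graph and would not do anything, for example, in the case where each node shares successors with its grandchildren rather than its children; I have not analyzed the runtime for that case.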

Example test bench

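I have stripped the code down to the basic algorithm and added instrumentation that records the number of operations needed for the worst-case path, plus an example test generator that produces maximally connected graphs.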

import collections
import networkx as nx

def makegraph(numnodes):
    """
    Make a fully-connected graph given a number of nodes
    """
    edges = []
    for i in range(numnodes):
        for j in range(i+1, numnodes):
            edges.append((i, j))
    return nx.DiGraph(edges)

def remove_redundant_edges(G):
    ops = 0
    indirect_pred_dict = collections.defaultdict(set)
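    for node in list(nx.topological_sort(G)):  # list(): the graph is modified during the loop
        indirect_pred = indirect_pred_dict[node]
        direct_pred = list(G.predecessors(node))  # copy, so edges can be removed while looping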
        for pred in direct_pred:
            if pred in indirect_pred:
                G.remove_edge(pred, node)
        indirect_pred.update(direct_pred)
        for succ in G.successors(node):
            indirect_pred_dict[succ] |= indirect_pred
            ops += len(indirect_pred)
    return ops

def test_1(f, numnodes):
    G = makegraph(numnodes)
    e1 = nx.number_of_edges(G)
    ops = f(G)
    e2 = nx.number_of_edges(G)
    return ops, e1, e2

for numnodes in range(30):
    a = test_1(remove_redundant_edges, numnodes)
    print(numnodes, a[0])

Answer 2 (score: 0)

Yes.

You want to use all_simple_paths [documentation] (it provides a generator of all the simple paths between two nodes). Stop as soon as it finds a second path, so that it doesn't compute all of them.

Once that is defined, look at each edge, and if there is more than one path from its source to its target, remove the edge.

import networkx as nx

def multiple_paths(G, source, target):
    '''Returns True if there are multiple simple paths from source to target, False otherwise.'''
    path_generator = nx.all_simple_paths(G, source=source, target=target)
    counter = 0
    for path in path_generator:  # test to see if there are multiple paths
        counter += 1
        if counter > 1:
            break  # instead of breaking, could have returned True
    if counter > 1:  # counter == 2
        return True
    else:  # counter == 0 or 1
        return False

G = nx.DiGraph()
G.add_edges_from([(0, 1), (1, 2), (1, 3), (0, 3), (2, 3)])
print(multiple_paths(G, 0, 1))  # False
print(multiple_paths(G, 0, 2))  # False
print(multiple_paths(G, 0, 3))  # True

for u, v in list(G.edges()):  # let's do what you're trying to do; list() because edges are removed
    if multiple_paths(G, u, v):
        G.remove_edge(u, v)

print(list(G.edges()))  # [(0, 1), (1, 2), (2, 3)]

Note that you could also remove the edge and then run has_path to see whether there is still a path. If there isn't, add the edge back.

You'd want to be careful if there is any edge data. I don't like the possibility of removing an edge and then adding it back; it opens the door to some hard-to-find bugs.
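A minimal sketch of the has_path variant that at least keeps edge attributes intact, assuming NetworkX 2.x or later (where G.edges[u, v] gives the edge's attribute dict); prune_with_has_path is a hypothetical name. It iterates over a snapshot of the edge list, because removing edges while iterating over the live G.edges() view can raise a RuntimeError, and it restores each needed edge with a copy of its original attributes.

```python
import networkx as nx


def prune_with_has_path(G):
    """Remove every edge (u, v) of DAG G for which v stays reachable without it."""
    for u, v in list(G.edges()):           # snapshot: G changes inside the loop
        data = dict(G.edges[u, v])          # copy of the edge's attributes
        G.remove_edge(u, v)
        if not nx.has_path(G, u, v):
            G.add_edge(u, v, **data)        # edge was needed: restore it as it was
    return G


G = nx.DiGraph()
G.add_edges_from([(0, 1), (1, 2), (1, 3), (0, 3), (2, 3)], weight=1)
G[0][1]["label"] = "keep me"
prune_with_has_path(G)
print(sorted(G.edges()))  # [(0, 1), (1, 2), (2, 3)]
print(G.edges[0, 1])      # {'weight': 1, 'label': 'keep me'}
```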