Very slow execution of the function compute_resilience in Python

Time: 2018-01-01 18:48:22

Tags: python algorithm

The idea is to compute the resilience of a network, represented as an undirected graph of the form {node: (set of its neighbors) for each node in the graph}. The function removes nodes from the graph one by one in a given random order and computes the size of the largest remaining connected component. The helper function bfs_visited() returns the set of nodes still connected to a given node. How can the implementation of the algorithm be improved in Python 2, preferably without changing the breadth-first algorithm in the helper function?
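For reference, a minimal example of this graph representation (the node names here are made up for illustration):

```python
# An undirected 4-node graph in the {node: set(neighbours)} form
# described above; every edge is stored in both directions.
graph = {
    'a': {'b', 'c'},
    'b': {'a'},
    'c': {'a'},
    'd': set(),  # an isolated node
}

# In an undirected graph every edge must appear symmetrically:
assert all(u in graph[v] for u, vs in graph.items() for v in vs)
```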

def bfs_visited(graph, node):
    """undirected graph {Vertex: {neighbors}}
    Returns the set of all nodes visited by the algorithm"""
    queue = deque()
    queue.append(node)
    visited = set([node])
    while queue:
        current_node = queue.popleft()
        for neighbor in graph[current_node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return visited

def cc_visited(graph):
    """ undirected graph {Vertex: {neighbors}}
    Returns a list of sets of connected components"""
    remaining_nodes = set(graph.keys())
    connected_components = []
    for node in remaining_nodes:
        visited = bfs_visited(graph, node)
        if visited not in connected_components:
            connected_components.append(visited)
        remaining_nodes = remaining_nodes - visited
        #print(node, remaining_nodes)
    return connected_components

def largest_cc_size(ugraph):
    """returns the size (an integer) of the largest connected component in 
    the ugraph."""
    if not ugraph:
        return 0
    res = [(len(ccc), ccc) for ccc in cc_visited(ugraph)]
    res.sort()
    return res[-1][0]

def compute_resilience(ugraph, attack_order):
    """
    input: a graph {V: N}

    returns a list whose k+1th entry is the size of the largest cc after 
    the removal of the first k nodes
    """
    res = [len(ugraph)]
    for node in attack_order:
        neighbors = ugraph[node]  
        for neighbor in neighbors:
            ugraph[neighbor].remove(node)
        ugraph.pop(node)
        res.append(largest_cc_size(ugraph))      
    return res

1 Answer:

Answer 0 (score: 0)

I got a very good answer from Gareth Rees, which covers the question completely.

  1. Review: the docstring of bfs_visited should explain the node parameter.
  2. The docstring of compute_resilience should explain that the ugraph argument gets modified. Alternatively, the function could take a copy of the graph, so as not to modify the original graph.

    The lines in bfs_visited:

    queue = deque()
    queue.append(node)
    can be simplified to:
    
    queue = deque([node])    
    

    The function largest_cc_size builds a list of pairs:

    res = [(len(ccc), ccc) for ccc in cc_visited(ugraph)]
    res.sort()
    return res[-1][0]
    

    But as you can see, only the first element of each pair (the size of the component) is used. So you can simplify it by not building the pairs:

    res = [len(ccc) for ccc in cc_visited(ugraph)]
    res.sort()
    return res[-1]
    

    Since only the size of the largest component is needed, there's no need to build the whole list. Instead you can use max to find the largest:

    if ugraph:
        return max(map(len, cc_visited(ugraph)))
    else:
        return 0
    

    If you are using Python 3.4 or later, this can be simplified further using the default argument to max:

    return max(map(len, cc_visited(ugraph)), default=0)
    

    Now this is simple enough that it probably doesn't need to be its own function.

    This line:

    remaining_nodes = set(graph.keys())
    

    can be written more simply as:

    remaining_nodes = set(graph)
    

    There's a loop over the set remaining_nodes, and on each loop iteration you update remaining_nodes:

    for node in remaining_nodes:
        visited = bfs_visited(graph, node)
        if visited not in connected_components:
            connected_components.append(visited)
        remaining_nodes = remaining_nodes - visited
    

    It looks as if the intention of the code was to avoid iterating over already-visited nodes by removing them from remaining_nodes, but this doesn't work! The problem is that the for statement:

    for node in remaining_nodes:
    

    evaluates the expression remaining_nodes just once, at the start of the loop. So when the code creates a new set and assigns it to remaining_nodes:

    remaining_nodes = remaining_nodes - visited
    

    this has no effect on the nodes being iterated over.
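A small self-contained demonstration of this pitfall (the names here are illustrative, not from the question):

```python
# The for statement evaluates its iterable expression once, at the
# start of the loop, and then iterates over that object; rebinding
# the name afterwards has no effect on the iteration.
remaining = {1, 2, 3, 4}
seen = []
for node in remaining:
    seen.append(node)
    remaining = set()  # rebinds the name; the loop keeps going

# All four elements are still visited.
assert sorted(seen) == [1, 2, 3, 4]
```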

    You might imagine trying to fix this by adjusting the set being iterated over, using the difference_update method:

    remaining_nodes.difference_update(visited)
    

    But this would not be a good idea, because then you'd be iterating over a set while modifying it inside the loop, which is not safe. Instead, you need to write the loop as follows:

    while remaining_nodes:
        node = remaining_nodes.pop()
        visited = bfs_visited(graph, node)
        if visited not in connected_components:
            connected_components.append(visited)
        remaining_nodes.difference_update(visited)
    

    Using while and pop is the standard idiom in Python for consuming a data structure while modifying it; bfs_visited does something similar.

    Now there is no need for the test:

    if visited not in connected_components:

    because each component is produced exactly once.
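Putting the fixes above together (the deque simplification and the while/pop loop), the helper pair might look like this sketch:

```python
from collections import deque

def bfs_visited(graph, node):
    """Undirected graph {vertex: {neighbours}}.
    Return the set of all nodes reachable from node."""
    queue = deque([node])
    visited = {node}
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return visited

def cc_visited(graph):
    """Undirected graph {vertex: {neighbours}}.
    Return a list of sets, one per connected component."""
    remaining_nodes = set(graph)
    connected_components = []
    while remaining_nodes:
        node = remaining_nodes.pop()
        visited = bfs_visited(graph, node)
        connected_components.append(visited)
        remaining_nodes.difference_update(visited)
    return connected_components
```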

    In compute_resilience the first line is:

    res = [len(ugraph)]
    

    But this only works if the graph starts out as a single connected component. To handle the general case, the first line should be:

    res = [largest_cc_size(ugraph)]
    

    For each node in the attack order, compute_resilience calls:

    res.append(largest_cc_size(ugraph))
    

    But this doesn't take advantage of the work done previously. When we remove a node from the graph, all connected components remain the same except for the component containing that node. So we can potentially save some work by doing a breadth-first search over just that component, rather than over the whole graph. (Whether this actually saves any work depends on how resilient the graph is. For highly resilient graphs it won't make much difference.)

    In order to do this, we need to redesign the data structures so that we can efficiently find the component containing a node, and efficiently remove that component from the collection of components.

    This answer is already quite long, so I won't explain in detail how to redesign the data structures; I'll just present the revised code and let you figure it out for yourself.

    def connected_components(graph, nodes):
        """Given an undirected graph represented as a mapping from nodes to
        the set of their neighbours, and a set of nodes, find the
        connected components in the graph containing those nodes.
    
        Returns:
        - mapping from nodes to the canonical node of the connected
          component they belong to
        - mapping from canonical nodes to connected components
    
        """
        canonical = {}
        components = {}
        while nodes:
            node = nodes.pop()
            component = bfs_visited(graph, node)
            components[node] = component
            nodes.difference_update(component)
            for n in component:
                canonical[n] = node
        return canonical, components
    
    def resilience(graph, attack_order):
        """Given an undirected graph represented as a mapping from nodes to
        an iterable of their neighbours, and an iterable of nodes, generate
        integers such that the k-th result is the size of the largest
        connected component after the removal of the first k-1 nodes.
    
        """
        # Take a copy of the graph so that we can destructively modify it.
        graph = {node: set(neighbours) for node, neighbours in graph.items()}
    
        canonical, components = connected_components(graph, set(graph))
        largest = lambda: max(map(len, components.values()), default=0)
        yield largest()
        for node in attack_order:
            # Find connected component containing node.
            component = components.pop(canonical.pop(node))
    
            # Remove node from graph.
            for neighbor in graph[node]:
                graph[neighbor].remove(node)
            graph.pop(node)
            component.remove(node)
    
            # Component may have been split by removal of node, so search
            # it for new connected components and update data structures
            # accordingly.
            canon, comp = connected_components(graph, component)
            canonical.update(canon)
            components.update(comp)
            yield largest()
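As a quick end-to-end check (this worked example is mine, not part of the original answer), the revised code can be run on a four-node path graph; the definitions are repeated here, together with bfs_visited from the question, so the snippet is self-contained (it needs Python 3.4+ for max's default argument):

```python
from collections import deque

def bfs_visited(graph, node):
    """Return the set of nodes reachable from node in an undirected
    graph represented as {vertex: {neighbours}}."""
    queue = deque([node])
    visited = {node}
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return visited

def connected_components(graph, nodes):
    """Find the connected components containing the given nodes.
    Return (node -> canonical node, canonical node -> component)."""
    canonical = {}
    components = {}
    while nodes:
        node = nodes.pop()
        component = bfs_visited(graph, node)
        components[node] = component
        nodes.difference_update(component)
        for n in component:
            canonical[n] = node
    return canonical, components

def resilience(graph, attack_order):
    """Generate the size of the largest connected component after the
    removal of each prefix of attack_order."""
    # Take a copy of the graph so that we can destructively modify it.
    graph = {node: set(neighbours) for node, neighbours in graph.items()}
    canonical, components = connected_components(graph, set(graph))
    largest = lambda: max(map(len, components.values()), default=0)
    yield largest()
    for node in attack_order:
        # Find and detach the component containing node.
        component = components.pop(canonical.pop(node))
        # Remove node from the graph and from its component.
        for neighbor in graph[node]:
            graph[neighbor].remove(node)
        graph.pop(node)
        component.remove(node)
        # Re-search the component for any pieces it split into.
        canon, comp = connected_components(graph, component)
        canonical.update(canon)
        components.update(comp)
        yield largest()

# A path graph 0-1-2-3: removing the middle node 1 leaves components
# {0} and {2, 3}, so the largest component shrinks from 4 to 2.
graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(list(resilience(graph, [1])))  # [4, 2]
```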
    

    In the revised code, the max operation has to iterate over all the remaining connected components to find the largest one. This step could be made more efficient by storing the components in a priority queue, so that the largest can be found in time logarithmic in the number of components.

    I doubt that this part of the algorithm is a bottleneck in practice, so it's probably not worth the extra code, but if you do need it, there are some priority queue implementation notes in the Python documentation.
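For illustration, here is one possible sketch of that idea (my own, not from the answer): keep a heap of (-size, canonical node) entries and lazily discard stale ones when querying. The class and method names are invented for this example:

```python
import heapq

class LargestComponentTracker:
    """Track component sizes keyed by canonical node, answering
    'what is the largest current size?' via a heap with lazy deletion."""

    def __init__(self):
        self.sizes = {}  # canonical node -> current size
        self.heap = []   # entries of (-size, canonical node)

    def update(self, canon, size):
        # Record the new size; any older heap entries for this
        # canonical node simply become stale.
        self.sizes[canon] = size
        heapq.heappush(self.heap, (-size, canon))

    def remove(self, canon):
        # Just forget the size; its heap entries become stale.
        self.sizes.pop(canon, None)

    def largest(self):
        # Discard stale entries until the top of the heap agrees
        # with the current sizes, then report it.
        while self.heap:
            neg_size, canon = self.heap[0]
            if self.sizes.get(canon) == -neg_size:
                return -neg_size
            heapq.heappop(self.heap)
        return 0
```

With something like this, the max over all components in resilience could be replaced by a largest() call, at the cost of keeping the tracker in sync whenever components are added or removed.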

    1. Performance comparison. Here's a useful function for making test cases:

      from itertools import combinations
      from random import random

      def random_graph(n, p):
          """Return a random undirected graph with n nodes and each edge
          present independently with probability p.

          """
          assert 0 <= p <= 1
          graph = {i: set() for i in range(n)}
          for i, j in combinations(range(n), 2):
              if random() <= p:
                  graph[i].add(j)
                  graph[j].add(i)
          return graph
      
      
    2. Now, a quick performance comparison between the revised and the original code. Note that we have to run the revised code first, because the original code destructively modifies the graph, as noted in point 2 of the review above.

      >>> from timeit import timeit
      
      >>> G = random_graph(300, 0.2)
      
      >>> timeit(lambda:list(resilience(G, list(G))), number=1) # revised
      0.28782312001567334
      
      >>> timeit(lambda:compute_resilience(G, list(G)), number=1) # original
      59.46968446299434
      

      So the revised code is about 200 times faster on this test case.