Very slow execution of the function compute_resilience in Python

Time: 2018-01-01 18:48:22

Tags: python algorithm

The idea is to compute the resilience of a network, represented as an undirected graph of the form {node: (set of its neighbors) for each node in the graph}. The function removes nodes from the graph one by one in a given random order and computes the size of the largest remaining connected component. The helper function bfs_visited() returns the set of nodes still connected to a given node. How can the implementation of the algorithm be improved in Python 2, preferably without changing the breadth-first algorithm in the helper function?
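For reference, a minimal example of this graph representation (the node names here are made up for illustration):

```python
# An undirected 4-node graph in the {node: set(neighbours)} form
# described above; every edge is stored in both directions.
graph = {
    'a': {'b', 'c'},
    'b': {'a'},
    'c': {'a'},
    'd': set(),  # an isolated node
}

# In an undirected graph every edge must appear symmetrically:
assert all(u in graph[v] for u, vs in graph.items() for v in vs)
```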

def bfs_visited(graph, node):
    """undirected graph {Vertex: {neighbors}}
    Returns the set of all nodes visited by the algorithm"""
    queue = deque()
    queue.append(node)
    visited = set([node])
    while queue:
        current_node = queue.popleft()
        for neighbor in graph[current_node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return visited

def cc_visited(graph):
    """ undirected graph {Vertex: {neighbors}}
    Returns a list of sets of connected components"""
    remaining_nodes = set(graph.keys())
    connected_components = []
    for node in remaining_nodes:
        visited = bfs_visited(graph, node)
        if visited not in connected_components:
            connected_components.append(visited)
        remaining_nodes = remaining_nodes - visited
        #print(node, remaining_nodes)
    return connected_components

def largest_cc_size(ugraph):
    """returns the size (an integer) of the largest connected component in 
    the ugraph."""
    if not ugraph:
        return 0
    res = [(len(ccc), ccc) for ccc in cc_visited(ugraph)]
    res.sort()
    return res[-1][0]

def compute_resilience(ugraph, attack_order):
    """
    input: a graph {V: N}

    returns a list whose k+1th entry is the size of the largest cc after 
    the removal of the first k nodes
    """
    res = [len(ugraph)]
    for node in attack_order:
        neighbors = ugraph[node]  
        for neighbor in neighbors:
            ugraph[neighbor].remove(node)
        ugraph.pop(node)
        res.append(largest_cc_size(ugraph))      
    return res

1 Answer:

Answer 0 (score: 0)

I got a very good answer from Gareth Rees, which covers the question completely.

  1. Review: the docstring of bfs_visited should explain the node parameter.
  2. The docstring of compute_resilience should explain that the ugraph argument gets modified. Alternatively, the function could take a copy of the graph, so as not to modify the original graph.

    The lines in bfs_visited:

    queue = deque()
    queue.append(node)
    can be simplified to:
    
    queue = deque([node])    
    

    The function largest_cc_size builds a list of pairs:

    res = [(len(ccc), ccc) for ccc in cc_visited(ugraph)]
    res.sort()
    return res[-1][0]
    

    But as you can see, only the first element of each pair (the size of the component) is used. So you can simplify it by not building the pairs:

    res = [len(ccc) for ccc in cc_visited(ugraph)]
    res.sort()
    return res[-1]
    

    Since only the size of the largest component is needed, there's no need to build the whole list. Instead you can use max to find the largest:

    if ugraph:
        return max(map(len, cc_visited(ugraph)))
    else:
        return 0
    

    If you are using Python 3.4 or later, this can be simplified further using the default argument to max:

    return max(map(len, cc_visited(ugraph)), default=0)
    

    Now this is simple enough that it probably doesn't need to be its own function.

    This line:

    remaining_nodes = set(graph.keys())
    

    can be written more simply as:

    remaining_nodes = set(graph)
    

    There's a loop over the set remaining_nodes, and on each loop iteration you update remaining_nodes:

    for node in remaining_nodes:
        visited = bfs_visited(graph, node)
        if visited not in connected_components:
            connected_components.append(visited)
        remaining_nodes = remaining_nodes - visited
    

    It looks as if the intention of the code was to avoid iterating over already-visited nodes by removing them from remaining_nodes, but this doesn't work! The problem is that the for statement:

    for node in remaining_nodes:
    

    evaluates the expression remaining_nodes just once, at the start of the loop. So when the code creates a new set and assigns it to remaining_nodes:

    remaining_nodes = remaining_nodes - visited
    

    this has no effect on the nodes being iterated over.
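A small self-contained demonstration of this pitfall (the names here are illustrative, not from the question):

```python
# The for statement evaluates its iterable expression once, at the
# start of the loop, and then iterates over that object; rebinding
# the name afterwards has no effect on the iteration.
remaining = {1, 2, 3, 4}
seen = []
for node in remaining:
    seen.append(node)
    remaining = set()  # rebinds the name; the loop keeps going

# All four elements are still visited.
assert sorted(seen) == [1, 2, 3, 4]
```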

    You might imagine trying to fix this by adjusting the set being iterated over, using the difference_update method:

    remaining_nodes.difference_update(visited)
    

    But this would not be a good idea, because then you'd be iterating over a set while modifying it inside the loop, which is not safe. Instead, you need to write the loop as follows:

    while remaining_nodes:
        node = remaining_nodes.pop()
        visited = bfs_visited(graph, node)
        if visited not in connected_components:
            connected_components.append(visited)
        remaining_nodes.difference_update(visited)
    

    Using while and pop is the standard idiom in Python for consuming a data structure while modifying it; bfs_visited does something similar.

    Now there is no need for the test:

    if visited not in connected_components:

    because each component is produced exactly once.
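Putting the fixes above together (the deque simplification and the while/pop loop), the helper pair might look like this sketch:

```python
from collections import deque

def bfs_visited(graph, node):
    """Undirected graph {vertex: {neighbours}}.
    Return the set of all nodes reachable from node."""
    queue = deque([node])
    visited = {node}
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return visited

def cc_visited(graph):
    """Undirected graph {vertex: {neighbours}}.
    Return a list of sets, one per connected component."""
    remaining_nodes = set(graph)
    connected_components = []
    while remaining_nodes:
        node = remaining_nodes.pop()
        visited = bfs_visited(graph, node)
        connected_components.append(visited)
        remaining_nodes.difference_update(visited)
    return connected_components
```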

    In compute_resilience the first line is:

    res = [len(ugraph)]
    

    But this only works if the graph starts out as a single connected component. To handle the general case, the first line should be:

    res = [largest_cc_size(ugraph)]
    

    For each node in the attack order, compute_resilience calls:

    res.append(largest_cc_size(ugraph))
    

    But this doesn't take advantage of the work done previously. When we remove a node from the graph, all connected components remain the same except for the component containing that node. So we can potentially save some work by doing a breadth-first search over just that component, rather than over the whole graph. (Whether this actually saves any work depends on how resilient the graph is. For highly resilient graphs it won't make much difference.)

    In order to do this, we need to redesign the data structures so that we can efficiently find the component containing a node, and efficiently remove that component from the collection of components.

    This answer is already quite long, so I won't explain in detail how to redesign the data structures; I'll just present the revised code and let you figure it out for yourself.

    def connected_components(graph, nodes):
        """Given an undirected graph represented as a mapping from nodes to
        the set of their neighbours, and a set of nodes, find the
        connected components in the graph containing those nodes.
    
        Returns:
        - mapping from nodes to the canonical node of the connected
          component they belong to
        - mapping from canonical nodes to connected components
    
        """
        canonical = {}
        components = {}
        while nodes:
            node = nodes.pop()
            component = bfs_visited(graph, node)
            components[node] = component
            nodes.difference_update(component)
            for n in component:
                canonical[n] = node
        return canonical, components
    
    def resilience(graph, attack_order):
        """Given an undirected graph represented as a mapping from nodes to
        an iterable of their neighbours, and an iterable of nodes, generate
        integers such that the k-th result is the size of the largest
        connected component after the removal of the first k-1 nodes.
    
        """
        # Take a copy of the graph so that we can destructively modify it.
        graph = {node: set(neighbours) for node, neighbours in graph.items()}
    
        canonical, components = connected_components(graph, set(graph))
        largest = lambda: max(map(len, components.values()), default=0)
        yield largest()
        for node in attack_order:
            # Find connected component containing node.
            component = components.pop(canonical.pop(node))
    
            # Remove node from graph.
            for neighbor in graph[node]:
                graph[neighbor].remove(node)
            graph.pop(node)
            component.remove(node)
    
            # Component may have been split by removal of node, so search
            # it for new connected components and update data structures
            # accordingly.
            canon, comp = connected_components(graph, component)
            canonical.update(canon)
            components.update(comp)
            yield largest()
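As a quick end-to-end check (this worked example is mine, not part of the original answer), the revised code can be run on a four-node path graph; the definitions are repeated here, together with bfs_visited from the question, so the snippet is self-contained (it needs Python 3.4+ for max's default argument):

```python
from collections import deque

def bfs_visited(graph, node):
    """Return the set of nodes reachable from node in an undirected
    graph represented as {vertex: {neighbours}}."""
    queue = deque([node])
    visited = {node}
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return visited

def connected_components(graph, nodes):
    """Find the connected components containing the given nodes.
    Return (node -> canonical node, canonical node -> component)."""
    canonical = {}
    components = {}
    while nodes:
        node = nodes.pop()
        component = bfs_visited(graph, node)
        components[node] = component
        nodes.difference_update(component)
        for n in component:
            canonical[n] = node
    return canonical, components

def resilience(graph, attack_order):
    """Generate the size of the largest connected component after the
    removal of each prefix of attack_order."""
    # Take a copy of the graph so that we can destructively modify it.
    graph = {node: set(neighbours) for node, neighbours in graph.items()}
    canonical, components = connected_components(graph, set(graph))
    largest = lambda: max(map(len, components.values()), default=0)
    yield largest()
    for node in attack_order:
        # Find and detach the component containing node.
        component = components.pop(canonical.pop(node))
        # Remove node from the graph and from its component.
        for neighbor in graph[node]:
            graph[neighbor].remove(node)
        graph.pop(node)
        component.remove(node)
        # Re-search the component for any pieces it split into.
        canon, comp = connected_components(graph, component)
        canonical.update(canon)
        components.update(comp)
        yield largest()

# A path graph 0-1-2-3: removing the middle node 1 leaves components
# {0} and {2, 3}, so the largest component shrinks from 4 to 2.
graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(list(resilience(graph, [1])))  # [4, 2]
```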
    

    In the revised code, the max operation has to iterate over all the remaining connected components to find the largest one. This step could be made more efficient by storing the components in a priority queue, so that the largest can be found in time logarithmic in the number of components.

    I doubt that this part of the algorithm is a bottleneck in practice, so it's probably not worth the extra code, but if you do need it, there are some priority queue implementation notes in the Python documentation.
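For illustration, here is one possible sketch of that idea (my own, not from the answer): keep a heap of (-size, canonical node) entries and lazily discard stale ones when querying. The class and method names are invented for this example:

```python
import heapq

class LargestComponentTracker:
    """Track component sizes keyed by canonical node, answering
    'what is the largest current size?' via a heap with lazy deletion."""

    def __init__(self):
        self.sizes = {}  # canonical node -> current size
        self.heap = []   # entries of (-size, canonical node)

    def update(self, canon, size):
        # Record the new size; any older heap entries for this
        # canonical node simply become stale.
        self.sizes[canon] = size
        heapq.heappush(self.heap, (-size, canon))

    def remove(self, canon):
        # Just forget the size; its heap entries become stale.
        self.sizes.pop(canon, None)

    def largest(self):
        # Discard stale entries until the top of the heap agrees
        # with the current sizes, then report it.
        while self.heap:
            neg_size, canon = self.heap[0]
            if self.sizes.get(canon) == -neg_size:
                return -neg_size
            heapq.heappop(self.heap)
        return 0
```

With something like this, the max over all components in resilience could be replaced by a largest() call, at the cost of keeping the tracker in sync whenever components are added or removed.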

    1. Performance comparison. Here's a useful function for making test cases:

      from itertools import combinations
      from random import random

      def random_graph(n, p):
          """Return a random undirected graph with n nodes and each edge
          present independently with probability p.

          """
          assert 0 <= p <= 1
          graph = {i: set() for i in range(n)}
          for i, j in combinations(range(n), 2):
              if random() <= p:
                  graph[i].add(j)
                  graph[j].add(i)
          return graph
      
      
    2. Now, a quick performance comparison between the revised and the original code. Note that we have to run the revised code first, because the original code destructively modifies the graph, as noted in point 2 of the review above.

      >>> from timeit import timeit
      
      >>> G = random_graph(300, 0.2)
      
      >>> timeit(lambda:list(resilience(G, list(G))), number=1) # revised
      0.28782312001567334
      
      >>> timeit(lambda:compute_resilience(G, list(G)), number=1) # original
      59.46968446299434
      

      So the revised code is about 200 times faster on this test case.