The idea is to compute the resilience of a network, represented as an undirected graph of the form
{node: (set of its neighbors) for each node in the graph}
For example, {0: {1}, 1: {0, 2}, 2: {1}} represents a path on three nodes.
The function removes nodes from the graph one at a time in a given random order and computes the size of the largest remaining connected component.
The helper function bfs_visited() returns the set of nodes that are still connected to a given node.
How can this implementation of the algorithm be improved in Python 2, preferably without changing the breadth-first algorithm in the helper function?
from collections import deque

def bfs_visited(graph, node):
    """undirected graph {Vertex: {neighbors}}
    Returns the set of all nodes visited by the algorithm"""
    queue = deque()
    queue.append(node)
    visited = set([node])
    while queue:
        current_node = queue.popleft()
        for neighbor in graph[current_node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return visited
def cc_visited(graph):
    """ undirected graph {Vertex: {neighbors}}
    Returns a list of sets of connected components"""
    remaining_nodes = set(graph.keys())
    connected_components = []
    for node in remaining_nodes:
        visited = bfs_visited(graph, node)
        if visited not in connected_components:
            connected_components.append(visited)
        remaining_nodes = remaining_nodes - visited
        #print(node, remaining_nodes)
    return connected_components
def largest_cc_size(ugraph):
    """returns the size (an integer) of the largest connected component in
    the ugraph."""
    if not ugraph:
        return 0
    res = [(len(ccc), ccc) for ccc in cc_visited(ugraph)]
    res.sort()
    return res[-1][0]
def compute_resilience(ugraph, attack_order):
    """
    input: a graph {V: N}

    returns a list whose k+1th entry is the size of the largest cc after
    the removal of the first k nodes
    """
    res = [len(ugraph)]
    for node in attack_order:
        neighbors = ugraph[node]
        for neighbor in neighbors:
            ugraph[neighbor].remove(node)
        ugraph.pop(node)
        res.append(largest_cc_size(ugraph))
    return res
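For context, calling the function on a toy triangle graph looks like this (example added for illustration; note that the call destroys the graph passed in):

triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(compute_resilience(triangle, [0, 1, 2]))  # [3, 2, 1, 0]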
Answer:
I received an excellent answer from Gareth Rees, which covers the question completely.
The docstring of bfs_visited should explain the node argument. The docstring of compute_resilience should explain that the ugraph argument gets modified. Alternatively, the function could take a copy of the graph so as not to modify the original.
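For example, the expanded docstrings might look something like this (the wording is my suggestion, not from the original answer):

def bfs_visited(graph, node):
    """Take an undirected graph {vertex: set of neighbours} and a node,
    and return the set of all nodes reachable from node via
    breadth-first search (including node itself).
    """

def compute_resilience(ugraph, attack_order):
    """Take an undirected graph {vertex: set of neighbours} and an
    iterable of nodes, and return a list whose (k+1)-th entry is the
    size of the largest connected component after removal of the first
    k nodes. Note: ugraph is destructively modified by this function.
    """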
In bfs_visited, these lines:
queue = deque()
queue.append(node)
can be simplified to:
queue = deque([node])
The function largest_cc_size builds a list of pairs:
res = [(len(ccc), ccc) for ccc in cc_visited(ugraph)]
res.sort()
return res[-1][0]
But as you can see, it only uses the first element of each pair (the size of the component). So you can simplify it by not building the pairs:
res = [len(ccc) for ccc in cc_visited(ugraph)]
res.sort()
return res[-1]
Since only the size of the largest component is needed, there is no need to build the whole list. Instead, you can use max to find the largest:
if ugraph:
    return max(map(len, cc_visited(ugraph)))
else:
    return 0
If you are using Python 3.4 or later, this can be further simplified using the default argument to max:
return max(map(len, cc_visited(ugraph)), default=0)
This is now so simple that it probably doesn't need to be its own function.
This line:
remaining_nodes = set(graph.keys())
can be written more simply as:
remaining_nodes = set(graph)
There is a loop over the set remaining_nodes, and on each loop iteration you update remaining_nodes:
for node in remaining_nodes:
    visited = bfs_visited(graph, node)
    if visited not in connected_components:
        connected_components.append(visited)
    remaining_nodes = remaining_nodes - visited
It looks as if the intention of the code was to avoid iterating over the visited nodes by removing them from remaining_nodes, but this doesn't work! The problem is that the for statement:
for node in remaining_nodes:
evaluates the expression remaining_nodes just once, at the start of the loop. So when the code creates a new set and assigns it to remaining_nodes:
remaining_nodes = remaining_nodes - visited
this has no effect on the nodes being iterated over.
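A minimal demonstration of this point (example added for illustration):

s = {1, 2, 3}
count = 0
for x in s:
    count += 1
    s = s - {1, 2, 3}  # rebinds the name s; the iteration is unaffected
print(count)  # 3: the loop still visited every element of the old set
print(s)      # set()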
You might imagine trying to fix this by mutating the set being iterated over, using the difference_update method:
remaining_nodes.difference_update(visited)
But this would not be a good idea, because then you would be iterating over a set while modifying it inside the loop, which is not safe. Instead, you need to write the loop as follows:
while remaining_nodes:
    node = remaining_nodes.pop()
    visited = bfs_visited(graph, node)
    if visited not in connected_components:
        connected_components.append(visited)
    remaining_nodes.difference_update(visited)
Using while and pop is the standard Python idiom for consuming a data structure while modifying it; bfs_visited does something similar.
Now there is no need for the test:
if visited not in connected_components:
because each component is produced exactly once.
In compute_resilience, the first line is:
res = [len(ugraph)]
But this is only correct if the graph is a single connected component. To handle the general case, the first line should be:
res = [largest_cc_size(ugraph)]
For each node in the attack order, compute_resilience calls:
res.append(largest_cc_size(ugraph))
But this doesn't take advantage of the work done previously. When we remove a node from the graph, all connected components remain the same except for the component containing that node. So we can potentially save some work by doing a breadth-first search over just that component, instead of over the whole graph. (Whether this actually saves any work depends on how resilient the graph is; for highly resilient graphs it won't make much difference.)
To do this, we need to redesign the data structures so that we can efficiently find the component containing a node, and efficiently remove that component from the collection of components.
This answer is already quite long, so I won't explain in detail how to redesign the data structures; I'll just present the revised code and let you figure it out for yourself.
def connected_components(graph, nodes):
    """Given an undirected graph represented as a mapping from nodes to
    the set of their neighbours, and a set of nodes, find the
    connected components in the graph containing those nodes.

    Returns:
    - mapping from nodes to the canonical node of the connected
      component they belong to
    - mapping from canonical nodes to connected components
    """
    canonical = {}
    components = {}
    while nodes:
        node = nodes.pop()
        component = bfs_visited(graph, node)
        components[node] = component
        nodes.difference_update(component)
        for n in component:
            canonical[n] = node
    return canonical, components
def resilience(graph, attack_order):
    """Given an undirected graph represented as a mapping from nodes to
    an iterable of their neighbours, and an iterable of nodes, generate
    integers such that the k-th result is the size of the largest
    connected component after the removal of the first k-1 nodes.
    """
    # Take a copy of the graph so that we can destructively modify it.
    graph = {node: set(neighbours) for node, neighbours in graph.items()}
    canonical, components = connected_components(graph, set(graph))
    largest = lambda: max(map(len, components.values()), default=0)
    yield largest()
    for node in attack_order:
        # Find connected component containing node.
        component = components.pop(canonical.pop(node))
        # Remove node from graph.
        for neighbor in graph[node]:
            graph[neighbor].remove(node)
        graph.pop(node)
        component.remove(node)
        # Component may have been split by removal of node, so search
        # it for new connected components and update data structures
        # accordingly.
        canon, comp = connected_components(graph, component)
        canonical.update(canon)
        components.update(comp)
        yield largest()
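To make the two mappings returned by connected_components concrete, here is a small illustration (example added; the exact canonical nodes depend on set iteration order):

g = {0: {1}, 1: {0}, 2: set()}
canonical, components = connected_components(g, set(g))
# components might be {0: {0, 1}, 2: {2}} and canonical
# {0: 0, 1: 0, 2: 2}: nodes 0 and 1 share one canonical node.
print(canonical, components)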
In the revised code, the max operation has to iterate over all the remaining connected components in order to find the largest one. This step could be made more efficient by storing the connected components in a priority queue, so that the largest one can be found in time logarithmic in the number of components.
I doubt that this part of the algorithm is a bottleneck in practice, so it's probably not worth the extra code, but if you did need to do this, there are some notes on priority queue implementations in the Python documentation.
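A minimal sketch of that idea, using the standard heapq module with the usual lazy-deletion trick (this is not from the original answer; push_component and largest_size are hypothetical helpers, and components is the mapping from the revised code above):

import heapq

heap = []  # entries are (-size, canonical_node); stale entries are allowed

def push_component(canonical_node, component):
    # Hypothetical helper: call whenever a component is created or updated.
    heapq.heappush(heap, (-len(component), canonical_node))

def largest_size(components):
    # Hypothetical helper: discard stale heap entries until the top
    # matches a live component, then return that component's size.
    while heap:
        neg_size, node = heap[0]
        if node in components and len(components[node]) == -neg_size:
            return -neg_size
        heapq.heappop(heap)  # stale entry; discard it
    return 0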
Performance comparison
Here's a useful function for constructing test cases:
from itertools import combinations
from random import random

def random_graph(n, p):
    """Return a random undirected graph with n nodes and each edge
    included independently with probability p.
    """
    assert 0 <= p <= 1
    graph = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if random() <= p:
            graph[i].add(j)
            graph[j].add(i)
    return graph
Now, a quick performance comparison between the revised and the original code. Note that we have to run the revised code first, because the original code destructively modifies the graph, as noted above.
>>> from timeit import timeit
>>> G = random_graph(300, 0.2)
>>> timeit(lambda:list(resilience(G, list(G))), number=1) # revised
0.28782312001567334
>>> timeit(lambda:compute_resilience(G, list(G)), number=1) # original
59.46968446299434
So on this test case, the revised code is about 200 times faster.