Suppose I have a nested data structure that I want to traverse. The structure contains nodes, and a node provides access to its children via node.get_children_generator(). The children are of course also of type node, and they are evaluated lazily, i.e. enumerated by a generator. For simplicity, assume that a node without children simply returns an empty list/generator from get_children_generator() (so we don't have to check for emptiness manually).
To traverse such a structure of nested nodes, is it a good idea to simply chain all the generators together while iterating, that is, to create chains of chains and so on? Or does that produce too much overhead?
What I have in mind is the following:
import itertools as it
def traverse_nodes(start_node):
    """Traverses nodes in breadth first manner.

    First returns the start node.
    For simplicity we require that
    there are no cycles in the data structure,
    i.e. we are dealing with a simple tree.
    """
    node_queue = iter([start_node])
    while True:
        try:
            # next() works in both Python 2 and 3; node_queue.next() is Python 2 only
            next_node = next(node_queue)
            yield next_node
            # Next get the children
            child_gen = next_node.get_children_generator()
            # The next code line is the one I am worried about:
            # is it a good idea to make a chain of chains?
            node_queue = it.chain(node_queue, child_gen)
        except StopIteration:
            # There are no more nodes
            break
Is the line node_queue = it.chain(node_queue, child_gen) suitable for this traversal? Is creating a chain of chains of chains, and so on, a good idea?
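To make the "chain of chains" concrete: each rebinding wraps the previous chain in another one. A small illustrative snippet (not from the original question):

```python
import itertools as it

queue = iter([0])                      # start with a single-element "queue"
for children in ([1, 2], [3], [4, 5]):
    queue = it.chain(queue, children)  # each rebinding wraps the previous chain

flattened = list(queue)
# The elements come out in FIFO order, but queue is now chain(chain(chain(...)))
```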
So that you can actually run something, here is a rather silly dummy node class. The generator is somewhat pointless here, but assume that in a real-world example evaluating the children is a bit expensive and really does require a generator.
class Node(object):
    """Rather silly example of a nested node.

    The children are actually stored in a list,
    so the generator is actually not needed.
    But simply assume that returning a children
    requires a lazy evaluation.
    """
    counter = 0  # Counter for node identification

    def __init__(self):
        self.children = []  # children list
        self.node_number = Node.counter  # identifies the node
        Node.counter += 1

    def __repr__(self):
        return 'I am node #%d' % self.node_number

    def get_children_generator(self):
        """Returns a generator over children"""
        return (x for x in self.children)
The following code snippet
node0 = Node()
node1 = Node()
node2 = Node()
node3 = Node()
node4 = Node()
node5 = Node()
node6 = Node()
node0.children = [node1, node2]
node1.children = [node6]
node2.children = [node3, node5]
node3.children = [node4]
for node in traverse_nodes(node0):
    print(node)
prints
I am node #0
I am node #1
I am node #2
I am node #6
I am node #3
I am node #5
I am node #4
Answer 0 (score: 3):
Chaining multiple chains results in recursive-call overhead proportional to the number of chains linked together.
First, a pure-Python chain implementation, so that we don't lose stack information. The C implementation is here, and you can see that it does essentially the same thing: it calls next() on the underlying iterables.
from inspect import stack

def chain(it1, it2):
    for collection in [it1, it2]:
        try:
            for el in collection:
                yield el
        except StopIteration:
            pass
We only care about the two-iterable version of chain: the first iterable is consumed completely, then the other.
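For reference, this is exactly how the real itertools.chain behaves with two iterables; a quick check:

```python
import itertools as it

# chain consumes the first iterable completely, then the second
a = iter([1, 2])
b = iter([3, 4])
result = list(it.chain(a, b))
```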
class VerboseListIterator(object):
    def __init__(self, collection, node):
        self.collection = collection
        self.node = node
        self.idx = 0

    def __iter__(self):
        return self

    def __next__(self):
        print('Printing {}th child of "{}". Stack size: {}'.format(self.idx, self.node, len(stack())))
        if self.idx >= len(self.collection):
            raise StopIteration()
        self.idx += 1
        return self.collection[self.idx - 1]
This is our handy list iterator, which tells us how many stack frames exist at the moment the next element of the wrapped list is returned.
class Node(object):
    """Rather silly example of a nested node.

    The children are actually stored in a list,
    so the generator is actually not needed.
    But simply assume that returning a children
    requires a lazy evaluation.
    """
    counter = 0  # Counter for node identification

    def __init__(self):
        self.children = []  # children list
        self.node_number = Node.counter  # identifies the node
        Node.counter += 1

    def __repr__(self):
        return 'I am node #%d' % self.node_number

    def get_children_generator(self):
        """Returns a generator over children"""
        return VerboseListIterator(self.children, self)
def traverse_nodes(start_node):
    """Traverses nodes in breadth first manner.

    First returns the start node.
    For simplicity we require that
    there are no cycles in the data structure,
    i.e. we are dealing with a simple tree.
    """
    node_queue = iter([start_node])
    while True:
        try:
            next_node = next(node_queue)
            yield next_node
            # Next get the children
            child_gen = next_node.get_children_generator()
            # The next code line is the one I am worried about:
            # is it a good idea to make a chain of chains?
            node_queue = chain(node_queue, child_gen)
        except StopIteration:
            # There are no more nodes
            break
These are the implementations for the Python version you are using (3.4).
nodes = [Node() for _ in range(10)]
nodes[0].children = nodes[1:6]
nodes[1].children = [nodes[6]]
nodes[2].children = [nodes[7]]
nodes[3].children = [nodes[8]]
nodes[4].children = [nodes[9]]
Initialization of the node graph: the root is connected to the first 5 nodes, and node i is in turn connected to node i + 5.
for node in traverse_nodes(nodes[0]):
    print(node)
The output of this run is the following:
I am node #0
Printing 0th child of "I am node #0". Stack size: 4
I am node #1
Printing 1th child of "I am node #0". Stack size: 5
I am node #2
Printing 2th child of "I am node #0". Stack size: 6
I am node #3
Printing 3th child of "I am node #0". Stack size: 7
I am node #4
Printing 4th child of "I am node #0". Stack size: 8
I am node #5
Printing 5th child of "I am node #0". Stack size: 9
Printing 0th child of "I am node #1". Stack size: 8
I am node #6
Printing 1th child of "I am node #1". Stack size: 9
Printing 0th child of "I am node #2". Stack size: 8
I am node #7
Printing 1th child of "I am node #2". Stack size: 9
Printing 0th child of "I am node #3". Stack size: 8
I am node #8
Printing 1th child of "I am node #3". Stack size: 9
Printing 0th child of "I am node #4". Stack size: 8
I am node #9
Printing 1th child of "I am node #4". Stack size: 9
Printing 0th child of "I am node #5". Stack size: 8
Printing 0th child of "I am node #6". Stack size: 7
Printing 0th child of "I am node #7". Stack size: 6
Printing 0th child of "I am node #8". Stack size: 5
Printing 0th child of "I am node #9". Stack size: 4
As you can see, the closer we get to the end of node0's child list, the larger the stack becomes. Why is that? Let's take a closer look at each step, numbering each chain call for clarity:
1. node_queue = [node0].
2. next(node_queue) yields node0. node_queue = chain1([node0], [node1, node2, node3, node4, node5]).
3. next(node_queue): the list [node0] is exhausted, so the second list starts being consumed and yields node1. node_queue = chain2(chain1([node0], [node1, ...]), [node6]).
4. next(node_queue) propagates down to chain1 (through chain2) and yields node2. node_queue = chain3(chain2(chain1([node0], [...]), [node6]), [node7]).
5. And so on. By the time we are about to yield node5, the call chain looks like this:
next(chain5(chain4, [node9]))
|
V
next(chain4(chain3, [node8]))
|
V
next(chain3(chain2, [node7]))
|
V
next(chain2(chain1, [node6]))
|
V
next(chain1([node0], [node1, node2, node3, node4, node5]))
^
yield
So a single next(node_queue) call can actually result in a large number of recursive calls, proportional to the size a regular queue would have at that point of the BFS, or in simple words, to the maximum number of children of a node in the graph.
Here is a gif illustrating the algorithm: http://i.imgur.com/hnPIVG4.gif
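Not part of the original answer, but worth noting: the standard way to avoid this growing nesting of chains is an explicit FIFO queue from collections.deque. A minimal sketch, using a stand-in Node class with the same interface as above:

```python
from collections import deque

class Node(object):
    """Minimal stand-in for the Node class above."""
    def __init__(self, number):
        self.children = []
        self.node_number = number

    def get_children_generator(self):
        return (x for x in self.children)

def traverse_nodes_deque(start_node):
    """Breadth-first traversal with an explicit FIFO queue.

    popleft()/extend() are O(1) per element and the stack depth
    stays constant, unlike a chain of chains.
    """
    queue = deque([start_node])
    while queue:
        node = queue.popleft()
        yield node
        # The children generator is still consumed lazily, one level at a time
        queue.extend(node.get_children_generator())

# Same graph as in the answer above
nodes = [Node(i) for i in range(10)]
nodes[0].children = nodes[1:6]
nodes[1].children = [nodes[6]]
nodes[2].children = [nodes[7]]
nodes[3].children = [nodes[8]]
nodes[4].children = [nodes[9]]

order = [n.node_number for n in traverse_nodes_deque(nodes[0])]
```

Each next() on this generator does a constant amount of work regardless of how many nodes have already been yielded, which is exactly the overhead the chain-of-chains version cannot avoid.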