I have the following code for processing an XML file:
for el in root:
    checkChild(rootDict, el)
    for child in el:
        checkChild(rootDict, el, child)
        for grandchild in child:
            checkChild(rootDict, el, child, grandchild)
            for grandgrandchild in grandchild:
                checkChild(rootDict, el, child, grandchild, grandgrandchild)
                ...
...
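As you can see, on each iteration I just call the same function with one extra argument. Is there a way to avoid writing so many nested for loops that all do essentially the same job?
Any help would be appreciated. Thanks.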
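Answer 0 (score: 0)
Assuming root comes from an ElementTree parse, you can build a data structure that holds the list of all ancestors of each node, and then iterate over it to call checkChild: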
import xml.etree.ElementTree as ET

def checkChild(*element_chain):
    # Code placeholder
    print("Checking %s" % '.'.join(t.tag for t in reversed(element_chain)))

tree = ET.fromstring(xml)  # xml is assumed to hold the XML document as a string

# Build a dict mapping each node to its ancestors, nearest ancestor first
nodes_and_parents = {}
for elem in tree.iter():  # tree.iter() yields every element in the XML, not only the root's children
    for child in elem:
        nodes_and_parents[child] = [elem] + nodes_and_parents.get(elem, [])

for t, parents in nodes_and_parents.items():
    checkChild(t, *parents)
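To make the shape of that dict concrete, here is a minimal sketch on a tiny sample document. The XML string and the names `sample`, `ancestors`, and `chains` are illustrative assumptions, not part of the answer above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
sample = ET.fromstring("<root><child1><sub1/></child1><child2/></root>")

# Same construction as above: each node maps to its ancestors, nearest first.
ancestors = {}
for elem in sample.iter():
    for child in elem:
        ancestors[child] = [elem] + ancestors.get(elem, [])

# Render each chain root-first so it is easy to read.
chains = [
    ".".join(e.tag for e in reversed([node] + parents))
    for node, parents in ancestors.items()
]
print(chains)
```

Note that the dict is filled parent by parent, so `child2` is visited before `sub1`. If the exact call order of the nested loops matters, this approach may not reproduce it.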
Answer 1 (score: 0)
def recurse(tree):
    """Walks a tree depth-first and yields the path at every step."""
    # We convert the tree to a list of paths through it,
    # with the most recently visited path last. This is the stack.
    def explore(stack):
        # When the stack is empty, we're done.
        while stack:
            # Popping from the stack means reading the most recently
            # discovered but yet unexplored path in the tree. We yield it
            # so you can call your method on it.
            path = stack.pop()
            yield path
            # Then we expand this path further, adding all extended paths to the
            # stack. In reversed order so the first child element will end up at
            # the end, and thus will be yielded first.
            stack.extend(path + (elm,) for elm in reversed(path[-1]))

    yield from explore([(tree,)])

# The linear structure yields tuples (root, child, ...)
linear = recurse(root)
# Then call checkChild(rootDict, child, ...)
next(linear)  # skip checkChild(rootDict)
for path in linear:
    checkChild(rootDict, *path[1:])
To help you follow along, suppose the root looks like this:
root
    child1
        sub1
        sub2
    child2
        sub3
            subsub1
        sub4
    child3
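That looks like a tree. We can find paths through this tree, for example (root, child1). Feeding such a path to the loop above results in the call checkChild(rootDict, child1). In the end, checkChild is called exactly once for every path in the tree. So we can write the tree as a list of paths, like this: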
[(root,),
(root, child1),
(root, child1, sub1),
(root, child1, sub2),
(root, child2),
(root, child2, sub3),
(root, child2, sub3, subsub1),
(root, child2, sub4),
(root, child3)]
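The order of the paths in this list happens to match your loop structure. It is called depth-first. (The other ordering, breadth-first, would list all children first, then all grandchildren, and finally all great-grandchildren.)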
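For comparison, here is a minimal sketch of the breadth-first ordering, assuming ElementTree elements. The function name `breadth_first_paths` and the sample XML, which mirrors the example tree, are illustrative and not part of the answer's code.

```python
import xml.etree.ElementTree as ET
from collections import deque

def breadth_first_paths(tree):
    """Yield every path from the root, one level at a time."""
    queue = deque([(tree,)])
    while queue:
        # Take the oldest pending path, so shallower paths come out first.
        path = queue.popleft()
        yield path
        # Append the extended paths at the back of the queue.
        queue.extend(path + (elm,) for elm in path[-1])

# Hypothetical sample mirroring the example tree.
sample = ET.fromstring(
    "<root><child1><sub1/><sub2/></child1>"
    "<child2><sub3><subsub1/></sub3><sub4/></child2><child3/></root>"
)
bfs_tags = ["/".join(e.tag for e in p) for p in breadth_first_paths(sample)]
print(bfs_tags)
```

The only structural difference from a depth-first walk is the container: a queue read from the front instead of a stack read from the end.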
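The list above corresponds to the stack variable in the code, with one small difference: stack only stores the minimal set of paths that still need to be remembered.
Finally, recurse yields these paths one by one, and the last part of the code calls the checkChild method just as you did in your question.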
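To see how little the stack has to hold at any moment, the following sketch traces a depth-first walk over a sample tree. The names `trace_depth_first` and `tags` and the sample XML are assumptions made for illustration; the loop is a standalone variant of the idea, not the answer's exact code.

```python
import xml.etree.ElementTree as ET

def tags(path):
    """Render a path of elements as a slash-separated string of tags."""
    return "/".join(e.tag for e in path)

def trace_depth_first(tree):
    """Yield (path, pending), where pending is the stack after expanding path."""
    stack = [(tree,)]
    while stack:
        path = stack.pop()
        # Push children in reverse so the first child is popped next.
        stack.extend(path + (elm,) for elm in reversed(path[-1]))
        yield tags(path), [tags(p) for p in stack]

# Hypothetical sample mirroring the example tree.
sample = ET.fromstring(
    "<root><child1><sub1/><sub2/></child1>"
    "<child2><sub3><subsub1/></sub3><sub4/></child2><child3/></root>"
)
trace = list(trace_depth_first(sample))
for visited, pending in trace:
    print(visited, "| pending:", pending)
```

For this tree the walk visits nine paths, while the stack never holds more than four pending ones.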
Answer 2 (score: 0)
Whatever you want to do with files and directories, you can walk over them. In Python, the simplest way I know of is:
#!/usr/bin/env python
import os

# Set the directory you want to start from
root_dir = '.'
for dir_name, subdirList, file_list in os.walk(root_dir):
    print(f'Found directory: {dir_name}')
    for file_name in file_list:
        print(f'\t{file_name}')
While walking, you can add the entries to a group or perform whatever other operation you need.
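As one possible reading of "adding to a group", the sketch below collects file paths by extension while walking. The helper name `group_by_extension` and the temporary directory layout are assumptions for illustration only.

```python
import os
import tempfile
from collections import defaultdict

def group_by_extension(root_dir):
    """Walk root_dir and group full file paths by file extension."""
    groups = defaultdict(list)
    for dir_name, _subdirs, file_list in os.walk(root_dir):
        for file_name in file_list:
            ext = os.path.splitext(file_name)[1]
            groups[ext].append(os.path.join(dir_name, file_name))
    return dict(groups)

# Build a small throwaway directory tree so the example is reproducible.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.xml", os.path.join("sub", "b.xml"), "notes.txt"):
        open(os.path.join(tmp, rel), "w").close()
    groups = group_by_extension(tmp)
    # Keep only base names, since the temporary path differs on every run.
    names = {
        ext: sorted(os.path.basename(p) for p in paths)
        for ext, paths in groups.items()
    }
print(names)
```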