How to copy multiple XML nodes to another file in Python

Asked: 2014-01-16 21:14:06

Tags: python xml xpath elementtree

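Please keep in mind that I'm new to Python. I'm trying to copy some XML nodes from sample1.xml into out.xml when they don't already exist in sample2.xml.

This is how far I got before getting stuck: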

import xml.etree.ElementTree as ET

tree = ET.ElementTree(file='sample1.xml')
addtree = ET.ElementTree(file='sample2.xml')

root = tree.getroot()
addroot = addtree.getroot()

for adel in addroot.findall('.//cars/car'):
    for el in root.findall('cars/car'):
        with open('out.xml', 'w+') as f:
            f.write("BEFORE\n")    
            f.write(el.tag)
            f.write("\n")
            f.write(adel.tag)
            f.write("\n")
            f.write("\n")

            f.write("AFTER\n")

            el = adel

            f.write(el.tag)
            f.write("\n")
            f.write(adel.tag)

I don't know what I'm missing, but it only copies the actual "tag" name itself.

Output:

BEFORE
car
car

AFTER
car
car

So I'm missing the child nodes, and also the <></> markup. The expected result is shown below.

sample1.xml:

<cars>
    <car>
        <use-car>0</use-car>
        <use-gas>0</use-gas>
        <car-name />
        <car-key />
        <car-location>hawaii</car-location>
        <car-port>5</car-port>
    </car>
</cars>

sample2.xml:

<cars>
    <old>
        1
    </old>
    <new>
        8
    </new>
    <car />
</cars>

Expected result in out.xml (the final product):

<cars>
    <old>
        1
    </old>
    <new>
        8
    </new>
    <car>
        <use-car>0</use-car>
        <use-gas>0</use-gas>
        <car-name />
        <car-key />
        <car-location>hawaii</car-location>
        <car-port>5</car-port>
    </car>
</cars>

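All other nodes (old, new) must stay unchanged. I'm only trying to replace <car /> with the car node from sample1.xml, including all of its child and grandchild nodes (if any).

1 answer:

Answer 0 (score: 3)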

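First, your XML has a couple of minor problems:

  • sample1: the closing cars tag is missing its /
  • sample2: the closing new tag incorrectly reads old; it should read new

Second, a disclaimer: my solution has its limitations. In particular, it will not repeatedly substitute the car node from sample1 into multiple spots in sample2. It does work for the sample files you provided.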
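If several placeholders do need replacing, one possible approach is sketched below. It is not part of the original solution: replace_all is a hypothetical helper name, and it assumes every node with the given name in the target should receive a deep copy of the first matching node in the source. A deep copy is used because an Element inserted in several places would otherwise be shared between them.

```python
import copy
import xml.etree.ElementTree as ET

def replace_all(from_root, to_root, node_name):
    """
    Replace every node_name element under to_root with a deep copy of the
    first node_name element found under from_root.
    Returns the number of elements replaced.
    """
    source = from_root.find('.//' + node_name)
    if source is None:
        return 0
    replaced = 0
    # Snapshot the elements first so the tree is not mutated while iterating it
    for parent in list(to_root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == node_name:
                new_child = copy.deepcopy(source)
                new_child.tail = child.tail  # keep the whitespace that followed the old node
                parent[index] = new_child
                replaced += 1
    return replaced

# Small in-memory demonstration (the XML strings are made up for illustration)
src = ET.fromstring('<cars><car><car-port>5</car-port></car></cars>')
dst = ET.fromstring('<cars><old>1</old><car /><lot><car /></lot></cars>')
count = replace_all(src, dst, 'car')
print(count)
print(ET.tostring(dst, encoding='unicode'))
```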

Third: thanks to the earlier answers on "access ElementTree node parent node", which informed the implementation of get_node_parent_info below.

Finally, the code:

import xml.etree.ElementTree as ET

def find_child(node, with_name):
    """Recursively find node with given name"""
    for element in list(node):
        if element.tag == with_name:
            return element
        elif list(element):
            sub_result = find_child(element, with_name)
            if sub_result is not None:
                return sub_result
    return None

def replace_node(from_tree, to_tree, node_name):
    """
    Replace node with given node_name in to_tree with
    the same-named node from the from_tree
    """
    # Find nodes of given name ('car' in the example) in each tree
    from_node = find_child(from_tree.getroot(), node_name)
    to_node = find_child(to_tree.getroot(), node_name)

    # Find where to substitute the from_node into the to_tree
    to_parent, to_index = get_node_parent_info(to_tree, to_node)

    # Replace to_node with from_node
    to_parent.remove(to_node)
    to_parent.insert(to_index, from_node)

def get_node_parent_info(tree, node):
    """
    Return tuple of (parent, index) where:
        parent = node's parent within tree
        index = index of node under parent
    """
    parent_map = {c:p for p in tree.iter() for c in p}
    parent = parent_map[node]
    return parent, list(parent).index(node)

from_tree = ET.ElementTree(file='sample1.xml')
to_tree = ET.ElementTree(file='sample2.xml')

replace_node(from_tree, to_tree, 'car')

# ET.dump(to_tree)
to_tree.write('output.xml')
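
For a quick check without touching any files, the same replacement can be tried on in-memory strings. The following is only a sketch: the inline XML is a trimmed stand-in for the sample files, and find('.//car') is used in place of find_child for brevity. It also shows ET.tostring, which serializes an element together with its children, whereas writing el.tag (as in the question) emits only the tag name.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-ins for sample1.xml and sample2.xml (assumed, for illustration)
SAMPLE1 = "<cars><car><car-location>hawaii</car-location><car-port>5</car-port></car></cars>"
SAMPLE2 = "<cars><old>1</old><new>8</new><car /></cars>"

from_root = ET.fromstring(SAMPLE1)
to_root = ET.fromstring(SAMPLE2)

from_node = from_root.find('.//car')
to_node = to_root.find('.//car')

# Element objects keep no reference to their parent, so build a child -> parent map
parent_map = {child: parent for parent in to_root.iter() for child in parent}
to_parent = parent_map[to_node]
to_index = list(to_parent).index(to_node)

# Swap the empty placeholder for the populated node at the same position
to_parent.remove(to_node)
to_parent.insert(to_index, from_node)

# tostring serializes the whole subtree, not just the tag name
result = ET.tostring(to_root, encoding='unicode')
print(result)
```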

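Update: I recently noticed that the find_child() implementation in my original solution would fail if the "child" in question was not in the first branch of the XML tree being traversed. I have updated the implementation above to correct this.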
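
To illustrate the kind of failure described, here is a small sketch. The original implementation is not shown in this thread, so find_child_first_branch_only is a hypothetical reconstruction of such a bug (returning the result of the first non-leaf branch even when it is None); the garage XML is made up for the demonstration.

```python
import xml.etree.ElementTree as ET

def find_child_first_branch_only(node, with_name):
    """Hypothetical buggy variant: gives up after the first non-leaf branch."""
    for element in list(node):
        if element.tag == with_name:
            return element
        elif list(element):
            # Bug: returns even when the recursive call found nothing
            return find_child_first_branch_only(element, with_name)
    return None

def find_child(node, with_name):
    """Corrected version: keeps scanning siblings when a branch has no match."""
    for element in list(node):
        if element.tag == with_name:
            return element
        elif list(element):
            sub_result = find_child(element, with_name)
            if sub_result is not None:
                return sub_result
    return None

# The car node sits in the second branch, not the first
root = ET.fromstring('<garage><bikes><bike /></bikes><cars><car /></cars></garage>')

buggy = find_child_first_branch_only(root, 'car')
fixed = find_child(root, 'car')
print(buggy is None)
print(fixed.tag)
```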