How do I merge two different paths in an XML file?

Asked: 2015-06-12 13:15:00

Tags: python xml parsing xml-parsing lxml

Here is my XML file:

   <File>
        <Paths>
                <Path>
                   <Node>
                      <NodeName>Initial_Node</NodeName>
                      <InnerNode>
                         <Signal>Test_sig</Signal>
                         <InnerNode>
                            <Signal>Test_sig_1</Signal>
                            <NodeRef>Ref0</NodeRef>
                         </InnerNode>
                      </InnerNode>
                   </Node>
                </Path>
                <Path>
                   <Node>
                      <NodeName>Name1</NodeName>
                      <InnerNode>
                         <Signal>Test_sig_0</Signal>
                         <InnerNode>
                            <Signal>Test_sig_2</Signal>
                            <NodeRef>Ref1</NodeRef>
                         </InnerNode>
                      </InnerNode>
                   </Node>
                </Path>
        </Paths>
        <Paths>
                <Path>
                   <Node>
                      <NodeRef>Ref0</NodeRef>
                      <InnerNode>
                         <Signal>Test_sig_3</Signal>
                         <InnerNode>
                            <Signal>Test_sig_4</Signal>
                            <NodeName>Final_Node</NodeName>
                         </InnerNode>
                      </InnerNode>
                   </Node>
                </Path>
        </Paths>
    </File>

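I am using lxml in Python. I would like to join the paths in the file above whose <NodeRef> values match, merging the rest of the two matching paths together, to get the following result: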

 <File>
        <Paths>
                <Path>
                   <Node>
                      <NodeName>Initial_Node</NodeName>
                      <InnerNode>
                         <Signal>Test_sig</Signal>
                         <InnerNode>
                            <Signal>Test_sig_1</Signal>
                            <InnerNode>
                               <Signal>Test_sig_3</Signal>
                               <InnerNode>
                                  <Signal>Test_sig_4</Signal>
                                  <NodeName>Final_Node</NodeName>
                               </InnerNode>
                            </InnerNode>
                         </InnerNode>
                      </InnerNode>
                   </Node>
                </Path>
                <Path>
                   <Node>
                      <NodeName>Name1</NodeName>
                      <InnerNode>
                         <Signal>Test_sig_0</Signal>
                         <InnerNode>
                            <Signal>Test_sig_2</Signal>
                            <NodeRef>Ref1</NodeRef>
                         </InnerNode>
                      </InnerNode>
                   </Node>
                </Path>
        </Paths>
    </File>
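
To make the matching rule concrete, here is a minimal sketch that lists which <NodeRef> values have a continuation elsewhere in the file. It assumes (my reading of the example, not something stated explicitly) that a <NodeRef> directly under <Node> starts a continuation path, while a <NodeRef> inside an <InnerNode> marks where a path is cut off. The XML is a shortened stand-in for the file above, and the names `defined`, `dangling`, `mergeable` and `unmatched` are only illustrative.

```python
from lxml import etree

# Shortened stand-in for the file above (assumption: same element names).
xml = """
<File>
  <Paths>
    <Path><Node><NodeName>Initial_Node</NodeName>
      <InnerNode><Signal>Test_sig</Signal><NodeRef>Ref0</NodeRef></InnerNode>
    </Node></Path>
    <Path><Node><NodeName>Name1</NodeName>
      <InnerNode><Signal>Test_sig_0</Signal><NodeRef>Ref1</NodeRef></InnerNode>
    </Node></Path>
  </Paths>
  <Paths>
    <Path><Node><NodeRef>Ref0</NodeRef>
      <InnerNode><Signal>Test_sig_3</Signal><NodeName>Final_Node</NodeName></InnerNode>
    </Node></Path>
  </Paths>
</File>
"""
root = etree.fromstring(xml)

# A <NodeRef> directly under <Node> starts a continuation path.
defined = {ref.text for ref in root.xpath("//Node/NodeRef")}
# A <NodeRef> under <InnerNode> marks the point where a path is cut off.
dangling = {ref.text for ref in root.xpath("//InnerNode/NodeRef")}

mergeable = sorted(defined & dangling)   # references that can be resolved
unmatched = sorted(dangling - defined)   # references that stay in the output
print(mergeable, unmatched)
```

Here Ref0 can be resolved, while Ref1 has no continuation and therefore stays in the expected result unchanged.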

Thank you very much for your help.

1 Answer:

Answer 0 (score: 1)

There isn't much detail to go on here, but this at least produces the correct output:

from lxml import etree

# `xml` is assumed to hold the XML document from the question as a string.
root = etree.fromstring(xml)

replace_set = {}
# Iterate over a snapshot (list) because the loop removes elements from the tree.
for node in list(root.iter("Node")):
    if node.find("NodeRef") is not None:
        # This <Node> has a direct <NodeRef> child, so it is the continuation
        # that a <NodeRef> placeholder in another path points to. Save its
        # <InnerNode> under the reference name, then remove the <Node> from the tree.
        ref = node.find("NodeRef").text
        inner = node.find("InnerNode")
        replace_set[ref] = inner
        node.getparent().remove(node)

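# Clean up the containers left empty by the removal above.
for paths in list(root.iter("Paths")):
    for path in paths.findall("Path"):
        if len(path) == 0:
            paths.remove(path)
    # Drop the <Paths> element itself once it has no <Path> children left.
    if len(paths) == 0:
        paths.getparent().remove(paths)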

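# Replace each <NodeRef> placeholder with the saved <InnerNode> it refers to.
for node in list(root.iter("NodeRef")):
    if node.text in replace_set:
        # Look up the saved subtree by this placeholder's own reference name.
        node.getparent().replace(node, replace_set[node.text])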

print(etree.tostring(root).decode())
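
For reference, the same idea can be packaged as a self-contained function. This is only a sketch under a few assumptions: every continuation <Node> sits inside a <Path> that sits inside a <Paths> (as in the question's file), the function name `merge_paths` is hypothetical, and the test document is a shortened version of the original. It inserts a `copy.deepcopy` of the saved subtree, because an lxml element can have only one parent; without the copy, two placeholders naming the same reference would move the subtree instead of duplicating it.

```python
import copy
from lxml import etree


def merge_paths(root):
    """Splice continuation paths into the <NodeRef> placeholders naming them."""
    # Pass 1: collect continuations, i.e. <Node> elements with a direct
    # <NodeRef> child, keyed by the reference text, and detach their <Path>.
    continuations = {}
    for node in list(root.iter("Node")):
        ref = node.find("NodeRef")      # find() only looks at direct children
        inner = node.find("InnerNode")
        if ref is None or inner is None:
            continue
        continuations[ref.text] = inner
        path = node.getparent()         # assumed to be the enclosing <Path>
        paths = path.getparent()        # assumed to be the enclosing <Paths>
        paths.remove(path)
        # Drop a <Paths> container that is now empty.
        if len(paths) == 0 and paths.getparent() is not None:
            paths.getparent().remove(paths)

    # Pass 2: swap each remaining placeholder for a copy of its continuation.
    for ref in list(root.iter("NodeRef")):
        target = continuations.get(ref.text)
        if target is not None:
            ref.getparent().replace(ref, copy.deepcopy(target))
    return root


# Shortened test document (assumption: same element names as the question).
xml = (
    "<File>"
    "<Paths><Path><Node><NodeName>Initial_Node</NodeName>"
    "<InnerNode><Signal>Test_sig</Signal><NodeRef>Ref0</NodeRef></InnerNode>"
    "</Node></Path></Paths>"
    "<Paths><Path><Node><NodeRef>Ref0</NodeRef>"
    "<InnerNode><Signal>Test_sig_3</Signal><NodeName>Final_Node</NodeName></InnerNode>"
    "</Node></Path></Paths>"
    "</File>"
)
merged = merge_paths(etree.fromstring(xml))
print(etree.tostring(merged, pretty_print=True).decode())
```

Placeholders with no matching continuation (such as Ref1 in the question) are left untouched. Chained references, where a continuation itself ends in another <NodeRef>, are not resolved by this sketch.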