Question

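I'm trying to merge several .xml files into a single file, and it works fine except for a problem with nested tags. My Python script does the following: it reads all the .xml files in a directory, extracts all elements of certain tags (using getElementsByTagName) into separate lists, and then writes those lists to one merged file. The trouble starts when a tag is nested inside a tag of the same name. For example: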

File1.xml:
<SomeTag>
  <content>Text1</content>
</SomeTag>
<OtherTag>
  <value>Val1</value>
</OtherTag>

merged with:

File2.xml:
<SomeTag>
  <content>Text2</content>
</SomeTag>
<OtherTag>
  <OtherTag>
    <value>Val2</value>
    <element>Elem1</element>
  </OtherTag>
</OtherTag>

I would like to get:

<SomeTag>                                #container tag created in script
  <content>Text1</content>
  <content>Text2</content>
</SomeTag>
<OtherTag>                               #container tag created in script
  <value>Val1</value>
  <OtherTag>
    <value>Val2</value>
    <element>Elem1</element>
  </OtherTag>
</OtherTag>

But what I get is:

<SomeTag>                                #container tag created in script
  <content>Text1</content>
  <content>Text2</content>
</SomeTag>
<OtherTag>                               #container tag created in script
  <OtherTag>
  </OtherTag>
  <OtherTag>
    <value>Val1</value>
    <value>Val2</value>
    <element>Elem1</element>
  </OtherTag>
</OtherTag>

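I think what I want is for getElementsByTagName to read only the first level of depth, rather than recursing through the entire (Element)tree XML structure. Does anyone have any ideas?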

Answer 1

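As Lagada mentioned, getElementsByTagName gives you all elements of the given type, including nested ones. You could walk the tree yourself and stop descending below any element you have already collected; but it may be easier to get them all with getElementsByTagName, then iterate over them and discard any that have an ancestor of the same type (those, of course, are the nested ones). Then process the rest.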
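Both approaches described above can be sketched with xml.dom.minidom. This is only a minimal illustration, not the asker's actual script: the function names `top_level_by_tag` and `outermost_by_tag` are made up for this example, and the sample is wrapped in a hypothetical `<root>` element because a well-formed XML document needs a single root.

```python
from xml.dom import minidom


def top_level_by_tag(root, tag):
    """Walk the tree manually; do not descend below a matching element."""
    found = []
    for child in root.childNodes:
        if child.nodeType != child.ELEMENT_NODE:
            continue  # skip text, comments, etc.
        if child.tagName == tag:
            found.append(child)  # keep it whole, nested children included
        else:
            found.extend(top_level_by_tag(child, tag))
    return found


def outermost_by_tag(root, tag):
    """Collect every match, then drop those with a same-tag ancestor."""
    def has_same_tag_ancestor(node):
        parent = node.parentNode
        # Stop at the search root so the root itself is never counted.
        while parent is not None and parent is not root:
            if parent.nodeType == parent.ELEMENT_NODE and parent.tagName == tag:
                return True
            parent = parent.parentNode
        return False

    return [e for e in root.getElementsByTagName(tag)
            if not has_same_tag_ancestor(e)]


# Sample modelled on File2.xml, wrapped in an assumed <root> element.
doc = minidom.parseString(
    "<root>"
    "<SomeTag><content>Text2</content></SomeTag>"
    "<OtherTag><OtherTag><value>Val2</value>"
    "<element>Elem1</element></OtherTag></OtherTag>"
    "</root>"
)

all_matches = doc.getElementsByTagName("OtherTag")   # outer and inner
outer_only = outermost_by_tag(doc, "OtherTag")       # outer only
walked = top_level_by_tag(doc, "OtherTag")           # outer only
```

With either helper, only the outer OtherTag from File2.xml is returned, so when its children are copied into the merged container the inner OtherTag arrives intact with its value and element still nested inside it.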

In Python, can I use getElementsByTagName but keep the nested structure?
