Question

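I am trying to parse an XML document in which some nodes contain text, then a child node, and then more text. For example, the second "post" element in the XML below: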

<?xml version="1.0"?>
<data>
    <post>
        this is some text
    </post>
    <post>
        here is some more text
        <quote> and a nested node </quote>
        and more text after the nested node
    </post>
</data>

I am using the following code to try to print the text of each node:

import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()

for child in root:
    print(child.text)

Unfortunately, the only output is:

this is some text
here is some more text

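Notice that the text "and more text after the nested node" is missing.

So,

Is this valid XML?
If so, how can I achieve the desired parsing with ElementTree or another Python XML library?
If not, are there any suggestions for parsing the XML, short of writing my own parser?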

Answer 1

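Ah, found the answer here: How can I iterate child text nodes (not descendants) in ElementTree?

Basically, I have to use the child node's .tail attribute to access the text that was missing before. In ElementTree, an element's .text only holds the text before its first child; any text that follows a child's closing tag is stored on that child as .tail.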
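For reference, here is a minimal sketch of how that could look for the XML in the question. Text mixed with child elements like this is well-formed XML ("mixed content"). The helper name direct_text and the inline XML string are my own choices for illustration, not part of ElementTree; only .text, .tail, itertext() and fromstring() come from the library.

```python
import xml.etree.ElementTree as ET

# Same document as in the question, inlined so the sketch is self-contained.
XML = """<?xml version="1.0"?>
<data>
    <post>
        this is some text
    </post>
    <post>
        here is some more text
        <quote> and a nested node </quote>
        and more text after the nested node
    </post>
</data>"""


def direct_text(elem):
    """Hypothetical helper: text owned directly by elem, skipping text inside children."""
    # elem.text is the text before the first child (or None).
    parts = [elem.text or ""]
    # Each child's .tail is the text between that child's end tag and the next tag.
    for child in elem:
        parts.append(child.tail or "")
    # Drop indentation and empty chunks.
    return [p.strip() for p in parts if p.strip()]


root = ET.fromstring(XML)
for post in root:
    print(direct_text(post))
    # itertext() walks .text and .tail of all descendants in document order,
    # so this variant also includes the text inside <quote>.
    print(" ".join("".join(post.itertext()).split()))
```

If the nested text is wanted too, itertext() is probably the simpler route; the .tail loop is useful when only the element's own text should be collected.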
