Question

Given an element like this:

<A>
    hello

    <annotation> NOT part of text </annotation>

    world
</A>

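How do I get the child text nodes (as XPath's text() would) using ElementTree?

iter() and itertext() are both tree walkers that include all descendant nodes. I'm not aware of an iterator over direct children only. Also, iter() only finds elements (it is ElementTree, after all), so it can't be used to collect text nodes.

I know there is a library called lxml that offers better XPath support, but I'm asking here before adding another dependency. (Also, I'm very new to Python, so I may be missing something obvious.)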
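To illustrate the behaviour described above, here is a minimal sketch. It assumes the sample element with its whitespace trimmed for readability; the variable names `descendant_texts` and `tags` are only for illustration.

```python
import xml.etree.ElementTree as ET

# The sample element, with whitespace trimmed for readability.
root = ET.fromstring("<A>hello<annotation>NOT part of text</annotation>world</A>")

# itertext() walks the whole subtree, so the annotation's text is included.
descendant_texts = list(root.itertext())
print(descendant_texts)  # ['hello', 'NOT part of text', 'world']

# iter() yields elements only: the root itself, then its descendants.
tags = [el.tag for el in root.iter()]
print(tags)  # ['A', 'annotation']
```

Neither call offers a way to stop at the direct children's text.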

Answer 1

You'll find that the text of your example is spread, somewhat counterintuitively, across three attributes:

A.text for "hello"
annotation.text for "NOT part of text"
annotation.tail for "world"

(whitespace omitted). That is a bit awkward. However, something along these lines should help:

 import xml.etree.ElementTree as et

 xml = """
 <A>
     hello

     <annotation> NOT part of text </annotation>

     world
 </A>"""


 doc = et.fromstring(xml)


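 def all_texts(root):
     # Text before the first child is stored on the element itself.
     if root.text is not None:
         yield root.text
     # Text after each child is stored on that child's .tail.
     for child in root:
         if child.tail is not None:
             yield child.tail


 print(list(all_texts(doc)))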
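Since the strings yielded this way still carry the surrounding whitespace and newlines, a variant that strips them and skips empty results may be more convenient. The following is only a sketch under that assumption; `child_texts` is a hypothetical helper name, not part of ElementTree.

```python
import xml.etree.ElementTree as ET


def child_texts(element):
    """Yield the stripped, non-empty direct text nodes of element."""
    # Text before the first child lives on the element itself.
    candidates = [element.text]
    # Text following each child lives on that child's .tail.
    candidates.extend(child.tail for child in element)
    for text in candidates:
        # Skip missing (None) and whitespace-only strings.
        if text and text.strip():
            yield text.strip()


doc = ET.fromstring("""
<A>
    hello

    <annotation> NOT part of text </annotation>

    world
</A>""")

print(list(child_texts(doc)))  # ['hello', 'world']
```

The annotation's own text is never touched, because only the parent's .text and the children's .tail are read.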

How do I iterate over child text nodes (not descendants) in ElementTree?
