Question

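This is an example of the markup. I can't get the text that sits between the tags: iterating over the tags doesn't give it to me, and neither does node.text on the <seg> node. That's why I'm asking; all help is welcome (sorry for my English).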

    <tuv>
         <seg>If you want to save items in a 
            <bpt i="1">&lt;Message id=&quot;Message:1T0000772343:f000012900ce8eb3:MPhS&quot;&gt;</bpt>
            <ept i="1">&lt;/Message&gt;</ept> 
            for which no connection has been established or in a 
            <bpt i="2">&lt;Message id=&quot;Message:1T0000772343:f000012900ceac3d:pvy4&quot;&gt;</bpt>
            <ept i="2">&lt;/Message&gt;</ept> 
            that requires authentication, you need to connect to the library.
         </seg>
    </tuv>

Wanted output:

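If you want to save items in a for which no connection has been established or in a that requires authentication, you need to connect to the library.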

Answer 1

Use .xpath("text()") on the <seg> element to get all of its text nodes.

This code prints the wanted output:

    from lxml import etree

    # Parse the file; <tuv> is the root element, so <seg> is its direct child
    root = etree.parse("tuv.xml")
    seg = root.find("seg")

    # Get the text nodes of 'seg' as one string
    text = " ".join(seg.xpath("text()"))

    # Print result with unwanted whitespace removed
    print(" ".join(text.split()))
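
It may help to see why node.text alone is not enough. In the ElementTree data model, which lxml also follows, an element's .text holds only the text before its first child, and any text that comes after a child's closing tag is stored in that child's .tail. The sketch below reproduces the same result without XPath, using the standard library's xml.etree.ElementTree so it runs without extra packages. The helper name direct_text and the shortened Message ids are made up for this illustration and are not part of the original question or answer.

```python
import xml.etree.ElementTree as ET

# Shortened version of the question's markup (Message ids abbreviated for illustration)
XML = """<tuv>
  <seg>If you want to save items in a
    <bpt i="1">&lt;Message id=&quot;1&quot;&gt;</bpt>
    <ept i="1">&lt;/Message&gt;</ept>
    for which no connection has been established or in a
    <bpt i="2">&lt;Message id=&quot;2&quot;&gt;</bpt>
    <ept i="2">&lt;/Message&gt;</ept>
    that requires authentication, you need to connect to the library.
  </seg>
</tuv>"""


def direct_text(element):
    """Return the text directly inside element, skipping text inside its children."""
    # .text is only the text before the first child element
    parts = [element.text or ""]
    # Text following a child's closing tag is stored in that child's .tail
    parts.extend(child.tail or "" for child in element)
    # Collapse runs of whitespace, including newlines and indentation
    return " ".join(" ".join(parts).split())


seg = ET.fromstring(XML).find("seg")
result = direct_text(seg)
print(result)
```

The same function should work unchanged on an lxml element, since lxml exposes the same .text and .tail attributes; the XPath form in the answer is simply shorter.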

How to get the text between tags in XML, preferably using lxml
