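Below is an example of the markup. I can't get the text between the tags: iterating over the tags doesn't give it to me, and neither does `node.text` on the `<seg>` node. That's why I'm asking; any help is welcome (sorry for my English).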
```xml
<tuv>
<seg>If you want to save items in a
<bpt i="1">&lt;Message id="Message:1T0000772343:f000012900ce8eb3:MPhS"&gt;</bpt>
<ept i="1">&lt;/Message&gt;</ept>
for which no connection has been established or in a
<bpt i="2">&lt;Message id="Message:1T0000772343:f000012900ceac3d:pvy4"&gt;</bpt>
<ept i="2">&lt;/Message&gt;</ept>
that requires authentication, you need to connect to the library.
</seg>
</tuv>
```
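Wanted output:

If you want to save items in a for which no connection has been established or in a that requires authentication, you need to connect to the library.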
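Why `node.text` falls short: in lxml, as in the standard library's `xml.etree.ElementTree`, an element's `.text` holds only the text before its first child, and each later run of text is stored as the `.tail` of the child element that precedes it. A minimal sketch using the standard library and a trimmed-down `<seg>` (the shortened `id="m1"` is a placeholder, not the real id):

```python
import xml.etree.ElementTree as ET

# Trimmed-down <seg>; the markup inside <bpt>/<ept> is escaped, as in TMX.
# id="m1" is a placeholder for the long Message id in the question.
SEG_XML = (
    '<seg>If you want to save items in a '
    '<bpt i="1">&lt;Message id="m1"&gt;</bpt>'
    '<ept i="1">&lt;/Message&gt;</ept>'
    ' for which no connection has been established.</seg>'
)

seg = ET.fromstring(SEG_XML)

# .text is only the text before the first child element.
print(repr(seg.text))

# The text after each child lives in that child's .tail, not in seg.text.
for child in seg:
    print(child.tag, repr(child.text), repr(child.tail))
```

Here `bpt` has no tail (its `</bpt>` is immediately followed by `<ept>`), while the rest of the sentence is the tail of `ept`.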
Answer:
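Use `.xpath("text()")` on the `<seg>` element to get all of its text nodes. The text between the tags is stored as separate text nodes, and `.text` returns only the first of them.

This code prints the wanted output: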
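```python
from lxml import etree

tree = etree.parse("tuv.xml")
# ".//seg" finds <seg> whether the root element is <tuv> or a full TMX document.
seg = tree.find(".//seg")
# xpath("text()") returns only the direct text children of <seg>: the text
# between the tags, not the escaped markup inside <bpt>/<ept>.
text = " ".join(seg.xpath("text()"))
# Collapse newlines and repeated spaces into single spaces.
print(" ".join(text.split()))
```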
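If lxml isn't available, the same idea can be sketched with the standard library's `xml.etree.ElementTree`, which has no `xpath()` method: the direct text children of `<seg>` are `seg.text` plus the `.tail` of each child element. The function name `seg_plain_text` and the shortened `id="m1"` are illustrative; the sketch assumes the markup inside `<bpt>`/`<ept>` is escaped (`&lt;Message ...&gt;`), as TMX files normally store it, so the XML parses.

```python
import xml.etree.ElementTree as ET


def seg_plain_text(seg):
    """Join the direct text children of <seg>, skipping text inside child tags.

    Mirrors lxml's seg.xpath("text()"): seg.text plus each child's tail.
    """
    pieces = [seg.text or ""]
    pieces += [child.tail or "" for child in seg]
    # Collapse runs of whitespace (including newlines) into single spaces.
    return " ".join(" ".join(pieces).split())


SAMPLE = """<tuv>
<seg>If you want to save items in a
<bpt i="1">&lt;Message id="m1"&gt;</bpt>
<ept i="1">&lt;/Message&gt;</ept>
for which no connection has been established.
</seg>
</tuv>"""

root = ET.fromstring(SAMPLE)
seg = root.find(".//seg")
print(seg_plain_text(seg))
# For contrast: itertext() also yields the escaped <Message ...> text
# stored inside <bpt>/<ept>, which is not wanted here.
print(" ".join("".join(seg.itertext()).split()))
```

Skipping the children's own `.text` is the design choice that matters: `itertext()` walks the whole subtree, so it would pull the inline `<Message ...>` codes into the output.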