Question

I want to extract simple text using lxml, for example:

<tables>
    <entry> some text </entry>
    <entry> more text </entry>
</tables>

The output I want is a single line like this:

some text more text

The code I am trying is:

from lxml import etree
f = open('doc.xml')
doc = etree.parse(f)  # parse the file into an ElementTree
f.close()
for text in doc.xpath('//entry/text()'):
    print(text)

Functions like concatenate and string-join are not doing the job. Please suggest the simplest function that gives me the output I want. Thanks.

from lxml import etree
from io import BytesIO
f = open('doc.xml', 'rb')  # read bytes so lxml can honor any encoding declaration
xml = f.read()
doc = etree.parse(BytesIO(xml))
f.close()
for txt in doc.xpath('//tables/table/entry/text()'):
    print(txt)

This is what I am doing now, and I am confused about where to use findall().

Answer 1

Here is one way:

from lxml import etree
from io import StringIO  # Python 3 location of StringIO

doc = etree.parse(StringIO("""
<tables>
    <table>
        <entry> some text </entry>
        <entry> more text </entry>
    </table>
    <table>
        <entry> further text </entry>
        <entry> even more text </entry>
    </table>
</tables>
"""))

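# findall('table') returns the <table> children of the root element.
for table in doc.findall('table'):
    # Join the stripped text of each <entry>; an empty entry has text None.
    line = ' '.join((entry.text or '').strip() for entry in table.findall('entry'))
    print(line)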

Output:

some text more text
further text even more text
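
As for why string-join did not work: lxml evaluates XPath 1.0, where string-join() (an XPath 2.0 function) is not available, and concat() only accepts a fixed list of string arguments. A minimal sketch of an alternative, assuming you want every entry on one single line: collect the text nodes with xpath() and join them in Python. The helper name join_entry_text is made up for illustration and is not part of lxml.

```python
from io import StringIO

from lxml import etree

# Same sample document as in the answer above.
XML = """<tables>
    <table>
        <entry> some text </entry>
        <entry> more text </entry>
    </table>
    <table>
        <entry> further text </entry>
        <entry> even more text </entry>
    </table>
</tables>"""


def join_entry_text(tree, xpath="//entry/text()"):
    """Return all matched text nodes as one space-separated line.

    Hypothetical helper for illustration only.
    """
    # xpath() returns a list of text nodes; the joining is done in Python.
    pieces = (t.strip() for t in tree.xpath(xpath))
    # Skip nodes that are empty after stripping whitespace.
    return " ".join(p for p in pieces if p)


doc = etree.parse(StringIO(XML))
print(join_entry_text(doc))
```

To get one line per table instead, the same join can be applied to table.xpath('entry/text()') for each element returned by doc.findall('table').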
