Parsing an XML file with lxml based on a text string

Date: 2016-06-08 14:14:28

Tags: python python-3.x lxml

I have an XML file, and I want to retrieve the text of elements based on a string they contain.

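In the example below, I want to find all subject elements that contain the string home (two elements). Once I have the elements, I can retrieve their text values.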

<?xml version="1.0" ?>
<zAppointments reminder="15">
    <appointment>
        <subject>Bring pizza home</subject>
        <shape>circle</shape>
    </appointment>
    <appointment>
        <subject>Bring hamburger home</subject>
        <shape>box</shape>
    </appointment>
    <appointment>
        <subject>Bring banana homes</subject>
    </appointment>
    <appointment>
        <subject>Check MS Office website for updates</subject>
    </appointment>
</zAppointments>
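For comparison, here is a minimal sketch of the workflow described above (first find the elements, then read their text) using lxml's iter() and a plain Python substring test instead of XPath. It reuses the question's subjects with the shape elements left out; the variable names are illustrative. As written, the substring test also picks up "Bring banana homes".

```python
from lxml import etree

# The question's XML, trimmed to the <subject> elements (shapes omitted).
XML = b"""<?xml version="1.0" ?>
<zAppointments reminder="15">
    <appointment><subject>Bring pizza home</subject></appointment>
    <appointment><subject>Bring hamburger home</subject></appointment>
    <appointment><subject>Bring banana homes</subject></appointment>
    <appointment><subject>Check MS Office website for updates</subject></appointment>
</zAppointments>"""

root = etree.fromstring(XML)

# Step 1: collect every <subject> element whose text contains "home".
# This is a substring test, so "homes" matches too.
matches = [el for el in root.iter("subject") if el.text and "home" in el.text]

# Step 2: read the text of each matched element via its .text attribute.
texts = [el.text for el in matches]
print(texts)
```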

1 answer:

Answer 0 (score: 2)

Use the contains() XPath function:

//subject[contains(., 'home')]/text()
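The expression above returns text nodes directly. If you want the elements themselves (as the question describes: get the elements, then read their text), a sketch is to drop the trailing /text(), so that xpath() returns the matching subject elements. Inside the predicate, `.` refers to the string-value of the current subject element. The sample data below is the answer's demo input, shortened.

```python
import lxml.etree as ET

data = b"""<zAppointments reminder="15">
    <appointment><subject>Bring pizza home</subject></appointment>
    <appointment><subject>Bring hamburger home</subject></appointment>
    <appointment><subject>Check MS Office website for updates</subject></appointment>
</zAppointments>"""

root = ET.fromstring(data)

# Without /text(), xpath() returns a list of matching <subject> elements.
subjects = root.xpath("//subject[contains(., 'home')]")

for subject in subjects:
    # Each result is an element, so both its tag and its text are available.
    print(subject.tag, subject.text)
```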

Demo:

>>> import lxml.etree as ET
>>>
>>> data = """<?xml version="1.0" ?>
... <zAppointments reminder="15">
...     <appointment>
...         <subject>Bring pizza home</subject>
...     </appointment>
...     <appointment>
...         <subject>Bring hamburger home</subject>
...     </appointment>
...     <appointment>
...         <subject>Check MS Office website for updates</subject>
...     </appointment>
... </zAppointments>"""
>>> root = ET.fromstring(data)
>>> root.xpath("//subject[contains(., 'home')]/text()")
['Bring pizza home', 'Bring hamburger home']
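The demo input leaves out the question's "Bring banana homes" subject. Because contains() is a plain substring test, running the same expression on the question's full XML would return that element as well. If, as the question's count of two elements suggests, only the whole word "home" should match, one option (a sketch, not part of the original answer) is the EXSLT regular-expressions extension that lxml's xpath() supports, bound here to the prefix re with a \b word-boundary pattern.

```python
import lxml.etree as ET

# EXSLT regular-expressions namespace; the "re" prefix is our own choice.
REGEXP_NS = {"re": "http://exslt.org/regular-expressions"}

data = b"""<zAppointments reminder="15">
    <appointment><subject>Bring pizza home</subject></appointment>
    <appointment><subject>Bring hamburger home</subject></appointment>
    <appointment><subject>Bring banana homes</subject></appointment>
    <appointment><subject>Check MS Office website for updates</subject></appointment>
</zAppointments>"""

root = ET.fromstring(data)

# contains() is a substring test, so "homes" matches too.
substring = root.xpath("//subject[contains(., 'home')]/text()")

# re:test() with \b word boundaries only matches "home" as a whole word.
whole_word = root.xpath(
    r"//subject[re:test(., '\bhome\b')]/text()", namespaces=REGEXP_NS
)

print(substring)
print(whole_word)
```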