Parsing XML with Python: selecting a tag by using a sibling tag as the selector

Time: 2016-03-23 01:43:34

Tags: python xml xml-parsing

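Given the following XML structure, and using ElementTree, I am trying to extract the description text of every item whose title text contains a certain keyword of interest. Thanks for any suggestions.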

<data>
  <item>
      <title>contains KEYWORD of interest </title>
      <description> description text of interest "1"</description>
  </item>
  <item>
      <title>title text </title>
      <description> description text not of interest</description>
  </item>
  .
  .
  .
  <item>
      <title>also contains KEYWORD of interest </title>
      <description> description text of interest "k" </description>
  </item>
</data>

Desired result:

description text of interest "1"

description text of interest "k"

1 answer:

Answer 0: (score: 1)

You can use lxml, which supports XPath:

xml = '''<data>
  <item>
      <title>contains KEYWORD of interest </title>
      <description> description text of interest "1"</description>
  </item>
  <item>
      <title>title text </title>
      <description> description text not of interest</description>
  </item>
  .
  .
  .
  <item>
      <title>also contains KEYWORD of interest </title>
      <description> description text of interest "k" </description>
  </item>
</data>
'''

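import lxml.etree
root = lxml.etree.fromstring(xml)
# Match each <title> whose text contains "KEYWORD", then take the text of
# the <description> elements that follow it as siblings.
root.xpath('.//title[contains(text(), "KEYWORD")]/'
           'following-sibling::description/text()')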
# => [' description text of interest "1"', ' description text of interest "k" ']

Using xml.etree.ElementTree:

import xml.etree.ElementTree as ET
root = ET.fromstring(xml)
[item.find('description').text for item in root.iter('item')
 if 'KEYWORD' in item.find('title').text]
# => [' description text of interest "1"', ' description text of interest "k" ']
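The list comprehension above assumes every item has both a title and a description with text in them; if either is missing, item.find(...) returns None and the attribute access raises an error. A minimal sketch of a more defensive variant follows. The function name descriptions_for_keyword and the sample document (which adds an item with no title) are illustrative assumptions, not part of the original answer, and the returned text is stripped of surrounding whitespace to match the desired result.

```python
import xml.etree.ElementTree as ET


def descriptions_for_keyword(xml_text, keyword):
    """Return the stripped <description> text of each <item> whose <title> contains keyword.

    Hypothetical helper for illustration; not part of the original answer.
    """
    root = ET.fromstring(xml_text)
    results = []
    for item in root.iter('item'):
        # findtext returns the default when the child element is missing,
        # and '' when the element exists but has no text.
        title = item.findtext('title', default='')
        description = item.findtext('description')
        if keyword in title and description is not None:
            results.append(description.strip())
    return results


# Sample input (assumed): same shape as the question, plus an item without a title.
sample = '''<data>
  <item>
    <title>contains KEYWORD of interest </title>
    <description> description text of interest "1"</description>
  </item>
  <item>
    <title>title text </title>
    <description> description text not of interest</description>
  </item>
  <item>
    <description>item without a title</description>
  </item>
  <item>
    <title>also contains KEYWORD of interest </title>
    <description> description text of interest "k" </description>
  </item>
</data>'''

print(descriptions_for_keyword(sample, 'KEYWORD'))
```

Because this relies only on the standard library, it may be preferable when installing lxml is not an option; the lxml XPath version stays more compact when full XPath support is available.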