Parsing XML in Python using tag-specific attribute information

Time: 2014-12-26 05:15:10

Tags: python xml parsing tree

I have an XML file with the following structure:

<?xml version="1.0"?>
<title>
  <ch bk="Book1" num="1">
    <ver num="1">ver1 content</ver>
    <ver num="2">ver2 content</ver>
  </ch>
  <ch bk="Book1" num="2">
    <ver num="1">ver1 content</ver>
    <ver num="2">ver2 content</ver>
  </ch>
  <ch bk="Book2" num="1">
    <ver num="1">ver1 content</ver>
    <ver num="2">ver2 content</ver>
  </ch>
</title>

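Is there a way in Python to access the content of a specific ver for a given ch num and book? (For example, ver num = 2 of ch num = 2 of bk = Book1.) I looked at several XML parsing modules, but they all select by tagName, and I could not see where to supply attribute values such as num and bk. Thanks a lot!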

4 answers:

Answer 0: (score: 1)

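You can select elements with a precise XPath expression:

XPath explained

The leading // stands for a recursive search: it matches ch elements at any depth in the document.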

'//ch[@num="1"][@bk="Book1"]/ver[@num="1"]'
#  ^      ^        ^           ^      ^
# ch node |        |           |      |
#  + attribute num = "1"       |      |
#  + AND attribute bk = "Book1"|      |
#                           ver node  |
#                           + attribute num = "1"

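Python code:

from lxml import etree

# etree.parse accepts a file path directly, so no explicit open() is needed
tree = etree.parse("/tmp/xml.xml")
# xpath() returns a list of matches; [0] takes the first text node
print(tree.xpath('//ch[@num="1"][@bk="Book1"]/ver[@num="1"]/text()')[0])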
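The same kind of lookup can also be written without lxml, because the standard library's xml.etree.ElementTree supports a limited XPath subset that includes `[@attrib='value']` predicates. A minimal sketch, assuming the question's XML is held in a string; the names `XML_DOC` and `find_verse` are made up for illustration and are not part of the original answer:

```python
import xml.etree.ElementTree as ET

# Sample data copied from the question (XML_DOC is a hypothetical name).
XML_DOC = """<title>
  <ch bk="Book1" num="1">
    <ver num="1">ver1 content</ver>
    <ver num="2">ver2 content</ver>
  </ch>
  <ch bk="Book1" num="2">
    <ver num="1">ver1 content</ver>
    <ver num="2">ver2 content</ver>
  </ch>
  <ch bk="Book2" num="1">
    <ver num="1">ver1 content</ver>
    <ver num="2">ver2 content</ver>
  </ch>
</title>"""


def find_verse(root, bk, chnum, vernum):
    """Return the text of the first matching <ver>, or None if nothing matches."""
    # Values are interpolated directly, so this assumes they contain no quotes.
    path = f".//ch[@bk='{bk}'][@num='{chnum}']/ver[@num='{vernum}']"
    return root.findtext(path)


root = ET.fromstring(XML_DOC)
print(find_verse(root, "Book1", "2", "2"))
```

Because `findtext` returns None when the path matches nothing, a missing book or chapter does not raise an exception here, unlike indexing an empty result list with `[0]`.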

Answer 1: (score: 1)

A simple approach using xml.etree.ElementTree from the Python standard library:

import xml.etree.ElementTree as ET
tree = ET.parse('yourfile.xml')

def locate(chnum, bk, vernum):
    # Attribute values are strings, so call this as e.g. locate('2', 'Book1', '2')
    for ch in tree.findall('ch'):
        if ch.get('num') != chnum: continue
        if ch.get('bk') != bk: continue
        for ver in ch.findall('ver'):
            if ver.get('num') != vernum: continue
            return ver.text
    return None  # no such chapter/book/version combo found
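If many lookups are needed, rescanning the tree each time can be avoided by building a dictionary once. This is a different technique from the loop above, sketched under the assumption that each (bk, ch num, ver num) combination is unique; `build_index` and the abbreviated `SAMPLE` document are hypothetical names introduced for illustration:

```python
import xml.etree.ElementTree as ET

# Abbreviated sample in the same shape as the question's XML (hypothetical data).
SAMPLE = """<title>
  <ch bk="Book1" num="2">
    <ver num="1">ver1 content</ver>
    <ver num="2">ver2 content</ver>
  </ch>
  <ch bk="Book2" num="1">
    <ver num="1">ver1 content</ver>
  </ch>
</title>"""


def build_index(root):
    """Map (bk, chapter num, verse num) -> verse text; all key parts are strings."""
    index = {}
    for ch in root.iter("ch"):
        for ver in ch.findall("ver"):
            index[(ch.get("bk"), ch.get("num"), ver.get("num"))] = ver.text
    return index


index = build_index(ET.fromstring(SAMPLE))
print(index.get(("Book1", "2", "2")))
```

The index costs one full pass over the document up front, after which each lookup is a single dictionary access; `index.get(...)` returns None for a combination that does not exist.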

Answer 2: (score: 1)

Yes, you can use XPath to select the target elements.

>>> from lxml import etree
>>> fp = open("test.html")
>>> tree = etree.parse(fp)
>>> r = tree.xpath('//ch[@num=2][@bk="Book1"]/ver/text()')
>>> r
['ver1 content', 'ver2 content']

Answer 3: (score: 1)

This way, you can access each ver as an element.

import xml.etree.ElementTree as etree

tree = etree.ElementTree(file='input.xml')

# inputs
num = '1'
bk = 'Book2'

# list comprehension (assumes the num/bk pair is unique per ch)
vers = [ch.findall('ver')
        for ch in tree.findall('ch')
        if ch.attrib['num'] == num and ch.attrib['bk'] == bk][0]

# loop over the results
for ver in vers:
    print('num={0} text={1}'.format(ver.attrib['num'], ver.text))
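Note that the `[0]` index above raises IndexError when no chapter matches. One way to guard against that, sketched here with a hypothetical `verses_for` helper and an abbreviated inline sample rather than the answer's input.xml, is to take the first match from a generator and fall back to an empty list:

```python
import xml.etree.ElementTree as ET

# Abbreviated sample in the same shape as the question's XML (hypothetical data).
SAMPLE = """<title>
  <ch bk="Book1" num="1">
    <ver num="1">ver1 content</ver>
  </ch>
  <ch bk="Book2" num="1">
    <ver num="1">ver1 content</ver>
    <ver num="2">ver2 content</ver>
  </ch>
</title>"""


def verses_for(root, bk, num):
    """Return the <ver> elements of the first matching <ch>, or [] if none match."""
    matches = (ch.findall("ver")
               for ch in root.findall("ch")
               if ch.get("bk") == bk and ch.get("num") == num)
    return next(matches, [])


root = ET.fromstring(SAMPLE)
for ver in verses_for(root, "Book2", "1"):
    print("num={0} text={1}".format(ver.get("num"), ver.text))
```

Using `ch.get(...)` instead of `ch.attrib[...]` also avoids a KeyError if some ch element lacks one of the attributes.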