Finding a node value in XML using etree

Asked: 2014-03-19 13:24:17

Tags: python xml xml-parsing lxml elementtree

Here is the XML:

<organizations>
    <organization>
        <orgID>152</orgID>
        <orgName>This is A</orgName>
    </organization>
    <organization>
        <orgID>1352</orgID>
        <orgName>This is B</orgName>
    </organization>
    <organization>
        <orgID>1522</orgID>
        <orgName>This is C</orgName>
    </organization>
    <organization>
        <orgID>1512</orgID>
        <orgName>This is D</orgName>
    </organization>
</organizations>

I want to look up the orgName for a given orgID.

This is what I tried:

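from urllib.request import urlopen

import lxml.etree as ET

url = 'url here'
xmldata = urlopen(url).read()  # raw bytes of the response body
root = ET.fromstring(xmldata)
# select the orgID element whose text is exactly "152"
for target in root.xpath('.//organization/orgID[text()="152"]'):
    print(target)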

But nothing is printed.

What am I doing wrong here?

2 answers:

Answer 0 (score: 1)

One option is to check the text of the descendants:

from lxml import etree as ET


data = """<organizations>
    <organization>
        <orgID>152</orgID>
        <orgName>This is A</orgName>
    </organization>
    <organization>
        <orgID>1352</orgID>
        <orgName>This is B</orgName>
    </organization>
    <organization>
        <orgID>1522</orgID>
        <orgName>This is C</orgName>
    </organization>
    <organization>
        <orgID>1512</orgID>
        <orgName>This is D</orgName>
    </organization>
</organizations>"""

tree = ET.fromstring(data)
print(tree.xpath('//organization[descendant::text()="1512"]/orgName/text()'))

This prints:

['This is D']
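
Note that `descendant::text()` matches any text node under an organization, so it would also match an orgName whose text happened to be "1512". A tighter variant compares only the orgID child. The following is a minimal sketch of that idea, assuming the standard library's `xml.etree.ElementTree` in place of lxml; the helper name `org_name_by_id` and the shortened sample data are illustrative, not from the question.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the same shape as the XML from the question.
DATA = """<organizations>
    <organization>
        <orgID>152</orgID>
        <orgName>This is A</orgName>
    </organization>
    <organization>
        <orgID>1512</orgID>
        <orgName>This is D</orgName>
    </organization>
</organizations>"""


def org_name_by_id(root, org_id):
    """Return the orgName text of the organization whose orgID equals org_id.

    Hypothetical helper: compares only the orgID child, so text that
    appears elsewhere in the organization cannot produce a match.
    Returns None when no organization has that ID.
    """
    for org in root.iter("organization"):
        if org.findtext("orgID") == org_id:
            return org.findtext("orgName")
    return None


root = ET.fromstring(DATA)
print(org_name_by_id(root, "1512"))
```

Comparing in Python rather than inside the path expression also sidesteps quoting problems if the ID ever comes from user input.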

Answer 1 (score: 1)

If I use the XML given in the question as xmldata, your code prints something like this:

<Element orgID at 0x2858c18>

Maybe you should check whether the URL really returns the content you expect.
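
One way to run that check is to inspect the downloaded bytes before querying them. The sketch below is only an illustration, assuming the standard library's `xml.etree.ElementTree`; the helper name `summarize_xml` and the keys of the returned dict are made up for this example. It also reports whether the root element is in a namespace, since an XML default namespace is another common reason an un-prefixed path like `.//organization` matches nothing.

```python
import xml.etree.ElementTree as ET


def summarize_xml(xmldata):
    """Report basic facts about a downloaded payload (hypothetical helper).

    xmldata is the bytes (or str) read from the URL.
    """
    if not xmldata or not xmldata.strip():
        return {"ok": False, "reason": "empty response"}
    try:
        root = ET.fromstring(xmldata)
    except ET.ParseError as exc:
        # e.g. an HTML error page instead of the expected XML
        return {"ok": False, "reason": "not well-formed XML: %s" % exc}
    return {
        "ok": True,
        "root_tag": root.tag,
        # ElementTree writes namespaced tags as "{uri}localname"
        "namespaced": root.tag.startswith("{"),
        # number of un-namespaced <organization> elements below the root
        "organizations": len(root.findall(".//organization")),
    }


print(summarize_xml(b"<organizations><organization/></organizations>"))
```

If `organizations` comes back as 0 while `namespaced` is true, the elements exist but the path needs to account for the namespace.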

BTW, if you want to print the text of orgName, change the for statement as follows:

for target in root.xpath('.//organization/orgID[text()="152"]/following-sibling::orgName/text()'):
    print(target)