How to parse the XML file from the European Central Bank with Python

Posted: 2013-06-22 12:22:55

Tags: python xml python-3.x elementtree

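I'm trying to parse the European Central Bank's XML file of euro exchange rates. Unfortunately, I'm confused about how to parse it. When I remove the difficult part (everything related to "gesmes"), I can iterate over the "Cube" elements without any problem, but I can't handle the "gesmes" part of the XML file. I'm using the ElementTree API.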

Example XML file: http://www.ecb.int/stats/eurofxref/eurofxref-daily.xml

<?xml version="1.0" encoding="UTF-8"?>
<gesmes:Envelope xmlns:gesmes="http://www.gesmes.org/xml/2002-08-01" xmlns="http://www.ecb.int/vocabulary/2002-08-01/eurofxref">
    <gesmes:subject>Reference rates</gesmes:subject>
    <gesmes:Sender>
        <gesmes:name>European Central Bank</gesmes:name>
    </gesmes:Sender>
    <Cube>
        <Cube time='2013-06-21'>
            <Cube currency='USD' rate='1.3180'/>
            <Cube currency='JPY' rate='128.66'/>
            <Cube currency='BGN' rate='1.9558'/>
            <Cube currency='CZK' rate='25.825'/>
            <Cube currency='DKK' rate='7.4582'/>
            <Cube currency='GBP' rate='0.85330'/>
            <Cube currency='HUF' rate='298.87'/>
            <Cube currency='LTL' rate='3.4528'/>
            <Cube currency='LVL' rate='0.7016'/>
            <Cube currency='PLN' rate='4.3289'/>
            <Cube currency='RON' rate='4.5350'/>
            <Cube currency='SEK' rate='8.6927'/>
            <Cube currency='CHF' rate='1.2257'/>
            <Cube currency='NOK' rate='7.9090'/>
            <Cube currency='HRK' rate='7.4905'/>
            <Cube currency='RUB' rate='43.2260'/>
            <Cube currency='TRY' rate='2.5515'/>
            <Cube currency='AUD' rate='1.4296'/>
            <Cube currency='BRL' rate='2.9737'/>
            <Cube currency='CAD' rate='1.3705'/>
            <Cube currency='CNY' rate='8.0832'/>
            <Cube currency='HKD' rate='10.2239'/>
            <Cube currency='IDR' rate='13088.24'/>
            <Cube currency='ILS' rate='4.7891'/>
            <Cube currency='INR' rate='78.1200'/>
            <Cube currency='KRW' rate='1521.52'/>
            <Cube currency='MXN' rate='17.5558'/>
            <Cube currency='MYR' rate='4.2222'/>
            <Cube currency='NZD' rate='1.7004'/>
            <Cube currency='PHP' rate='57.707'/>
            <Cube currency='SGD' rate='1.6790'/>
            <Cube currency='THB' rate='41.003'/>
            <Cube currency='ZAR' rate='13.4906'/>
        </Cube>
    </Cube>
</gesmes:Envelope>

What I want is to search for a specific currency (taken from user input) and get back its rate, so that I can use the result.
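A quick way to see why plain "Cube" lookups stop working once the gesmes envelope is present: the root element also declares a default namespace (xmlns="http://www.ecb.int/vocabulary/2002-08-01/eurofxref"), so ElementTree stores every Cube tag with that URI attached, in {uri}localname form. Removing the envelope likely also removed that declaration, which would explain why the stripped-down file worked. The sketch below is a minimal illustration, assuming a trimmed copy of the sample file above (three rates only) embedded as a string named SAMPLE; it prints the tags ElementTree actually sees and shows that an unqualified find('Cube') matches nothing.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document above (three rates only)
SAMPLE = """<gesmes:Envelope xmlns:gesmes="http://www.gesmes.org/xml/2002-08-01" xmlns="http://www.ecb.int/vocabulary/2002-08-01/eurofxref">
    <gesmes:subject>Reference rates</gesmes:subject>
    <gesmes:Sender>
        <gesmes:name>European Central Bank</gesmes:name>
    </gesmes:Sender>
    <Cube>
        <Cube time='2013-06-21'>
            <Cube currency='USD' rate='1.3180'/>
            <Cube currency='JPY' rate='128.66'/>
            <Cube currency='GBP' rate='0.85330'/>
        </Cube>
    </Cube>
</gesmes:Envelope>"""

root = ET.fromstring(SAMPLE)

# ElementTree keeps the namespace URI inside every tag: {uri}localname
print(root.tag)
for child in root:
    print(child.tag)

# An unqualified name does not match a namespaced tag, so this prints None
print(root.find("Cube"))
```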

2 answers:

Answer 0 (score: 9)

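You have a namespaced XML file, and ElementTree is not very smart about namespaces. You need to pass an explicit namespace dictionary to the .find(), .findall() and .iterfind() methods. This is not documented very well: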

namespaces = {'ex': 'http://www.ecb.int/vocabulary/2002-08-01/eurofxref'} # add more as needed

for cube in root.findall('.//ex:Cube[@currency]', namespaces=namespaces):
    print(cube.attrib['currency'], cube.attrib['rate'])
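The prefix mapping is not the only way to name a namespaced tag. As a sketch, assuming the same trimmed sample embedded as SAMPLE, the three queries below should select the same elements: the ex: prefix with a mapping (as above), Clark notation with the namespace URI written out in braces (no mapping needed), and the {*} namespace wildcard, which ElementTree supports from Python 3.8 onward. The constant name EUROFXREF is just a label chosen for this example.

```python
import xml.etree.ElementTree as ET

EUROFXREF = "http://www.ecb.int/vocabulary/2002-08-01/eurofxref"

# Trimmed copy of the sample document from the question (three rates only)
SAMPLE = """<gesmes:Envelope xmlns:gesmes="http://www.gesmes.org/xml/2002-08-01" xmlns="http://www.ecb.int/vocabulary/2002-08-01/eurofxref">
    <gesmes:subject>Reference rates</gesmes:subject>
    <gesmes:Sender>
        <gesmes:name>European Central Bank</gesmes:name>
    </gesmes:Sender>
    <Cube>
        <Cube time='2013-06-21'>
            <Cube currency='USD' rate='1.3180'/>
            <Cube currency='JPY' rate='128.66'/>
            <Cube currency='GBP' rate='0.85330'/>
        </Cube>
    </Cube>
</gesmes:Envelope>"""

root = ET.fromstring(SAMPLE)

# 1. Prefix + mapping: 'ex' is arbitrary, it only has to match the dict key
ns = {"ex": EUROFXREF}
with_prefix = [c.get("currency") for c in root.findall(".//ex:Cube[@currency]", ns)]

# 2. Clark notation: the full URI in braces, no mapping required
with_clark = [c.get("currency") for c in root.findall(".//{" + EUROFXREF + "}Cube[@currency]")]

# 3. Namespace wildcard: Cube in any namespace (Python 3.8+ only)
with_wildcard = [c.get("currency") for c in root.findall(".//{*}Cube[@currency]")]

print(with_prefix)
print(with_clark)
print(with_wildcard)
```

The wildcard is the most forgiving option, but it would also match a Cube element from some other namespace; the explicit URI is stricter.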

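This uses a simple XPath query: './/' means find matching tags at any depth below the current element; ex:Cube restricts the search to Cube tags in the namespace labelled with the ex prefix (from the namespaces mapping); and [@currency] restricts the search to elements that have a currency attribute.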
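To make the effect of each part of that path concrete, here is a small sketch that counts the matches for progressively more specific queries against a trimmed copy of the sample file (one outer Cube, one dated Cube, three rate Cubes). The query list and the SAMPLE string are illustrative assumptions, not part of the answer's code.

```python
import xml.etree.ElementTree as ET

NS = {"ex": "http://www.ecb.int/vocabulary/2002-08-01/eurofxref"}

# Trimmed copy of the sample document from the question (three rates only)
SAMPLE = """<gesmes:Envelope xmlns:gesmes="http://www.gesmes.org/xml/2002-08-01" xmlns="http://www.ecb.int/vocabulary/2002-08-01/eurofxref">
    <gesmes:subject>Reference rates</gesmes:subject>
    <gesmes:Sender>
        <gesmes:name>European Central Bank</gesmes:name>
    </gesmes:Sender>
    <Cube>
        <Cube time='2013-06-21'>
            <Cube currency='USD' rate='1.3180'/>
            <Cube currency='JPY' rate='128.66'/>
            <Cube currency='GBP' rate='0.85330'/>
        </Cube>
    </Cube>
</gesmes:Envelope>"""

root = ET.fromstring(SAMPLE)

QUERIES = [
    "ex:Cube",                # direct children of the root only
    ".//ex:Cube",             # every Cube at any depth
    ".//ex:Cube[@time]",      # only the Cube carrying the date
    ".//ex:Cube[@currency]",  # only the Cubes carrying a rate
]
counts = {q: len(root.findall(q, NS)) for q in QUERIES}
for q, n in counts.items():
    print(f"{q!r}: {n} match(es)")

# The same attribute filter also retrieves the reference date
date = root.find(".//ex:Cube[@time]", NS).get("time")
print(date)
```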

Demo:

>>> import requests
>>> r = requests.get('http://www.ecb.int/stats/eurofxref/eurofxref-daily.xml', stream=True)
>>> from xml.etree import ElementTree as ET
>>> tree = ET.parse(r.raw)
>>> root = tree.getroot()
>>> namespaces = {'ex': 'http://www.ecb.int/vocabulary/2002-08-01/eurofxref'}
>>> for cube in root.findall('.//ex:Cube[@currency]', namespaces=namespaces):
...     print(cube.attrib['currency'], cube.attrib['rate'])
... 
USD 1.3180
JPY 128.66
BGN 1.9558
CZK 25.825
DKK 7.4582
GBP 0.85330
HUF 298.87
LTL 3.4528
LVL 0.7016
PLN 4.3289
RON 4.5350
SEK 8.6927
CHF 1.2257
NOK 7.9090
HRK 7.4905
RUB 43.2260
TRY 2.5515
AUD 1.4296
BRL 2.9737
CAD 1.3705
CNY 8.0832
HKD 10.2239
IDR 13088.24
ILS 4.7891
INR 78.1200
KRW 1521.52
MXN 17.5558
MYR 4.2222
NZD 1.7004
PHP 57.707
SGD 1.6790
THB 41.003
ZAR 13.4906

You can also use this to look up a specific rate: either build a dictionary, or search the XML document directly for the matching currency:

currency = input('What currency are you looking for? ')
match = root.find('.//ex:Cube[@currency="{}"]'.format(currency.upper()), namespaces=namespaces)
if match is not None:
    print('The rate for {} is {}'.format(currency, match.attrib['rate']))
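The dictionary option mentioned above might look like the sketch below, assuming the same trimmed sample embedded as SAMPLE. The helper names load_rates and lookup are hypothetical. Rates are converted to decimal.Decimal so they can be used in arithmetic without float rounding; ECB reference rates are quoted as units of the foreign currency per 1 EUR. A fixed string stands in for input() so the example runs unattended.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

NS = {"ex": "http://www.ecb.int/vocabulary/2002-08-01/eurofxref"}

# Trimmed copy of the sample document from the question (three rates only)
SAMPLE = """<gesmes:Envelope xmlns:gesmes="http://www.gesmes.org/xml/2002-08-01" xmlns="http://www.ecb.int/vocabulary/2002-08-01/eurofxref">
    <gesmes:subject>Reference rates</gesmes:subject>
    <gesmes:Sender>
        <gesmes:name>European Central Bank</gesmes:name>
    </gesmes:Sender>
    <Cube>
        <Cube time='2013-06-21'>
            <Cube currency='USD' rate='1.3180'/>
            <Cube currency='JPY' rate='128.66'/>
            <Cube currency='GBP' rate='0.85330'/>
        </Cube>
    </Cube>
</gesmes:Envelope>"""


def load_rates(root):
    """Hypothetical helper: map currency code -> Decimal rate (per 1 EUR)."""
    return {
        cube.get("currency"): Decimal(cube.get("rate"))
        for cube in root.iterfind(".//ex:Cube[@currency]", NS)
    }


rates = load_rates(ET.fromstring(SAMPLE))


def lookup(code):
    """Hypothetical helper: normalise user input, return None if unknown."""
    return rates.get(code.strip().upper())


user_input = " usd "            # stand-in for input('What currency ...? ')
rate = lookup(user_input)
if rate is not None:
    print("100 EUR =", Decimal("100") * rate, user_input.strip().upper())
else:
    print("Unknown currency:", user_input)
```

Building the dictionary once is convenient when several lookups follow; a single direct find(), as above, is enough for a one-off query.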

Answer 1 (score: 0)

You could also do it this way:

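from xml.etree import ElementTree as ET  # cElementTree was removed in Python 3.9
full_file = 'eurofxref-daily.xml'    # locally saved copy; relative to the working directory, or an absolute path
tree = ET.ElementTree(file=full_file)
root = tree.getroot()

# Envelope -> Cube -> Cube (time) -> Cube (currency, rate)
# The gesmes elements have no grandchildren, so only the rate Cubes reach the inner loop
for child in root:
    for subchild in child:
        for subsubchild in subchild:
            print(subsubchild.attrib['currency'])
            print(subsubchild.attrib['rate'])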
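The nested loops depend on the rates sitting exactly three levels below the root. A depth-independent variant, sketched here under the assumption that the same trimmed sample is embedded as SAMPLE (for a saved file, ET.parse(path).getroot() would replace ET.fromstring), walks every Cube with iter() and branches on which attribute it carries. This is one alternative, not the answerer's code.

```python
import xml.etree.ElementTree as ET

# Fully qualified tag name (Clark notation) for the eurofxref Cube element
CUBE = "{http://www.ecb.int/vocabulary/2002-08-01/eurofxref}Cube"

# Trimmed copy of the sample document from the question (three rates only)
SAMPLE = """<gesmes:Envelope xmlns:gesmes="http://www.gesmes.org/xml/2002-08-01" xmlns="http://www.ecb.int/vocabulary/2002-08-01/eurofxref">
    <gesmes:subject>Reference rates</gesmes:subject>
    <gesmes:Sender>
        <gesmes:name>European Central Bank</gesmes:name>
    </gesmes:Sender>
    <Cube>
        <Cube time='2013-06-21'>
            <Cube currency='USD' rate='1.3180'/>
            <Cube currency='JPY' rate='128.66'/>
            <Cube currency='GBP' rate='0.85330'/>
        </Cube>
    </Cube>
</gesmes:Envelope>"""

root = ET.fromstring(SAMPLE)

date = None
rows = []
# iter() visits every matching element at any depth, in document order
for cube in root.iter(CUBE):
    if "time" in cube.attrib:
        date = cube.get("time")
    elif "currency" in cube.attrib:
        rows.append((cube.get("currency"), cube.get("rate")))
    # the outer Cube has neither attribute and is skipped

print(date)
for currency, rate in rows:
    print(currency, rate)
```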