Unable to parse an XML response and get its elements

Date: 2016-10-07 08:45:26

Tags: python xml python-3.x xml-parsing httprequest

This is the XML response to my HTTP request:

<?xml version="1.0" encoding="UTF-8"?>
<Dataset name="aggregations/g/ds083.2/2/TP"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://xml.opendap.org/ns/DAP2"
 xsi:schemaLocation="http://xml.opendap.org/ns/DAP2          
http://xml.opendap.org/dap/dap2.xsd" >

    <Attribute name="NC_GLOBAL" type="Container">
        <Attribute name="Originating_or_generating_Center" type="String">
            <value>US National Weather Service, National Centres for Environmental Prediction (NCEP)</value>
        </Attribute>
        <Attribute name="Originating_or_generating_Subcenter" type="String">
            <value>0</value>
        </Attribute>
        <Attribute name="GRIB_table_version" type="String">
            <value>2,1</value>
        </Attribute>
        <Attribute name="Type_of_generating_process" type="String">
            <value>Forecast</value>
        </Attribute>
        <Attribute name="Analysis_or_forecast_generating_process_identifier_defined_by_originating_centre" type="String">
            <value>Analysis from GDAS (Global Data Assimilation System)</value>
        </Attribute>
        <Attribute name="file_format" type="String">
            <value>GRIB-2</value>
        </Attribute>
        <Attribute name="Conventions" type="String">
            <value>CF-1.6</value>
        </Attribute>
        <Attribute name="history" type="String">
            <value>Read using CDM IOSP GribCollection v3</value>
        </Attribute>
        <Attribute name="featureType" type="String">
            <value>GRID</value>
        </Attribute>
        <Attribute name="_CoordSysBuilder" type="String">
            <value>ucar.nc2.dataset.conv.CF1Convention</value>
        </Attribute>
    </Attribute>

    <Array name="time1">
        <Attribute name="units" type="String">
            <value>Hour since 2007-12-06T12:00:00Z</value>
        </Attribute>
        <Attribute name="standard_name" type="String">
            <value>time</value>
        </Attribute>
        <Attribute name="long_name" type="String">
            <value>GRIB forecast or observation time</value>
        </Attribute>
        <Attribute name="calendar" type="String">
            <value>proleptic_gregorian</value>
        </Attribute>
        <Attribute name="_CoordinateAxisType" type="String">
            <value>Time</value>
        </Attribute>
        <Float64/>
        <dimension name="time1" size="10380"/>
    </Array>

</Dataset>

I am trying to parse this XML content with Python 3.5:

import requests
from xml.etree import ElementTree

response = requests.get("http://rda.ucar.edu/thredds/dodsC/aggregations/g/ds083.2/2/TP.ddx?time1")

tree = ElementTree.fromstring(response.content)

attr = tree.find("Attribute")
print(attr)

When I print this, I get None. What am I doing wrong? I also want to access the "Array" tag, but that returns None as well.

2 answers:

Answer 0 (score: 2)

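As stated in the doc, because the Dataset root tag carries the attribute xmlns="http://xml.opendap.org/ns/DAP2", every tag name you look up must be prefixed with {http://xml.opendap.org/ns/DAP2}. The root and all of its unprefixed descendants belong to that default namespace, so a bare "Attribute" matches nothing.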

# should find something
tree.find("{http://xml.opendap.org/ns/DAP2}Attribute")

That section of the ElementTree documentation also shows how to map a short prefix to the namespace, which makes the lookups more readable.
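
The question also asks about the "Array" tag, which needs the same prefix. Here is a minimal sketch that runs without a network call. It assumes a trimmed copy of the response above, and the names SAMPLE and DAP2 are made up for illustration:

```python
from xml.etree import ElementTree

# Trimmed copy of the DDX response from the question (illustrative, not the full document).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<Dataset name="aggregations/g/ds083.2/2/TP" xmlns="http://xml.opendap.org/ns/DAP2">
    <Attribute name="NC_GLOBAL" type="Container">
        <Attribute name="file_format" type="String">
            <value>GRIB-2</value>
        </Attribute>
    </Attribute>
    <Array name="time1">
        <Attribute name="units" type="String">
            <value>Hour since 2007-12-06T12:00:00Z</value>
        </Attribute>
        <Float64/>
        <dimension name="time1" size="10380"/>
    </Array>
</Dataset>
"""

# Hypothetical helper constant: the namespace URI in ElementTree's {uri} notation.
DAP2 = "{http://xml.opendap.org/ns/DAP2}"

root = ElementTree.fromstring(SAMPLE)

# The bare tag name does not match, because every element lives in the DAP2 namespace.
unqualified = root.find("Attribute")

# The qualified names do match.
attribute = root.find(DAP2 + "Attribute")
array = root.find(DAP2 + "Array")

# Children of Array are in the same namespace, so they need the prefix too.
dimension = array.find(DAP2 + "dimension")

print(unqualified)
print(attribute.get("name"))
print(array.get("name"), dimension.get("size"))
```

Note that XML attributes such as name and size are not namespaced here, so get("name") works without any prefix.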

Answer 1 (score: 1)

The XML document uses a namespace, so your code needs to account for it. The etree documentation has an explanation and sample code.

Basically, you can do this:

import requests
from xml.etree import ElementTree

response = requests.get('http://rda.ucar.edu/thredds/dodsC/aggregations/g/ds083.2/2/TP.ddx?time1')

tree = ElementTree.fromstring(response.content)

attr = tree.find("{http://xml.opendap.org/ns/DAP2}Attribute")

print(attr)
# <Element '{http://xml.opendap.org/ns/DAP2}Attribute' at 0x7f147a292458>

# or declare the namespace like this
ns = {'dap2': 'http://xml.opendap.org/ns/DAP2'}
attr = tree.find("dap2:Attribute", ns)

print(attr)
# <Element '{http://xml.opendap.org/ns/DAP2}Attribute' at 0x7f147a292458>
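
The prefix mapping can also be passed to findall and findtext, which helps when collecting the values under an Array. The sketch below rests on a few assumptions: SAMPLE is a trimmed copy of the response, array_attributes is a hypothetical helper rather than part of any library, and each Attribute is taken to hold a single value child, as in the response shown in the question.

```python
from xml.etree import ElementTree

# Trimmed copy of the DDX response from the question (illustrative only).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<Dataset name="aggregations/g/ds083.2/2/TP" xmlns="http://xml.opendap.org/ns/DAP2">
    <Array name="time1">
        <Attribute name="units" type="String">
            <value>Hour since 2007-12-06T12:00:00Z</value>
        </Attribute>
        <Attribute name="standard_name" type="String">
            <value>time</value>
        </Attribute>
        <Float64/>
        <dimension name="time1" size="10380"/>
    </Array>
</Dataset>
"""

# The prefix "dap2" is arbitrary; only the URI has to match the document.
NS = {"dap2": "http://xml.opendap.org/ns/DAP2"}


def array_attributes(root, array_name):
    """Return {attribute name: value text} for the named Array, or {} if it is absent."""
    for array in root.findall("dap2:Array", NS):
        if array.get("name") == array_name:
            return {
                attr.get("name"): attr.findtext("dap2:value", namespaces=NS)
                for attr in array.findall("dap2:Attribute", NS)
            }
    return {}


root = ElementTree.fromstring(SAMPLE)
attrs = array_attributes(root, "time1")
for name in sorted(attrs):
    print(name, "=", attrs[name])
```

With the real response, replacing SAMPLE with response.content should behave the same way, provided the server still returns the document shown in the question.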