Question

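I am trying to parse an XML file from the Dutch NDW that contains per-minute traffic data for many Dutch highways. I am using this sample file: http://www.ndw.nu/downloaddocument/e838c62446e862f5b6230be485291685/Reistijden.zip

I am trying to use Python to parse the travel time data into a variable, but I am struggling.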

from xml.etree import ElementTree
import urllib2
url = "http://weburloffile.nl/ndw/Reistijden.xml"
response = urllib2.urlopen(url)
namespaces = {
    'soap': 'http://schemas.xmlsoap.org/soap/envelope/',
    'a': 'http://datex2.eu/schema/2/2_0'
     }
dom = ElementTree.fromstring(response.read)
names = dom.findall(
        'soap:Envelope'
        '/a:duration',
        namespaces,
)
#print names
for duration in names:
    print(duration.text)

I now get this new error:

Traceback (most recent call last):
  File "test.py", line 9, in <module>
    dom = ElementTree.fromstring(response.read)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1311, in XML
    parser.feed(text)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1651, in feed
    self._parser.Parse(data, 0)
TypeError: Parse() argument 1 must be string or read-only buffer, not instancemethod

How do I parse this (complex) XML correctly?

- Changed it to read, as suggested in the comments

Answer 1

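The problem is not the XML parsing; you are using the response object incorrectly. urllib2.urlopen returns a file-like object that has no content attribute. Your edited code passes response.read without parentheses, which hands the bound method itself to fromstring instead of the data, hence the "not instancemethod" TypeError. Instead, you should call read on it: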

dom = ElementTree.fromstring(response.read())
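Once the read() call is fixed, the lookup itself may still come back empty. fromstring returns the root element, so if the root is the SOAP Envelope, the path 'soap:Envelope/a:duration' searches for an Envelope child of the Envelope. Below is a minimal sketch of one way around that, using a descendant search ('.//a:duration'). The SAMPLE document and the extract_durations helper are made up for illustration, and the real NDW file is almost certainly nested differently, so treat the path as an assumption to check against the actual data. io.BytesIO stands in for the object returned by urlopen.

```python
import io
from xml.etree import ElementTree

# Hypothetical, heavily simplified stand-in for the NDW payload.
# The real file's structure may differ.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <d2LogicalModel xmlns="http://datex2.eu/schema/2/2_0">
      <duration>83</duration>
      <duration>120</duration>
    </d2LogicalModel>
  </soap:Body>
</soap:Envelope>"""

namespaces = {
    'soap': 'http://schemas.xmlsoap.org/soap/envelope/',
    'a': 'http://datex2.eu/schema/2/2_0',
}


def extract_durations(response):
    """Hypothetical helper: return the text of every a:duration element."""
    # Call read() with parentheses so the parser receives the data,
    # not the bound method.
    dom = ElementTree.fromstring(response.read())
    # './/' searches all descendants of the root element, so the path does
    # not need to spell out every intermediate element.
    return [el.text for el in dom.findall('.//a:duration', namespaces)]


# io.BytesIO mimics the file-like object that urlopen returns.
durations = extract_durations(io.BytesIO(SAMPLE))
print(durations)
```

With a real response you would pass the object returned by urlopen to the helper in place of the BytesIO stand-in.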

Parsing Dutch NDW XML