Question

I am a complete beginner and am having some problems trying to open an XML file from a URL with Python.

Here is my code (a snippet I found online):

# import library to do http requests:
from urllib.request import urlopen


#import easy to use xml parser called minidom:
from xml.dom.minidom import parseString
#all these imports are standard on most modern python implementations

#download the file:
file = urlopen('http://www.odaa.dk/storage/f/2014-04-28T12%3A49%3A26.677Z/lejemaal.xml')
#convert to string:
data = file.read()
#close file because we dont need it anymore:
file.close()
#parse the xml you downloaded
dom = parseString(data)
#retrieve the first xml tag (<tag>data</tag>) that the parser finds with name tagName:
xmlTag = dom.getElementsByTagName('tagName')[0].toxml()
#strip off the tag (<tag>data</tag>  --->   data):
xmlData = xmlTag.replace('<tagName>', '').replace('</tagName>', '')
#print out the xml tag and data in this format: <tag>data</tag>
print(xmlTag)
#just print the data
print(xmlData)

When I run it, I get an error message:

Traceback (most recent call last):
  File "/Users/-----/PycharmProjects/First/test.py", line 20, in <module>
    xmlTag = dom.getElementsByTagName('tagName')[0].toxml()
IndexError: list index out of range

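After reading similar threads on this board, it seems I am trying to access something that does not exist. Or is it because the snippet I copied says "tagName"? Do I need to edit that?

How can I fix my problem? I am not even sure what result I am fishing for, since I just wanted to get something to happen. Hopefully someone can point me in the right direction :)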

Answer 1

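Your code actually works as it is (I have not tested it).

The problem is that your XML file contains no tag named 'tagName', so getElementsByTagName returns an empty list.

Trying to get the first element of that empty list then raises the IndexError.

You should try replacing tagName with the name of a tag that actually exists in the XML document, for example "row".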
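A minimal sketch of that fix, assuming a small inline XML sample instead of the downloaded file. The element names `rows`, `row`, and `name` and the helper `first_tag_xml` are made up for illustration and may not match the real document. The guard returns None for a missing tag instead of raising IndexError:

```python
from xml.dom.minidom import parseString

# Hypothetical sample data; the real lejemaal.xml may be structured differently.
SAMPLE_XML = "<rows><row><name>first</name></row><row><name>second</name></row></rows>"


def first_tag_xml(dom, tag_name):
    """Return the XML of the first <tag_name> element, or None if there is none."""
    matches = dom.getElementsByTagName(tag_name)
    if not matches:  # empty list: the tag does not exist in this document
        return None
    return matches[0].toxml()


dom = parseString(SAMPLE_XML)
print(first_tag_xml(dom, "row"))      # a tag that exists in the sample
print(first_tag_xml(dom, "tagName"))  # a tag that does not exist
```

Checking the list before indexing it means a wrong tag name shows up as None rather than as a traceback.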

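Normally you know which tags an XML file contains because you know its structure. You can also retrieve the list programmatically in Python with the following code: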

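root = dom.documentElement
for node in root.childNodes:
    # skip text nodes (e.g. whitespace between tags), which have no tagName
    if node.nodeType == node.ELEMENT_NODE:
        print(node.tagName)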

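This code should print the tag names of all nodes directly under the document's root element (the first element, which contains all the others).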
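If the tags of interest are nested deeper than the root's direct children, one possible variation is to walk every element in the document. This is only a sketch: the sample XML and the helper name `distinct_tag_names` are assumptions for illustration. It relies on minidom's `getElementsByTagName("*")`, which matches every element:

```python
from xml.dom.minidom import parseString

# Hypothetical sample data, used only for illustration.
SAMPLE_XML = "<rows><row><id>1</id><name>a</name></row><row><id>2</id></row></rows>"


def distinct_tag_names(dom):
    """Return the tag names used in the document, in order of first appearance."""
    names = []
    for element in dom.getElementsByTagName("*"):  # "*" matches all elements
        if element.tagName not in names:
            names.append(element.tagName)
    return names


tag_names = distinct_tag_names(parseString(SAMPLE_XML))
print(tag_names)
```

Any name in the printed list can then be passed to getElementsByTagName in place of 'tagName'.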

Python, opening XML from a URL: "list index out of range"
