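OK folks, I'm new to parsing XML and to Python, and I'm trying to get this to work. I'd appreciate any help. Even better if you can help me (educate me) on how to figure this out for myself!
I can't figure out the reference scope for the XML document, because I can't find any documentation on it. Here is my code; I'll include the full traceback afterwards.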
#import library to do http requests:
import urllib.request
#import easy to use xml parser called minidom:
from xml.dom.minidom import parseString
#all these imports are standard on most modern python implementations
#download the file:
file = urllib.request.urlopen('http://www.wizards.com/dndinsider/compendium/CompendiumSearch.asmx/KeywordSearch?Keywords=healing%20%word&nameOnly=True&tab=')
#convert to string:
data = file.read()
#close file because we don't need it anymore:
file.close()
#parse the xml you downloaded
dom = parseString(data)
#retrieve the first xml tag (<tag>data</tag>) that the parser finds with name tagName:
xmlTag = dom.getElementsByTagName('Data.Results.Power.ID')[0].toxml()
#strip off the tag (<tag>data</tag> ---> data):
xmlData=xmlTag.replace('<id>','').replace('</id>','')
#print out the xml tag and data in this format: <tag>data</tag>
print(xmlTag)
#just print the data
print(xmlData)
Traceback:
/usr/bin/python3.4 /home/mint/PycharmProjects/DnD_Project/Power_Name.py
Traceback (most recent call last):
  File "/home/mint/PycharmProjects/DnD_Project/Power_Name.py", line 14, in <module>
    xmlTag = dom.getElementsByTagName('id')[0].toxml()
IndexError: list index out of range
Process finished with exit code 1
Answer 0 (score: 1)
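The IndexError means getElementsByTagName() returned an empty list, so [0] has nothing to index. First check how many elements were found:
print(len(dom.getElementsByTagName('id')))
Edit:
ids = dom.getElementsByTagName('id')
if len(ids) > 0:
    xmlTag = ids[0].toxml()
    # rest of code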
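When the list comes back empty, it can help to look at which tag names the parsed document actually contains. The following is only a minimal sketch: the sample XML is made up for illustration (the real response of the service is not shown in the question), and `list_tag_names` is a hypothetical helper, not part of minidom.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document; the real service response may look different.
SAMPLE_XML = """<Data>
  <Results>
    <Power><ID>1</ID><Name>Healing Word</Name></Power>
    <Power><ID>2</ID><Name>Another Power</Name></Power>
  </Results>
</Data>"""


def list_tag_names(dom):
    """Return the distinct element names found in the document, sorted."""
    # '*' is a wildcard that matches every element in the document.
    return sorted({node.tagName for node in dom.getElementsByTagName('*')})


dom = parseString(SAMPLE_XML)
print(list_tag_names(dom))
print(len(dom.getElementsByTagName('id')))  # lowercase name: nothing matches
print(len(dom.getElementsByTagName('ID')))  # exact case: both elements match
```

Printing the names this way shows the exact spelling and case to pass to getElementsByTagName().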
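Edit: I added an example because I saw in other comments that you don't know how to use it.
BTW: I added some comments in the code about the file/connection.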
import urllib.request
from xml.dom.minidom import parseString
# create connection to data/file on server
connection = urllib.request.urlopen('http://www.wizards.com/dndinsider/compendium/CompendiumSearch.asmx/KeywordSearch?Keywords=healing%20%word&nameOnly=True&tab=')
# read from server as string (not "convert" to string):
data = connection.read()
# close connection because we don't need it anymore:
connection.close()
dom = parseString(data)
# get tags from dom
ids = dom.getElementsByTagName('Data.Results.Power.ID')
# check if there are any data
if len(ids) > 0:
    xmlTag = ids[0].toxml()
    xmlData = xmlTag.replace('<id>', '').replace('</id>', '')
    print(xmlTag)
    print(xmlData)
else:
    print("Sorry, there was no data")
If there are more tags, you can use a for loop:
dom = parseString(data)
# get tags from dom
ids = dom.getElementsByTagName('Data.Results.Power.ID')
# get all tags - one by one
for one_tag in ids:
    xmlTag = one_tag.toxml()
    xmlData = xmlTag.replace('<id>', '').replace('</id>', '')
    print(xmlTag)
    print(xmlData)
BTW: getElementsByTagName() expects a tag name, ID, not a path such as Data.Results.Power.ID.
The tag name is ID, so you have to replace <ID> rather than <id>.
You can also use one_tag.firstChild.nodeValue instead of xmlTag.replace:
dom = parseString(data)
# get tags from dom
ids = dom.getElementsByTagName('ID') # tagname
# get all tags - one by one
for one_tag in ids:
    xmlTag = one_tag.toxml()
    #xmlData = xmlTag.replace('<ID>','').replace('</ID>','')
    xmlData = one_tag.firstChild.nodeValue
    print(xmlTag)
    print(xmlData)
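Since the service URL may not be reachable, here is a small self-contained sketch of the same idea that runs on an inline string. The sample XML is hypothetical: its structure is only inferred from the path Data.Results.Power.ID in the question. `element_text` and `power_ids` are made-up helper names. The sketch also guards against an empty element, where firstChild is None and firstChild.nodeValue would raise an AttributeError.

```python
from xml.dom.minidom import parseString

# Hypothetical sample; the structure is inferred from the path
# Data.Results.Power.ID used in the question, not from the real service.
SAMPLE_XML = """<Data>
  <Results>
    <Power><ID>101</ID><Name>Healing Word</Name></Power>
    <Power><ID></ID><Name>No id here</Name></Power>
    <Power><ID>103</ID><Name>Another Power</Name></Power>
  </Results>
</Data>"""


def element_text(element):
    """Return the text of a simple element, or '' if it has no child node."""
    child = element.firstChild
    if child is None:
        return ''
    return child.nodeValue


def power_ids(dom):
    """Collect the text of every <ID> found inside a <Power> element."""
    ids = []
    for power in dom.getElementsByTagName('Power'):
        # Calling getElementsByTagName on an element searches only below it.
        for id_tag in power.getElementsByTagName('ID'):
            ids.append(element_text(id_tag))
    return ids


dom = parseString(SAMPLE_XML)
# A dotted path is treated as one literal tag name, so nothing matches.
print(len(dom.getElementsByTagName('Data.Results.Power.ID')))
print(power_ids(dom))
```

Walking from Power down to ID is one way to mimic the path from the question, because getElementsByTagName() itself matches a name at any depth.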
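Answer 1 (score: 0)
I haven't used the built-in xml libraries in a while, but Mark Pilgrim's excellent book Dive into Python covers them.
I see your question was already answered while I was typing, but since you mentioned being new to Python, I think you'll find that text very useful for XML parsing and a great introduction to the language.
If you'd like to try another approach to parsing XML and HTML, I highly recommend lxml.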
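To give a rough idea of what a path-based approach looks like, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree rather than lxml so that it runs without installing anything; as far as I know, lxml.etree offers fromstring, findall and findtext with the same meaning. The sample XML and the `power_pairs` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real service response may look different.
SAMPLE_XML = """<Data>
  <Results>
    <Power><ID>101</ID><Name>Healing Word</Name></Power>
    <Power><ID>103</ID><Name>Another Power</Name></Power>
  </Results>
</Data>"""


def power_pairs(root):
    """Return (ID, Name) text pairs for every Results/Power element."""
    # Paths are relative to the root element, so <Data> is not in the path.
    return [(power.findtext('ID'), power.findtext('Name'))
            for power in root.findall('Results/Power')]


root = ET.fromstring(SAMPLE_XML)  # root is the <Data> element
for power_id, name in power_pairs(root):
    print(power_id, name)
```

Unlike minidom's getElementsByTagName(), findall() here accepts a path, which is closer to what the question tried with Data.Results.Power.ID.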