At one point in my XML file, I have this contributor:
<revision>
<id>1</id>
<timestamp>2012-10-25T15:50:18Z</timestamp>
<contributor>
<ip>127.0.0.1</ip>
</contributor>
</revision>
At another point in the same XML file, I have this contributor:
<revision>
<id>2</id>
<parentid>1</parentid>
<timestamp>2012-10-26T20:13:56Z</timestamp>
<contributor>
<username>Reedy</username>
<id>2</id>
</contributor>
</revision>
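I wrote a Python script that parses the XML file and writes whichever tags we need to an output file. Under contributor, however, there are two different shapes: either ip, or username together with id. I want to ignore ip and write only username and id to my output file. When the file contains both kinds of contributor, I get the error KeyError: 'username'.
Here is my code: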
import xmltodict

with open('path to xml file') as xml_file:
    dic_xml = xmltodict.parse(xml_file.read())

page = dic_xml['mediawiki']['page']
data = list()
for rev in page['revision']:
    my_string = ""
    my_string += " " + "username:" + dict(rev['contributor'])['username']
    my_string += " " + "userid:" + dict(rev['contributor'])['id']
    my_string += "\n"
    data.append(my_string)

with open('output', 'w') as writingFile:
    for i in data:
        writingFile.write(i)
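The KeyError comes from indexing `['username']` on a contributor that only has `ip`. Below is a minimal sketch of one way to guard against that while keeping the same output format. It works on plain dictionaries shaped like the ones `xmltodict.parse` returns, so it runs without the XML file; `format_revisions` and `sample` are hypothetical names introduced for illustration, and `sample` is a hand-written stand-in for `page['revision']`.

```python
def format_revisions(revisions):
    """Return one output line per revision whose contributor has username and id."""
    # Assumption: with a single <revision>, xmltodict gives a dict rather than a list.
    if isinstance(revisions, dict):
        revisions = [revisions]
    lines = []
    for rev in revisions:
        contributor = rev.get('contributor') or {}
        username = contributor.get('username')
        user_id = contributor.get('id')
        if username is None or user_id is None:
            continue  # IP-only (anonymous) contributor: skip it
        lines.append(" username:" + username + " userid:" + user_id + "\n")
    return lines


# Hand-written stand-in for dic_xml['mediawiki']['page']['revision']
sample = [
    {'id': '1', 'timestamp': '2012-10-25T15:50:18Z',
     'contributor': {'ip': '127.0.0.1'}},
    {'id': '2', 'parentid': '1', 'timestamp': '2012-10-26T20:13:56Z',
     'contributor': {'username': 'Reedy', 'id': '2'}},
]
print(format_revisions(sample))
```

Using `.get()` instead of `[...]` returns `None` for a missing key, so the IP-only revision is skipped rather than raising.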
Answer 0 (score: 1)
Just use Python's built-in xml.etree.ElementTree module. Each parsed element exposes tag and text attributes, so you can filter on the tag name:
First contributor type:
import xml.etree.ElementTree as etree

xmlfile = '''\
<revision>
<id>1</id>
<timestamp>2012-10-25T15:50:18Z</timestamp>
<contributor>
<ip>127.0.0.1</ip>
</contributor>
</revision>'''

dom = etree.fromstring(xmlfile)
data = dom.findall('contributor/*')

with open('output', 'w') as writingFile:
    for items in data:
        if items.tag != 'ip':
            writingFile.write(items.tag + ': ' + items.text + '\n')
# <NOTHING>
Second contributor type:
xmlfile = '''\
<revision>
<id>2</id>
<parentid>1</parentid>
<timestamp>2012-10-26T20:13:56Z</timestamp>
<contributor>
<username>Reedy</username>
<id>2</id>
</contributor>
</revision>'''

dom = etree.fromstring(xmlfile)
data = dom.findall('contributor/*')

with open('output', 'w') as writingFile:
    for items in data:
        if items.tag != 'ip':
            writingFile.write(items.tag + ': ' + items.text + '\n')
# username: Reedy
# id: 2
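The two cases can probably be handled in a single pass over a document that contains both revisions. The sketch below assumes a hypothetical `<page>` wrapper around the two revisions from the question and assumes the XML declares no namespace; if your file does declare a default namespace, tag names carry a `{namespace}` prefix and the lookups would need adjusting. `contributor_lines` is an illustrative helper name, not part of ElementTree.

```python
import xml.etree.ElementTree as etree

# Hypothetical wrapper element holding both revisions from the question.
xmlfile = '''\
<page>
<revision>
<id>1</id>
<timestamp>2012-10-25T15:50:18Z</timestamp>
<contributor>
<ip>127.0.0.1</ip>
</contributor>
</revision>
<revision>
<id>2</id>
<parentid>1</parentid>
<timestamp>2012-10-26T20:13:56Z</timestamp>
<contributor>
<username>Reedy</username>
<id>2</id>
</contributor>
</revision>
</page>'''


def contributor_lines(xml_text):
    """Collect 'tag: text' lines for every contributor child except <ip>."""
    root = etree.fromstring(xml_text)
    lines = []
    # iter() walks all descendants, so every <contributor> is visited.
    for contributor in root.iter('contributor'):
        for item in contributor:
            if item.tag != 'ip':
                # item.text is None for an empty element, hence the fallback.
                lines.append(item.tag + ': ' + (item.text or '') + '\n')
    return lines


lines = contributor_lines(xmlfile)
with open('output', 'w') as writing_file:
    writing_file.writelines(lines)
```

Only the second revision contributes lines here, because the first contributor has nothing but an `ip` child.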