How do I ignore a specific tag in an XML file?

Time: 2016-04-13 01:02:49

Tags: python xml

At one point in my XML file, I have a contributor like this:
<revision>
      <id>1</id>
      <timestamp>2012-10-25T15:50:18Z</timestamp>
      <contributor>
        <ip>127.0.0.1</ip>
      </contributor>
</revision>

At another point in my XML file, I have a contributor like this:
<revision>
      <id>2</id>
      <parentid>1</parentid>
      <timestamp>2012-10-26T20:13:56Z</timestamp>
      <contributor>
        <username>Reedy</username>
        <id>2</id>
      </contributor>
</revision>

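I wrote a Python script that parses the XML file and writes whichever tags we need to an output file. Under contributor, however, there are two different shapes: either ip, or username and id. I want to ignore ip and write only username and id to my output file. When the file contains both kinds of contributor, I get KeyError: 'username'.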

Here is my code:

import xmltodict
with open('path to xml file') as xml_file:
  dic_xml = xmltodict.parse(xml_file.read())
  page = dic_xml['mediawiki']['page']
  data = list()
  for rev in page['revision']:
      my_string = ""
      my_string += " " + "username:" + dict(rev['contributor'])['username']
      my_string += " " + "userid:" + dict(rev['contributor'])['id']
      my_string += "\n"
      data.append(my_string)

with open('output', 'w') as writingFile:
    for i in data:
        writingFile.write(i)

1 answer:

Answer 0 (score: 1)

Just use Python's built-in xml.etree.ElementTree module. Each parsed element has tag and text attributes, so you can filter on the tag name:

First contributor type:

import xml.etree.ElementTree as etree

xmlfile = '''\
<revision>
      <id>1</id>
      <timestamp>2012-10-25T15:50:18Z</timestamp>
      <contributor>
        <ip>127.0.0.1</ip>
      </contributor>
</revision>'''

dom = etree.fromstring(xmlfile)
data = dom.findall('contributor/*')

with open('output', 'w') as writingFile:
    for items in data:
        if items.tag != 'ip':
            writingFile.write(items.tag + ': ' + items.text + '\n')
# (nothing is written to the output file)

Second contributor type:

xmlfile = '''\
<revision>
      <id>2</id>
      <parentid>1</parentid>
      <timestamp>2012-10-26T20:13:56Z</timestamp>
      <contributor>
        <username>Reedy</username>
        <id>2</id>
      </contributor>
</revision>'''

dom = etree.fromstring(xmlfile)
data = dom.findall('contributor/*')

with open('output', 'w') as writingFile:
    for items in data:
        if items.tag != 'ip':
            writingFile.write(items.tag + ': ' + items.text + '\n')
# username: Reedy
# id: 2
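
The two snippets above each handle a single revision. As a rough sketch of how the same idea might be applied to a file with many revisions, the example below puts both revisions from the question under one root and skips any contributor that lacks a username or id. The page wrapper element, the SAMPLE string, and the contributor_lines helper are assumptions made for illustration; they are not from the original file or answer. Real MediaWiki dumps may also declare a default XML namespace, in which case the tag names would need that namespace prefix.

```python
import xml.etree.ElementTree as etree

# Hypothetical sample: both revisions from the question under an assumed <page> root.
SAMPLE = '''\
<page>
  <revision>
    <id>1</id>
    <timestamp>2012-10-25T15:50:18Z</timestamp>
    <contributor>
      <ip>127.0.0.1</ip>
    </contributor>
  </revision>
  <revision>
    <id>2</id>
    <parentid>1</parentid>
    <timestamp>2012-10-26T20:13:56Z</timestamp>
    <contributor>
      <username>Reedy</username>
      <id>2</id>
    </contributor>
  </revision>
</page>'''


def contributor_lines(xml_text):
    """Return one 'username:... userid:...' string per registered contributor.

    Contributors without both <username> and <id> (for example, the
    IP-only ones) are skipped instead of raising an error.
    """
    root = etree.fromstring(xml_text)
    lines = []
    # iter() walks the whole tree, so every <contributor> is visited.
    for contributor in root.iter('contributor'):
        # findtext() returns None when the child element is absent.
        username = contributor.findtext('username')
        userid = contributor.findtext('id')
        if username is None or userid is None:
            continue
        lines.append('username:' + username + ' userid:' + userid)
    return lines


lines = contributor_lines(SAMPLE)
```

The resulting list can then be written out with the same with open('output', 'w') loop used above, adding '\n' after each entry. Checking for None rather than comparing the tag against 'ip' means any other unexpected contributor shape is skipped as well, which may or may not be what you want.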