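Removing elements from a parsed XML tree breaks iteration

Time: 2016-06-08 12:05:46

Tags: python xml

I want to parse an XML file and then process the resulting tree by removing selected elements. My problem is that removing an element breaks the loop that is iterating over the elements.

Consider the following XML data: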

<results>
    <group>
        <a />
        <b />
        <c />
    </group>
</results>

And the code:

import xml.etree.ElementTree as ET

def showGroup(group,s):
    print(s + '  len=' + str(len(group)))
    print('<group>' )
    for e in group:
        print('   <' + e.tag + '>')
    print('</group>\n')

def processGroup(group):
    for e in group:
        if e.tag != 'a':
            group.remove(e)
            showGroup(group,'removed <' + e.tag + '>')

tree = ET.parse('x.xml')
root = tree.getroot()

for group in root:
    processGroup(group)

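I expect the for loop to process the elements <a>, <b>, <c> in order. In particular:

  1. Processing <a> should not remove any element.
  2. Processing <b> should remove <b>.
  3. Processing <c> should remove <c>.
  4. I expect the resulting tree to have one element in <group> (the <a> element), and len(group) to return 1.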

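Instead, after processing <b>, the for loop decides that its end condition has been met and never processes <c>. (If it did, <c> would be removed.) I am left with a tree whose group contains <a> and <c>, and len(group) returns 2.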

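What do I need to do so that all three elements are processed while the selected elements are removed? PS: Any comments on style, or on better ways of doing this, are welcome.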

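Update: If there is no code after the element is removed, an ugly hack "fixes" the problem at some cost in efficiency. But in my real program, there is a lot of code after the pruning loop.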

def processGroup(group):
    for e in group:
        if e.tag != 'a':
            group.remove(e)
            showGroup(group,'removed <' + e.tag + '>')
            # Start over on the already-modified group
            processGroup(group)

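I assume that if the for loop is interrupted, starting over on the group might solve the problem. Recursion is a tidy way to do that, at the cost of reprocessing every element that has already been checked but not removed.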

I am not happy with this solution.

1 answer:

Answer 0 (score: 1)

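The problem is that you are removing elements from the object you are iterating over. When you remove an element, the remaining elements shift down by one position, so the loop skips the element that moves into the removed element's slot. That is why <c> is never processed in your example.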
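To see the skipping in isolation, here is a minimal, self-contained reproduction. It is a sketch: it parses the question's sample XML from a string instead of x.xml, and `naive_prune` is a hypothetical name. It records which children the loop actually visits, and the result matches the behavior described in the question:

```python
import xml.etree.ElementTree as ET

# The question's sample data, inlined so no x.xml file is needed.
XML = "<results><group><a /><b /><c /></group></results>"

def naive_prune(group):
    """Remove non-<a> children while iterating over the group itself.

    Returns the tags the loop actually visited. The loop walks the children
    by position: after the child at position 1 (<b>) is removed, <c> slides
    into position 1, the loop moves on to position 2, finds nothing there,
    and stops, so <c> is never visited.
    """
    visited = []
    for e in group:
        visited.append(e.tag)
        if e.tag != 'a':
            group.remove(e)
    return visited

group = ET.fromstring(XML)[0]
visited = naive_prune(group)
print(visited)                 # ['a', 'b']
print([e.tag for e in group])  # ['a', 'c']
```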

A simple solution is to iterate over a copy of the children, or to iterate in reverse:

Copy:

def processGroup(group):
    # group[:] creates a shallow copy (a list of the children), so we
    # remove from the original but iterate over the copy.
    for e in group[:]:
        if e.tag != 'a':
            group.remove(e)
            showGroup(group,'removed <' + e.tag + '>')

Reversed:

def processGroup(group):
    # Iterate from the end: removing an element only shifts the
    # elements after it, and those have already been visited, so
    # no element is skipped.
    for e in reversed(group):
        if e.tag != 'a':
            group.remove(e)
            showGroup(group,'removed <' + e.tag + '>')
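Both fixes leave only <a> in the group, but they visit and remove the other children in opposite orders. That may matter if, as in the asker's real program, more code runs after each removal. Below is a small sketch comparing the two. The function names `prune_copy` and `prune_reversed` are illustrative, and the sample XML is inlined as a string:

```python
import xml.etree.ElementTree as ET

XML = "<results><group><a /><b /><c /></group></results>"

def prune_copy(group):
    """Iterate over a shallow copy (group[:] is a plain list of the children)."""
    removed = []
    for e in group[:]:
        if e.tag != 'a':
            group.remove(e)
            removed.append(e.tag)
    return removed

def prune_reversed(group):
    """Iterate from the last child to the first."""
    removed = []
    for e in reversed(group):
        if e.tag != 'a':
            group.remove(e)
            removed.append(e.tag)
    return removed

for prune in (prune_copy, prune_reversed):
    g = ET.fromstring(XML)[0]
    removed = prune(g)
    print(prune.__name__, removed, [e.tag for e in g])
# prune_copy ['b', 'c'] ['a']
# prune_reversed ['c', 'b'] ['a']
```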

Using the copy logic:

In [7]: tree = ET.parse('test.xml')

In [8]: root = tree.getroot()

In [9]: for group in root:
   ...:     processGroup(group)
   ...:
removed <b>  len=2
<group>
   <a>
   <c>
</group>

removed <c>  len=1
<group>
   <a>
</group>

You can also use ET.tostring instead of the for loop in showGroup.
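The original code for this part of the answer did not survive. Below is a minimal sketch of what it might look like. It assumes the goal is to replace showGroup's hand-written loop with `ET.tostring(..., encoding='unicode')`, which returns a `str` instead of bytes. For a tree parsed from the indented x.xml, the output will also include the file's whitespace (element text and tails).

```python
import xml.etree.ElementTree as ET

def showGroup(group, s):
    # Same header line as the question's version...
    print(s + '  len=' + str(len(group)))
    # ...but ElementTree serializes the group instead of a for loop.
    # encoding='unicode' returns a str rather than bytes.
    print(ET.tostring(group, encoding='unicode') + '\n')

# Inlined sample without indentation whitespace, so the output stays compact.
group = ET.fromstring('<group><a /><b /><c /></group>')
group.remove(group[1])
showGroup(group, 'removed <b>')
# removed <b>  len=2
# <group><a /><c /></group>
```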