Why does my xml.dom.minidom walk that removes empty text nodes only work partially?

Date: 2015-03-25 06:53:31

Tags: python xml tree-traversal minidom

Code (a plain depth-first walk):

import xml.dom.minidom as xdom

def _walk_n_apply(func, cond, parent):
    if parent.childNodes:
        for child in parent.childNodes:
            if cond(child):
                func(parent, child)
                continue
            _walk_n_apply(func, cond, child)

def remove_child(parent, child):
    node = parent.removeChild(child)
    print 'removed', node

def is_empty_text_node(node):
    return node.nodeType == node.TEXT_NODE and node.data.strip() == ''


xmldom = xdom.parse('blah')

_walk_n_apply(remove_child, is_empty_text_node, xmldom)

In IPython, after calling

_walk_n_apply(remove_child, is_empty_text_node, xmldom)

once, there is only a slight change in the output of:

print xmldom.toprettyxml()

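However, if I call it several times (how many depends on the nesting level), it eventually produces nicely formatted pretty XML.

How can I get the same result with a single call?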
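The claim that the number of calls depends on the nesting level can be checked with a small sketch. It is a Python 3 port of the walker above with the same loop (it still removes children while iterating over parent.childNodes), run on two small inline documents instead of the original file. The helper count_passes and the sample strings flat and nested are hypothetical, made up for illustration.

```python
import xml.dom.minidom as xdom


def walk_n_apply(func, cond, parent):
    # Same logic as the question's _walk_n_apply (Python 3 syntax):
    # children are removed from parent.childNodes while it is being iterated.
    for child in parent.childNodes:
        if cond(child):
            func(parent, child)
            continue
        walk_n_apply(func, cond, child)


def is_empty_text_node(node):
    return node.nodeType == node.TEXT_NODE and node.data.strip() == ''


def count_passes(xml_text):
    """Call the walker repeatedly; return how many calls removed at least one node."""
    dom = xdom.parseString(xml_text)
    removed = []

    def remove_child(parent, child):
        removed.append(parent.removeChild(child))

    passes = 0
    while True:
        before = len(removed)
        walk_n_apply(remove_child, is_empty_text_node, dom)
        if len(removed) == before:  # this call found nothing left to remove
            return passes
        passes += 1


flat = "<a>\n <b/>\n <b/>\n</a>"
nested = "<a>\n <b>\n  <c>\n   <d/>\n  </c>\n </b>\n</a>"
print(count_passes(flat))    # 1
print(count_passes(nested))  # 3
```

One extra level of whitespace-containing elements costs one extra call, which matches the behaviour described above.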


Input file contents:

<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.w3.org/2001/06/grammar
                             http://www.w3.org/TR/speech-grammar/grammar.xsd"
         xml:lang="en" version="1.0"
         root="command"
         mode="voice"
         tag-format="semantics/1.0">

<rule id="command">
   <one-of>
      <item><ruleref uri="#announcement" /></item>
      <item><ruleref uri="#hello" /></item>
      <item><ruleref uri="#whereis" /></item>
      <item><ruleref uri="#interrupt" /></item>
      <item><ruleref uri="#message" /></item>
      <item><ruleref uri="#logon" /></item>
      <item><ruleref uri="#logoff" /></item>
      <item><ruleref uri="#storecoverage" /></item>
      <item><ruleref uri="#identify" /></item>
      <item><ruleref uri="#near" /></item>
      <item><ruleref uri="#time" /></item>
      <item><ruleref uri="#playmessages" /></item>
      <item><ruleref uri="#registerbackup" /></item>
      <item><ruleref uri="#igotit" /></item>
   </one-of>
   <tag>out=rules.latest()</tag>
</rule>

<rule id="announcement">
<item>
  <one-of>
   <item>announcement today<tag>out="AnnouncementToday"</tag></item>
   <item>announcement now<tag>out="AnnouncementNow"</tag></item>
   <item>announcement hour<tag>out="AnnouncementHour"</tag></item>
  </one-of>
</item>
</rule>

<rule id="hello">
  <item repeat="0-1">
    <one-of>
        <item>hello</item>
        <item>hey</item>
        <item>hi</item>
    </one-of>
   </item>
   <item><ruleref uri="persons.grxml"/><tag>out="Hello,"+rules.latest()</tag></item>
</rule>

<rule id="whereis">
  <item>
   <one-of>
      <item>where is<ruleref uri="persons.grxml"/></item>
      <item>locate<ruleref uri="persons.grxml"/></item>
      <item>find<ruleref uri="persons.grxml"/></item>
   </one-of>
   <tag>out="Whereis,"+rules.latest()</tag>
  </item>
</rule>

<rule id="interrupt">
   <item>interrupt<ruleref uri="persons.grxml"/><tag>out="Interrupt,"+rules.latest()</tag></item>
</rule>

<rule id="message">
<item>message</item>
  <item repeat="0-1">for</item>
   <item><ruleref uri="persons.grxml"/><tag>out="Message,"+rules.latest()</tag></item>
</rule>

<rule id="logon">
   <one-of>
    <item>log on
       <one-of>
        <item><ruleref uri="persons.grxml"/><tag>out="Logon,"+rules.latest()</tag></item>
        <item><ruleref uri="#id_numbers"/><tag>out="Logon,"+rules.latest()</tag></item>
       </one-of>
        </item>
   </one-of>
</rule>

<rule id="logoff">
<item>
  <one-of>
      <item>log off<item repeat='0-1'>system</item></item>
      <item>log out</item>
  </one-of>
  <tag>out="Logoff"</tag>
</item>
</rule>

<rule id="storecoverage">
   <item repeat="0-1">store</item>
    <item>coverage<tag>out="coverage"</tag></item>
</rule>

<rule id="identify">
   <item>identify<tag>out="identify"</tag></item>
</rule>

<rule id="near">
    <one-of>
      <item>who is</item>
      <item>anyone</item>
    </one-of>
  <item>near<ruleref uri="#locations"/><tag>out="near,"+rules.latest()</tag></item>
</rule>

<rule id="time">
<one-of>
    <item>time<tag>out="time"</tag></item>
    <item>what time is it<tag>out="time"</tag></item>
</one-of>
</rule>

<rule id="playmessages">
  <item>
    play
   <one-of>
      <item>messages<tag>out="PlayMessages"</tag></item>
      <item>announcements<tag>out="PlayMessages"</tag></item>
   </one-of>
  </item>
</rule>

<rule id="registerbackup">
   <item repeat="0-1">cash</item>
    <item>register backup<tag>out="register backup"</tag></item>
</rule>

<rule id="igotit">
 <one-of>
   <item>
    <one-of>
     <item>i got it<tag>out="i got it"</tag></item>
     <item>i have it<tag>out="i got it"</tag></item>
    </one-of>
   </item>
   <item>
    <one-of>
     <item>on the way<tag>out="i got it"</tag></item>
     <item>on my way<tag>out="i got it"</tag></item>
    </one-of>
   </item> 
 </one-of>
</rule>


<rule id="locations">
   <ruleref uri="locations.grxml"/>
   <tag>out=rules.latest();</tag>
</rule>

Output when I call the function only once:

removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">

Output when I call the function 10 times in a row, like this:

for i in range(10):
    _walk_n_apply(remove_child, is_empty_text_node, xmldom)

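(The output was copy-pasted from a tmux session, so a few lines may be missing. My understanding is that if my function were recursive and correct, it would remove all empty text nodes in a single call. But a second call still removes some empty text nodes, then a third call removes more, and so on, until no empty text nodes are left.)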

removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n\n'">
removed <DOM Text node "u'\n   '">
removed <DOM Text node "u'\n   '">
removed <DOM Text node "u'\n'">
removed <DOM Text node "u'\n'">
removed <DOM Text node "u'\n'">
removed <DOM Text node "u'\n  '">
removed <DOM Text node "u'\n   '">
removed <DOM Text node "u'\n'">
removed <DOM Text node "u'\n  '">
removed <DOM Text node "u'\n'">
removed <DOM Text node "u'\n   '">
removed <DOM Text node "u'\n'">
removed <DOM Text node "u'\n'">
removed <DOM Text node "u'\n  '">
removed <DOM Text node "u'\n   '">
removed <DOM Text node "u'\n'">
removed <DOM Text node "u'\n   '">
removed <DOM Text node "u'\n'">
removed <DOM Text node "u'\n'">
removed <DOM Text node "u'\n'">
removed <DOM Text node "u'\n   '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n'">
removed <DOM Text node "u'\n   '">
removed <DOM Text node "u'\n'">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n  '">
removed <DOM Text node "u'\n  '">
removed <DOM Text node "u'\n'">
removed <DOM Text node "u'\n'">
removed <DOM Text node "u'\n'">
removed <DOM Text node "u'\n  '">
removed <DOM Text node "u'\n'">
removed <DOM Text node "u'\n   '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n'">
removed <DOM Text node "u'\n '">
removed <DOM Text node "u'\n'">
removed <DOM Text node "u'\n   '">
removed <DOM Text node "u'\n   '">
removed <DOM Text node "u'\n'">
removed <DOM Text node "u'\n'">
removed <DOM Text node "u'\n'">
removed <DOM Text node "u'\n      '">
removed <DOM Text node "u'\n      '">
removed <DOM Text node "u'\n      '">
removed <DOM Text node "u'\n      '">
removed <DOM Text node "u'\n      '">
removed <DOM Text node "u'\n      '">
removed <DOM Text node "u'\n      '">
removed <DOM Text node "u'\n      '">
removed <DOM Text node "u'\n      '">
removed <DOM Text node "u'\n      '">
removed <DOM Text node "u'\n      '">
removed <DOM Text node "u'\n      '">
removed <DOM Text node "u'\n      '">
removed <DOM Text node "u'\n      '">
removed <DOM Text node "u'\n   '">
removed <DOM Text node "u'\n  '">
removed <DOM Text node "u'\n'">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n   '">
removed <DOM Text node "u'\n   '">
removed <DOM Text node "u'\n   '">
removed <DOM Text node "u'\n  '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n   '">
removed <DOM Text node "u'\n  '">
removed <DOM Text node "u'\n  '">
removed <DOM Text node "u'\n'">
removed <DOM Text node "u'\n      '">
removed <DOM Text node "u'\n      '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n'">
removed <DOM Text node "u'\n      '">
removed <DOM Text node "u'\n      '">
removed <DOM Text node "u'\n   '">
removed <DOM Text node "u'\n  '">
removed <DOM Text node "u'\n   '">
removed <DOM Text node "u'\n   '">
removed <DOM Text node "u' \n '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n'">
removed <DOM Text node "u'\n   '">
removed <DOM Text node "u'\n   '">
removed <DOM Text node "u'\n   '">
removed <DOM Text node "u'\n  '">
removed <DOM Text node "u'\n        '">
removed <DOM Text node "u'\n        '">
removed <DOM Text node "u'\n        '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n      '">
removed <DOM Text node "u'\n      '">
removed <DOM Text node "u'\n      '">
removed <DOM Text node "u'\n   '">
removed <DOM Text node "u'\n        '">
removed <DOM Text node "u'\n        '">
removed <DOM Text node "u'\n       '">
removed <DOM Text node "u'\n        '">
removed <DOM Text node "u'\n      '">
removed <DOM Text node "u'\n      '">
removed <DOM Text node "u'\n  '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n   '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n   '">
removed <DOM Text node "u'\n     '">
removed <DOM Text node "u'\n     '">
removed <DOM Text node "u'\n    '">
removed <DOM Text node "u'\n     '">
removed <DOM Text node "u'\n     '">
removed <DOM Text node "u'\n    '">
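The skipping behind these repeated removals can be reproduced without minidom. In xml.dom.minidom, NodeList is a subclass of Python's built-in list, and removeChild deletes the node from the parent's childNodes list, so a plain list shows the same effect. The function remove_matching_while_iterating and the string labels below are hypothetical, made up for illustration (Python 3).

```python
def remove_matching_while_iterating(items, cond):
    """Mimic the question's loop on a plain list: remove from the list being iterated."""
    visited = []
    for item in items:
        visited.append(item)       # record every item the loop actually sees
        if cond(item):
            items.remove(item)     # shifts the remaining items one slot to the left
    return visited


# Whitespace text nodes alternating with elements, as in the input file.
children = ['ws1', 'elem1', 'ws2', 'elem2', 'ws3']
visited = remove_matching_while_iterating(children, lambda s: s.startswith('ws'))
print(visited)   # ['ws1', 'ws2', 'ws3']
print(children)  # ['elem1', 'elem2']
```

Every element that directly follows a removed whitespace node is never visited, so the walker does not recurse into it on that call; the whitespace nested inside it is only reached on a later call. This is consistent with the single-call output above, where only blank-line nodes (apparently those between the top-level rule elements) were removed.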

1 answer:

Answer 0 (score: 1)

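You are modifying the list of children while iterating over .childNodes. Each removeChild call shifts the remaining children one position to the left, so the loop skips the node that followed the removed one; when that node is an element, the walk never descends into it on that call. Iterate over a copy instead: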

for child in list(parent.childNodes):
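A complete Python 3 sketch of the walker with that one-line change is below. The count_empty_text_nodes helper and the inline sample document are hypothetical, added only to check that a single call is enough.

```python
import xml.dom.minidom as xdom


def walk_n_apply(func, cond, parent):
    # Iterate over a snapshot (a plain list copy) of the children, so that
    # func may remove nodes from parent.childNodes without the loop skipping
    # the sibling that follows each removed node.
    for child in list(parent.childNodes):
        if cond(child):
            func(parent, child)
            continue
        walk_n_apply(func, cond, child)


def remove_child(parent, child):
    parent.removeChild(child)


def is_empty_text_node(node):
    return node.nodeType == node.TEXT_NODE and node.data.strip() == ''


def count_empty_text_nodes(node):
    # Count whitespace-only text nodes anywhere under node.
    return sum(
        1 if is_empty_text_node(c) else count_empty_text_nodes(c)
        for c in node.childNodes
    )


xml_text = "<a>\n <b>\n  <c>\n   <d/>\n  </c>\n </b>\n</a>"
dom = xdom.parseString(xml_text)
walk_n_apply(remove_child, is_empty_text_node, dom)
print(count_empty_text_nodes(dom))   # 0
print(dom.documentElement.toxml())   # <a><b><c><d/></c></b></a>
```

list() makes a shallow copy of each child list before the loop starts, so removals only change parent.childNodes and never the sequence being iterated.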