Question

I have an XML file in which I previously commented out some elements, and now I want to uncomment them.

I have this structure:

<parent parId="22" attr="Alpha">
 <!--<reg regId="1">
  <cont>There is some content</cont><cont2 attr1="val">Another content</cont2>
 </reg>
--></parent>
<parent parId="23" attr="Alpha">
 <reg regId="1">
  <cont>There is more content</cont><cont2 attr1="noval">Morecont</cont2>
 </reg>
</parent>
<parent parId="24" attr="Alpha">
 <!--<reg regId="1">
  <cont>There is some content</cont><cont2 attr1="val">Another content</cont2>
 </reg>
--></parent>

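I want to uncomment every comment in the file, so that the commented-out elements become regular elements again.

I am able to find the commented elements using XPath. Here is my code snippet.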

def unhide_element():
    path = r'path_to_file\file.xml'
    xml_parser = et.parse(path)
    comments = root.xpath('//comment')
    for c in comments:
       print('Comment: ', c)
       parent_comment = c.getparent()
       parent_comment.replace(c,'')
       tree = et.ElementTree(root)
       tree.write(new_file)

However, the replace does not work, because it expects another element as its second argument.

How can I solve this?

Answer 1

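Your code is missing the key step of creating a new XML element from the comment text. There are also a few other errors: the XPath query is wrong (it should be //comment(), not //comment), and the output file is saved repeatedly inside the loop.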

Also, you seem to be mixing xml.etree with lxml.etree. According to the documentation, the former ignores comments when parsing an XML file, so the best approach is to use lxml.
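The difference between the two parsers can be seen in a minimal sketch. The SAMPLE string below is a shortened, hypothetical version of the structure from the question, and the variable names are illustrative only.

```python
import xml.etree.ElementTree as std_et

import lxml.etree as lxml_et

# Hypothetical one-line sample with a single commented-out element.
SAMPLE = '<parent parId="22"><!--<reg regId="1"/>--></parent>'

# The standard-library parser drops comment nodes by default,
# so the parent ends up with no children at all.
std_root = std_et.fromstring(SAMPLE)
std_children = list(std_root)

# lxml keeps the comment as a node, so the XPath comment() test finds it.
lxml_root = lxml_et.fromstring(SAMPLE)
lxml_comments = lxml_root.xpath('//comment()')

print(len(std_children))
print(len(lxml_comments))
print(lxml_comments[0].text)
```

With lxml the comment text is available as a string, which is what makes it possible to rebuild the element from it.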

After fixing all of the above, we get something like this.

import lxml.etree as ET


def unhide_element():
    path = r'test.xml'
    tree = ET.parse(path)  # ET.parse returns an ElementTree, not the root element
    comments = tree.xpath('//comment()')
    for c in comments:
        print('Comment: ', c)
        parent_comment = c.getparent()
        new_elem = ET.XML(c.text)  # this bit creates the new element from comment text
        c.addnext(new_elem)  # insert it right after the comment, inside the same parent
        parent_comment.remove(c)  # skip this if you want to retain the comment

    tree.write(r'new_file.xml')
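The same idea can be tried without touching the file system. The following is only a sketch, assuming the XML is already available as a string. The function name uncomment_elements and the SAMPLE document (including its root wrapper and the note comment) are hypothetical. It also skips comments whose text is not well-formed XML, a case the question does not cover.

```python
import lxml.etree as ET


def uncomment_elements(xml_text):
    """Turn every comment that holds a single element back into that element."""
    root = ET.fromstring(xml_text)
    for comment in root.xpath('//comment()'):
        parent = comment.getparent()
        if parent is None:
            continue  # comment outside the root element; left alone in this sketch
        try:
            new_elem = ET.XML(comment.text)
        except ET.XMLSyntaxError:
            continue  # ordinary prose comment, not a commented-out element
        new_elem.tail = comment.tail  # keep the whitespace that followed the comment
        parent.replace(comment, new_elem)
    return ET.tostring(root, encoding='unicode')


# Hypothetical sample: one commented-out element and one ordinary note.
SAMPLE = (
    '<root>'
    '<parent parId="22"><!--<reg regId="1"><cont>There is some content</cont></reg>--></parent>'
    '<parent parId="23"><!-- just a note --></parent>'
    '</root>'
)

result = uncomment_elements(SAMPLE)
print(result)
```

Using replace here swaps the comment for the new element in one step, so the element lands exactly where the comment was.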

Answer 2

Well, since you want to uncomment everything, all you really need to do is remove every "<!--" and "-->":

import re

new_xml = ''.join(re.split('<!--|-->', xml))

Or:

new_xml = xml.replace('<!--', '').replace('-->', '')
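As a quick check of the string-based approach, here is a small sketch run on hypothetical inputs (the xml and note strings are made up for illustration). It also shows one consequence worth knowing: a comment that held prose rather than an element leaves that prose behind as text content.

```python
import re

# Hypothetical input with one commented-out element.
xml = '<parent parId="22"><!--<reg regId="1"><cont>There is some content</cont></reg>--></parent>'

# Both variants strip the comment delimiters and keep everything in between.
new_xml = ''.join(re.split('<!--|-->', xml))
also_new_xml = xml.replace('<!--', '').replace('-->', '')

# A prose comment is "uncommented" too, so its text ends up inside the element.
note = '<parent><!-- TODO: review --></parent>'
stripped_note = note.replace('<!--', '').replace('-->', '')

print(new_xml)
print(stripped_note)
```

If the file mixes commented-out elements with ordinary notes, a parser-based approach may be the safer choice.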
