Comment out and uncomment an xml element

Date: 2015-10-29 15:49:02

Tags: python xml comments elementtree

I have an XML file in which I want to comment out one element and uncomment another.

<my_element>
    <blablabla href="docs/MyBlank.htm" />
</my_element>

This one I would like to "close" (comment out) like this:

<!--
<my_element>
    <blablabla href="docs/MyBlank.htm" />
</my_element>
-->

Further down in the file I have an element with the same name that is "closed" (commented out) like this:

<!--
<my_element>
    <blablabla href="secretwebhacking/MySecrectBankLogin.htm" />
</my_element>
-->

and I want to "open" it up (uncomment) like:

<my_element>
     <blablabla href="secretwebhacking/MySecrectBankLogin.htm" />
</my_element>

I use ElementTree for this. I know how to edit an element's value and attributes, but I am not sure how to add or remove the <!-- --> around one specific element.

1 answer:

Answer 0 (score: 1)

You can use BeautifulSoup to parse it. A basic example:

xmlbody = '<stuff>\
<my_element>\
    <blablabla href="docs/MyBlank.htm" />\
</my_element>\
<!--\
<my_element>\
    <blablabla href="secretwebhacking/MySecrectBankLogin.htm" />\
</my_element>\
-->\
</stuff>'

from bs4 import BeautifulSoup, Comment
soup = BeautifulSoup(xmlbody, "lxml")

# Find all comments
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
for comment in comments:
  # Create new soup object from comment contents
  commentsoup = BeautifulSoup(comment, "lxml")
  # Find the tag we want
  blatag = commentsoup.find('blablabla')
  # Check if it is the one we need (skip comments that contain no such tag)
  if blatag is not None and blatag.get('href') == "secretwebhacking/MySecrectBankLogin.htm":
    # If so, insert the element within the comment into the document
    comment.insert_after(commentsoup.find('body').find('my_element'))
    # And remove the comment
    comment.extract()

# Find all my_elements
my_elements = soup.find_all('my_element')
for tag in my_elements:
  # Check if it's the one we want (skip elements without a blablabla child)
  blatag = tag.find('blablabla')
  if blatag is not None and blatag.get('href') == "docs/MyBlank.htm":
    # If so, insert a commented version
    tagcomment = soup.new_string(str(tag), Comment)
    tag.insert_after(tagcomment)
    # And remove the tag
    tag.extract()

print(soup.find('html').find('body').prettify().replace("<body>\n","").replace("\n</body>",""))

That should get you started; you can make it as elaborate as you need. The output looks like this:

  <stuff>
   <!--<my_element> <blablabla href="docs/MyBlank.htm"></blablabla></my_element>-->
   <my_element>
    <blablabla href="secretwebhacking/MySecrectBankLogin.htm">
    </blablabla>
   </my_element>
  </stuff>
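
Since the question asks about ElementTree specifically, the same swap can probably be done with the standard library alone. The following is only a sketch under a few assumptions: Python 3.8 or later (needed for `TreeBuilder(insert_comments=True)`), a comment body that is itself a single well-formed `<my_element>`, and element content that contains no `--` (which is not allowed inside an XML comment). The function name `toggle_comments` and its parameters are made up for illustration.

```python
import xml.etree.ElementTree as ET

XML = """<stuff>
<my_element>
    <blablabla href="docs/MyBlank.htm" />
</my_element>
<!--
<my_element>
    <blablabla href="secretwebhacking/MySecrectBankLogin.htm" />
</my_element>
-->
</stuff>"""


def toggle_comments(xml_text, comment_href, uncomment_href):
    """Comment out the my_element whose blablabla href equals comment_href
    and uncomment the one whose href equals uncomment_href (hypothetical helper)."""
    # insert_comments=True keeps comments in the tree instead of dropping them
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)

    # Snapshot the elements first, because children are replaced while looping
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag is ET.Comment:
                # Try to parse the comment body as an XML fragment
                try:
                    inner = ET.fromstring(child.text)
                except ET.ParseError:
                    continue  # an ordinary comment, leave it alone
                bla = inner.find("blablabla")
                if (inner.tag == "my_element" and bla is not None
                        and bla.get("href") == uncomment_href):
                    inner.tail = child.tail  # keep the surrounding whitespace
                    parent[index] = inner    # swap the comment for the real element
            elif child.tag == "my_element":
                bla = child.find("blablabla")
                if bla is not None and bla.get("href") == comment_href:
                    tail, child.tail = child.tail, None
                    # Serialize the element and wrap the text in a comment node
                    comment = ET.Comment(ET.tostring(child, encoding="unicode"))
                    comment.tail = tail
                    parent[index] = comment
    return ET.tostring(root, encoding="unicode")


result = toggle_comments(
    XML, "docs/MyBlank.htm", "secretwebhacking/MySecrectBankLogin.htm")
print(result)
```

Unlike the BeautifulSoup version with the "lxml" HTML parser, this keeps the document as XML, so there is no html/body wrapper to strip afterwards. ElementTree may still normalize some details on output (for example attribute quoting or the spacing inside empty tags), so the result is not guaranteed to be byte-identical to the input elsewhere.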