How to remove the contents of <li> tags from HTML

Asked: 2014-01-28 11:56:55

Tags: python beautifulsoup

I am trying to remove the contents of the <li> tags.

My HTML:

    <ul id="MenuGreyBar">                        
      <li style="left: 0px;">
        <a href="#" class="bgGrey">&nbsp;</a>
      </li>
    </ul>

  <ul>
    <li>
      <a href="about_us.html" class="bgLightBlue">About Us</a>
    </li>
    <li >
      <a href="Help_Support.html" class="bgMuddyGreen">Help & Support</a>
    </li>
    <li >
      <a href="Law_Info.html" class="bgGreen">Law & Information</a>
    </li>
    <!-- ... There are a few more. -->
  </ul>

I need to remove everything inside the <li> tags.

The code I have:

1 answer:

Answer 0 (score: 3)

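You are going about this the wrong way; just search for the li tags and call .decompose() on each one, which removes the tag and everything inside it from the tree: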

from bs4 import BeautifulSoup

soup = BeautifulSoup(input_document)
for li in soup.find_all('li'):
    li.decompose()  # destroy the <li> tag together with its children

Demo:

>>> from bs4 import BeautifulSoup
>>> input_document = '''\
...     <ul id="MenuGreyBar">                        
...       <li style="left: 0px;">
...         <a href="#" class="bgGrey">&nbsp;</a>
...       </li>
...     </ul>
... 
...   <ul>
...     <li>
...       <a href="about_us.html" class="bgLightBlue">About Us</a>
...     </li>
...     <li >
...       <a href="Help_Support.html" class="bgMuddyGreen">Help & Support</a>
...     </li>
...     <li >
...       <a href="Law_Info.html" class="bgGreen">Law & Information</a>
...     </li>
...     <!-- ... There are a few more. -->
...   </ul>
... '''
>>> soup = BeautifulSoup(input_document)
>>> for li in soup.find_all('li'):
...     li.decompose()
... 
>>> print(soup)
<html><head></head><body><ul id="MenuGreyBar">                        

    </ul>

  <ul>



    <!-- ... There are a few more. -->
  </ul>
</body></html>
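
Note that .decompose() removes the <li> elements themselves, as the output above shows. If the intent is instead to keep each <li> in place and only empty it, Tag.clear() may be closer to what the question asks for. The following is a minimal sketch under that assumption; the helper name empty_li_tags and the shortened sample string are hypothetical, and the "html.parser" backend is an assumption chosen because it ships with Python (it does not wrap fragments in <html> and <body> the way the demo output does).

```python
from bs4 import BeautifulSoup


def empty_li_tags(html):
    """Return html with every <li> emptied but still present (hypothetical helper)."""
    # Assumption: use the parser bundled with Python; other parsers
    # (lxml, html5lib) may add <html>/<body> wrappers around fragments.
    soup = BeautifulSoup(html, "html.parser")
    for li in soup.find_all("li"):
        li.clear()  # drop the children, keep the <li> element itself
    return str(soup)


# Shortened, illustrative input modelled on the question's markup.
sample = '<ul><li><a href="about_us.html">About Us</a></li><li>Help</li></ul>'
result = empty_li_tags(sample)
print(result)
```

With this variant the list structure survives, which can matter if scripts or stylesheets depend on the number of <li> items; with .decompose() the list ends up containing only whitespace and comments.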