Getting the content of a tag with an empty id in BeautifulSoup

Time: 2015-06-18 13:05:13

Tags: python beautifulsoup

from bs4 import BeautifulSoup

page = """<span id="something">useless</span>
          <span id="">some text</span>
          <span id="different">useless</span>"""
soup = BeautifulSoup(page)

How can I get some text? Using soup.find_all('span', {'id': ""}) finds all of the spans, not just the one with the empty id.

1 answer:

Answer 0 (score: 1)

You have two options:

  1. Use a custom filter; pass in a function that is called for each element and returns True for the ones to keep and False otherwise:

    soup.find_all(lambda e: e.name == 'span' and e.attrs.get('id') == '')
    
  2. Use a CSS selector with an exact attribute match:

    soup.select('span[id=""]')
    
Demo:

    >>> from bs4 import BeautifulSoup
    >>> page = """<span id="something">useless</span>
    ...           <span id="">some text</span>
    ...           <span id="different">useless</span>"""
    >>> soup = BeautifulSoup(page)
    >>> soup.find_all(lambda e: e.name == 'span' and e.attrs.get('id') == '')
    [<span id="">some text</span>]
    >>> soup.select('span[id=""]')
    [<span id="">some text</span>]
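
Both calls return a list of tags, so to end up with the string some text you still need to pull the text out of each match. A minimal sketch of that last step, assuming the built-in "html.parser" backend is passed explicitly and that select_one is available in your bs4 version; the helper name empty_id_texts is made up for illustration:

```python
from bs4 import BeautifulSoup

page = """<span id="something">useless</span>
          <span id="">some text</span>
          <span id="different">useless</span>"""

# Assumption: the stdlib "html.parser" backend; the original post lets bs4 pick one.
soup = BeautifulSoup(page, "html.parser")


def empty_id_texts(soup):
    """Hypothetical helper: text of every span whose id is present but empty."""
    matches = soup.find_all(
        lambda e: e.name == "span" and e.attrs.get("id") == ""
    )
    return [tag.get_text() for tag in matches]


# Option 1: custom filter function, then extract the text of each match.
texts = empty_id_texts(soup)

# Option 2: CSS selector; select_one returns the first match or None.
first = soup.select_one('span[id=""]')
first_text = first.get_text() if first is not None else None

print(texts)
print(first_text)
```

Note that e.attrs.get('id') returns None for a span that has no id attribute at all, so the comparison with '' only matches spans where the attribute is present and empty.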