BeautifulSoup: get elements that have a specific attribute, regardless of its value

Date: 2014-05-07 15:45:24

Tags: python parsing xpath html-parsing beautifulsoup

Imagine I have the following HTML:

<div id='0'>
    stuff here
</div>

<div id='1'>
    stuff here
</div>

<div id='2'>
    stuff here
</div>

<div id='3'>
    stuff here
</div>

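Is there a simple way to use BeautifulSoup to extract all div elements that have an id attribute, regardless of its value? I realize this is trivial with XPath, but there seems to be no way to run XPath searches in BeautifulSoup.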

2 answers:

Answer 0 (score: 5)

Use id=True to match only elements that have the attribute set:

soup.find_all('div', id=True)

The inverse works too; use id=False to exclude tags that have an id attribute:

soup.find_all('div', id=False)

To find tags that have a given attribute, you can also use CSS selectors:

soup.select('div[id]')

Unfortunately, at the time of writing, select() did not support the operators needed to invert the search.
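For what it's worth, newer Beautiful Soup releases (4.7.0 and later, where select() is backed by the soupsieve package) do accept the CSS :not() pseudo-class, so the inverse search can be written as a selector as well. A minimal sketch, assuming such a version is installed; the sample markup and variable names are illustrative only:

```python
from bs4 import BeautifulSoup

# Illustrative markup: one div with an id, one without.
sample = """
<div id="id1">This has an id</div>
<div>This has none</div>
"""

soup = BeautifulSoup(sample, "html.parser")

# :not([id]) inverts the attribute-presence selector.
# Assumption: bs4 >= 4.7.0, which delegates select() to soupsieve.
without_id = soup.select("div:not([id])")

for tag in without_id:
    print(tag.get_text())
```

On older versions, find_all('div', id=False) remains the way to do this.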

Demo:

>>> from bs4 import BeautifulSoup
>>> sample = '''\
... <div id="id1">This has an id</div>
... <div>This has none</div>
... <div id="id2">This one has an id too</div>
... <div>But this one has no clue (or id)</div>
... '''
>>> soup = BeautifulSoup(sample, 'html.parser')
>>> soup.find_all('div', id=True)
[<div id="id1">This has an id</div>, <div id="id2">This one has an id too</div>]
>>> soup.find_all('div', id=False)
[<div>This has none</div>, <div>But this one has no clue (or id)</div>]
>>> soup.select('div[id]')
[<div id="id1">This has an id</div>, <div id="id2">This one has an id too</div>]
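The keyword form (id=True) only works when the attribute name is a valid Python argument name. For names such as data-role, the same True/False test can be passed through the attrs dictionary, and a function filter using Tag.has_attr() is another option. A short sketch of both; the data-role attribute and the helper name div_with_id are made up for illustration:

```python
from bs4 import BeautifulSoup

# Illustrative markup; "data-role" is a hypothetical attribute name.
sample = """
<div data-role="nav">Has data-role</div>
<div>Has nothing</div>
<div id="x">Has an id</div>
"""

soup = BeautifulSoup(sample, "html.parser")

# attrs= accepts the same True/False values as keyword arguments,
# and works for names that cannot be written as Python keywords.
with_role = soup.find_all("div", attrs={"data-role": True})


def div_with_id(tag):
    """Function filter: match div tags that carry an id attribute."""
    return tag.name == "div" and tag.has_attr("id")


with_id = soup.find_all(div_with_id)

print(with_role)
print(with_id)
```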

Answer 1 (score: 1)

BeautifulSoup4 supports commonly used CSS selectors:

>>> import bs4
>>>
>>> soup = bs4.BeautifulSoup('''
... <div id="0"> this </div>
... <div> not this </div>
... <div id="2"> this too </div>
... ''', 'html.parser')
>>> soup.select('div[id]')
[<div id="0"> this </div>, <div id="2"> this too </div>]
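If the goal is the id values themselves rather than the tags, the result of select() can be indexed like a dictionary. A small sketch along the lines of the session above, with the parser named explicitly; the variable names are illustrative:

```python
from bs4 import BeautifulSoup

html = """
<div id="0"> this </div>
<div> not this </div>
<div id="2"> this too </div>
"""

soup = BeautifulSoup(html, "html.parser")

# Every tag matched by div[id] has an id, so tag["id"] cannot raise KeyError here.
ids = [div["id"] for div in soup.select("div[id]")]
print(ids)
```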