Using BeautifulSoup4, I can select all the elements I need with the following:
elements = soup.find_all('a', {'class': 'some-class'})
How can I restrict elements to contain only the anchor links that have the class some-class but do not have an attribute such as href="#"?
Answer 0: (score: 1)
Pass None as the value for href. This matches only the tags that have no href attribute at all:
>>> from bs4 import BeautifulSoup
>>>
>>> soup = BeautifulSoup('''
... <div>
... <a class="some-class" href="#">11</a>
... <a class="some-class">22</a>
... <a class="some-class">33</a>
... <a class="some-class" href="#">44</a>
... </div>
... ''', 'html.parser')
>>> soup.find_all('a', {'class': 'some-class'})
[<a class="some-class" href="#">11</a>, <a class="some-class">22</a>,
<a class="some-class">33</a>, <a class="some-class" href="#">44</a>]
>>> soup.find_all('a', {'class': 'some-class', 'href': None}) # <--
[<a class="some-class">22</a>, <a class="some-class">33</a>]
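
Note that 'href': None drops every anchor that has an href, including anchors with real links. If the goal is to drop only the href="#" placeholders and keep the other links, one possible approach is to filter the result afterwards or to use a CSS selector. The following is a minimal sketch, not the only way to do it; the sample markup and the names html, not_hash and via_css are made up for illustration, and select() assumes the soupsieve package that normally ships with BeautifulSoup4 is installed.

```python
from bs4 import BeautifulSoup

# Illustrative markup (not from the question): one placeholder link,
# one anchor without href, one real link, and one anchor of another class.
html = """
<div>
<a class="some-class" href="#">11</a>
<a class="some-class">22</a>
<a class="some-class" href="/page">33</a>
<a class="other-class">44</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Option 1: select by class, then drop anchors whose href is exactly "#".
# Tag.get() returns None when the attribute is missing, so those are kept.
not_hash = [a for a in soup.find_all("a", class_="some-class")
            if a.get("href") != "#"]

# Option 2: the same filter expressed as a CSS selector.
via_css = soup.select('a.some-class:not([href="#"])')

print([a.get_text() for a in not_hash])
print([a.get_text() for a in via_css])
```

Both options keep the anchor without href and the anchor with a real link, and skip the href="#" anchor as well as the anchor with a different class.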