I want to loop with BeautifulSoup over all <a href=...> elements that are inside an <h2>, which is itself inside a <div class="myclass">:
<a href="www.example.com">Not selected</a>
<div class="myclass">
    <a href="www.example.com">Not selected</a>
    <h2>
        <a href="www.example.com">SELECTED!</a>
    </h2>
</div>
I was thinking about something like this, but I imagine BeautifulSoup can do this kind of filtering without any if link.parent == ... tests:
import urllib2
from bs4 import BeautifulSoup

# req is the request (or URL) for the page, defined elsewhere
soup = BeautifulSoup(urllib2.urlopen(req), "lxml")
for link in soup.select('a[href]'):
    if link.parent == ...:  # tests
        print link
How can I do this with BeautifulSoup?
Answer 0 (score: 2)
You can step down with findAll, one level at a time, until you reach the a you want:
for div in soup.findAll("div", attrs={"class": "myclass"}):
    for h2 in div.findAll("h2"):
        for a in h2.findAll("a"):
            print a
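For readers on Python 3 with a current bs4, the same stepwise walk might look like the sketch below. The `html` string (copied from the question) and the built-in `html.parser` backend are assumptions made to keep it self-contained. `find_all` is the current spelling of `findAll`, and `class_` is the keyword form of the class filter.

```python
from bs4 import BeautifulSoup

# Markup from the question; html.parser avoids the lxml dependency.
html = """
<a href="www.example.com">Not selected</a>
<div class="myclass">
  <a href="www.example.com">Not selected</a>
  <h2>
    <a href="www.example.com">SELECTED!</a>
  </h2>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Walk down one level at a time: div.myclass, then h2, then a[href].
selected = []
for div in soup.find_all("div", class_="myclass"):
    for h2 in div.find_all("h2"):
        for a in h2.find_all("a", href=True):
            selected.append(a)

print([a.get_text(strip=True) for a in selected])
```

Because `find_all` searches all descendants by default, this matches links nested at any depth below each level, not only direct children.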
Or you can use a CSS selector with select:
soup.select('.myclass h2 a')
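As a quick check, the selector can be run against the markup from the question. This is a minimal sketch: the `html` string and the `html.parser` backend are assumptions chosen to keep it self-contained (the question parses a fetched page with `lxml`), and `div.myclass ... a[href]` is a slightly stricter variant of `.myclass h2 a` that also requires a `div` and an `href` attribute.

```python
from bs4 import BeautifulSoup

# Markup copied from the question.
html = """
<a href="www.example.com">Not selected</a>
<div class="myclass">
  <a href="www.example.com">Not selected</a>
  <h2>
    <a href="www.example.com">SELECTED!</a>
  </h2>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Descendant combinator (a space): any <a href> somewhere inside an <h2>
# that is somewhere inside <div class="myclass">.
links = soup.select("div.myclass h2 a[href]")
texts = [link.get_text(strip=True) for link in links]
print(texts)
```

The two "Not selected" links are skipped because neither of them sits inside an `<h2>` within the `myclass` div.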
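Answer 1 (score: 1)
Use a CSS selector (note that this one matches an h2 link inside any div, not only a div with class myclass):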
soup.select('div h2 a')
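The difference between `div h2 a` and a class-restricted selector only shows up when the page has more than one kind of `div`. The sketch below uses a hypothetical two-div `html` string (not from the question) and `html.parser` to illustrate this.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with one unrelated div and one div.myclass.
html = """
<div class="other">
  <h2><a href="www.example.com/other">other</a></h2>
</div>
<div class="myclass">
  <h2><a href="www.example.com/wanted">wanted</a></h2>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Matches h2 links inside any div.
broad = [a.get_text() for a in soup.select("div h2 a")]

# Matches h2 links only inside div elements carrying class "myclass".
narrow = [a.get_text() for a in soup.select("div.myclass h2 a")]

print(broad)
print(narrow)
```

If every `div` containing an `h2` link on the target page has the class `myclass`, the two selectors return the same result.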
Answer 2 (score: 0)
Beautiful Soup supports CSS class selectors (see the relevant documentation).
So you can write the query like this:
soup.select('.myclass > h2 > a')
This matches every anchor tag that is a direct child of an h2, which is in turn a direct child of the myclass div.
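The child combinator `>` is stricter than the space (descendant) combinator, which matters if a link is wrapped in another element inside the heading. The following sketch uses a hypothetical `html` string (not from the question) and `html.parser` to show the difference.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one link directly under <h2>, one wrapped in a <span>.
html = """
<div class="myclass">
  <h2><a href="www.example.com/direct">direct</a></h2>
  <h2><span><a href="www.example.com/nested">nested</a></span></h2>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# ">" requires each step to be a direct parent/child relationship.
child_only = [a.get_text() for a in soup.select(".myclass > h2 > a")]

# A space allows any depth of nesting between the steps.
any_depth = [a.get_text() for a in soup.select(".myclass h2 a")]

print(child_only)
print(any_depth)
```

For the markup in the question both forms select the same single link, so the choice depends on how strictly the structure should be enforced.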
Answer 3 (score: 0)
You can do it like this:
divs = soup.findAll('div', {'class': 'myclass'})
for div in divs:
    # findAll matches tag names, not CSS selectors, so use select for 'h2 > a'
    links = div.select('h2 > a')
    for link in links:
        print link