Question

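I have a soup with the following content:

There are many divs; the ones I am interested in have the class "foo".

Inside each div there are many links and other content. I am interested in the second link (the second <a> </a>); it is always the second one. I want to get the link itself (from the href attribute) and the text between the second link's <a> </a> tags.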

For example:

<div class ="foo">
     <a href ="http://example.com"> </a>
     <a href ="http://example2.com"> Title here </a>
</div>

<div class ="foo">
     <a href ="http://example3.com"> </a>
     <a href ="http://example4.com"> Title 2 here </a>
</div>

I would like to get:

Title here => http://example2.com

Title 2 here => http://example4.com

I tried writing some code:

soup.findAll("div", { "class" : "foo" })

But this returns a list containing all the divs and their contents, and I don't know how to go further from there.

Thanks :)

Answer 1

Iterate over the divs and, inside each one, find its a elements; the second link is at index 1.

from bs4 import BeautifulSoup

example = '''
<div class ="foo">
     <a href ="http://example.com"> </a>
     <a href ="http://example2.com"> Title here </a>
</div>

<div class ="foo">
     <a href ="http://example3.com"> </a>
     <a href ="http://example4.com"> Title 2 here </a>
</div>
'''

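# Name the parser explicitly so bs4 does not warn about guessing one.
soup = BeautifulSoup(example, 'html.parser')
for div in soup.find_all('div', {'class': 'foo'}):
    # Index 1 is the second <a> inside this div.
    a = div.find_all('a')[1]
    print(a.text.strip(), '=>', a.attrs['href'])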
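
If some div might contain fewer than two links, indexing with [1] raises an IndexError. A minimal sketch of a more defensive variant follows; the function name second_links and the choice to skip such divs are my own assumptions, not part of the question:

```python
from bs4 import BeautifulSoup


def second_links(html):
    """Return (text, href) pairs for the second <a> in each div with class "foo".

    Hypothetical helper: divs with fewer than two links are skipped.
    """
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for div in soup.find_all("div", class_="foo"):
        links = div.find_all("a")
        if len(links) < 2:
            continue  # assumption: silently ignore divs without a second link
        a = links[1]
        results.append((a.get_text(strip=True), a.get("href")))
    return results


if __name__ == "__main__":
    sample = (
        '<div class="foo"><a href="http://example.com"> </a>'
        '<a href="http://example2.com"> Title here </a></div>'
    )
    for text, href in second_links(sample):
        print(text, "=>", href)
```

Returning a list of pairs instead of printing makes the result easy to reuse, and a.get("href") yields None rather than raising if a link has no href attribute.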
