Question

Possible duplicate:
BeautifulSoup getting href

I am using BeautifulSoup, and my code is below:

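import urllib2
from bs4 import BeautifulSoup  # with BeautifulSoup 3: from BeautifulSoup import BeautifulSoup

data = urllib2.urlopen("some_url")
html_data = data.read()
soup = BeautifulSoup(html_data)
href_tags = soup.findAll('a')  # list of <a> Tag objects, not href strings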

Result:

href_tags = 
[<a href="http://www.exampl.com/score_card" target="_blank" style="font-family:arial;color:#192e94;">Click Here</a>, 
<a href="https://example.icims.com/jobs/search?pr=5">what is this</a>,
<a href="https://example.com/search?pr=6">Cool</a>,
<a href="https://example.com/help/host/search?pr=7">Hello</a>]

What I actually want is the href value of every anchor tag. How can I extract the href attributes?

Thanks in advance.

Answer 1

Try looping over the matches:

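import urllib2
from bs4 import BeautifulSoup  # with BeautifulSoup 3: from BeautifulSoup import BeautifulSoup

data = urllib2.urlopen("some_url")
html_data = data.read()
soup = BeautifulSoup(html_data)

# href=True matches only anchors that have an href attribute
for a in soup.findAll('a', href=True):
    print a['href']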

Answer 2

Off the top of my head:

href_tags = [tag['href'] for tag in soup.findAll('a', {'href': True})]

The {'href': True} filter matches only anchors that have an href attribute, so tag['href'] will not fail with a KeyError.
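
To illustrate why the filter matters, here is a minimal sketch, assuming Python 3 and BeautifulSoup 4 (bs4), where find_all is the current spelling of findAll. The SAMPLE_HTML markup is made up for illustration, and the second anchor deliberately has no href.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the second anchor has no href attribute.
SAMPLE_HTML = """
<a href="https://example.com/search?pr=6">Cool</a>
<a name="top">No link here</a>
<a href="https://example.com/help/host/search?pr=7">Hello</a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# href=True keeps only anchors that carry an href attribute,
# so indexing with a["href"] is safe for every match.
hrefs = [a["href"] for a in soup.find_all("a", href=True)]

# Without the filter, a.get("href") yields None for anchors that lack
# the attribute, whereas a["href"] would raise KeyError.
all_hrefs = [a.get("href") for a in soup.find_all("a")]

print(hrefs)
print(all_hrefs)
```

If anchors without an href should be reported rather than skipped, the a.get("href") form is probably the more convenient of the two.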

How to extract href links from anchor tags in BeautifulSoup
