Question

我需要从 <а> 标签获取 href 属性，但它不起作用

soup = BeautifulSoup(html, 'html.parser')
a_tags = soup.find_all('a')
print(a_tags[0].p) #print <p> tag
print(a_tags[0].a) #print 'None'
print(a_tags[0].a.get('href')) #doesn't work

但是如果我尝试 print(a_tags) 它会显示它们：

[<a href="/org/colleges/instrcol/Pages/pic1.aspx" style="display:block;" target="_blank">
<div style="min-height:360px;">
<img alt="pic1" src="iblock/6ba/%d0%90%d0%b1%d1%80%d0%b0%d0%bc%d0%be%d0%b2%20%d0%a1%d0%b5%d1%80%d0%b3%d0%b5%d0%b9%20%d0%90%d0%bd%d1%82%d0%be%d0%bd%d0%b8%d0%b4%d0%be%d0%b2%d0%b8%d1%87.jpg"/>
<p>Pic1</p></div>
</a>, <a href="/org/colleges/instrcol/Pages/pic2.aspx" style="display:block;" target="_blank">
<div style="min-height:360px;">
<img alt="pic2" src="iblock/1ee/%d0%90%d0%b3%d0%b0%d1%84%d0%be%d0%bd%d0%be%d0%b2%20%d0%9f%d0%b0%d0%b2%d0%b5%d0%bb%20%d0%92%d0%b8%d1%82%d0%b0%d0%bb%d1%8c%d0%b5%d0%b2%d0%b8%d1%87.jpg"/>
<p>Pic2</p></div>
</a>,
...

是什么导致了这个问题？

Answer 1

您在使用 .gitignore 时忘记添加 href=True 试试这个：

find_all()

Answer 2

a_tags 已经包含 <a>。

将 a_tags[0].a.get('href') 替换为 a_tags[0].get('href')。

使用 BeautifulSoup 获取 <a> 标签时遇到问题

2 个答案: