Question

I have this HTML:

    <td colspan="2" class="fc_blabla">
        <a title="Blabla" href="http://www.blabla.com/.html">Blabla</a>
    </td>

I only need to retrieve the link. I have tried several approaches:

#1

for link in soup.find_all("td", { "class":"fc_blabla"}):
   url = link.find("href")
   print link

#2

print soup.select(".fc_blabla > href")

#3

for link in soup.find_all("a"):
   url = link.get("href")
   print url

Answer 1

You are looking for "href" on the "td" tag, but it is an attribute of the "a" tag inside it.

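import bs4

# raw_html is assumed to hold the HTML from the question
soup = bs4.BeautifulSoup(raw_html, "html.parser")
td = soup.find("td", {"class": "fc_blabla"})
# href is an attribute of the <a> tag nested in the <td>
print(td.find("a")["href"])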
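
Attempt #2 in the question fails for a similar reason: `.fc_blabla > href` looks for a child tag named `href`, not an attribute. Below is a minimal sketch of a selector-based variant. The variable name `raw_html` and the choice of `html.parser` are assumptions made for illustration, not part of the original question.

```python
from bs4 import BeautifulSoup

# Assumption: raw_html holds the snippet from the question.
raw_html = """
<td colspan="2" class="fc_blabla">
    <a title="Blabla" href="http://www.blabla.com/.html">Blabla</a>
</td>
"""

soup = BeautifulSoup(raw_html, "html.parser")

# Match <a> tags that are direct children of an element with class
# fc_blabla and that carry an href attribute, then read the attribute.
links = [a["href"] for a in soup.select(".fc_blabla > a[href]")]
print(links)
```

`select` returns a list, so this form also copes with several matching cells on one page.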

Answer 2

html="""
  <td colspan="2" class="fc_blabla">
        <a title="Blabla" href="http://www.blabla.com/.html">Blabla</a>
    </td>
    """

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly

print(soup.find("td",attrs={"class":"fc_blabla"}).a["href"])

http://www.blabla.com/.html
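
Note that chaining `find(...)` with `.a["href"]` assumes the cell exists and contains a link; if either is missing, `find` returns `None` and the lookup raises an exception. The following is a rough sketch of a more defensive loop, closer to attempt #1 in the question. The helper name `extract_links` and the sample HTML are hypothetical, introduced only for this example.

```python
from bs4 import BeautifulSoup


def extract_links(html, css_class="fc_blabla"):
    """Return the href of the first <a> in every <td> with the given class.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    hrefs = []
    for td in soup.find_all("td", class_=css_class):
        # href=True only matches <a> tags that have an href attribute;
        # find() returns None when the cell contains no such tag.
        a = td.find("a", href=True)
        if a is not None:
            hrefs.append(a["href"])
    return hrefs


# Assumed sample: one cell with a link, one cell without.
sample = """
<td colspan="2" class="fc_blabla">
    <a title="Blabla" href="http://www.blabla.com/.html">Blabla</a>
</td>
<td class="fc_blabla">no link here</td>
"""

result = extract_links(sample)
print(result)
```

Cells without a link are skipped instead of raising an error.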

Get links inside HTML using BeautifulSoup
