Question

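I am writing a Python program with BeautifulSoup that retrieves download links from a website. I use the find method to look up the HTML class that contains the link, but it returns None.

I also tried reaching this class through its parent class, without success.

Here is my code: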

link = 'https://data.worldbank.org/topic/agriculture-and-rural-development?view=chart'

for link in indicator_links:
    indicator_page = requests.get(link)
    indicator_soup = BeautifulSoup(page.text, 'html.parser')
    download = indicator_soup.find(class_="btn-item download")

Again, I expect the download link to be in the btn-item download HTML class.

Answer 1

If you want a link, it will be in an <a> tag 100% of the time. This is the best help I can offer:

from bs4 import BeautifulSoup
import urllib.request

page_url = "https://data.worldbank.org/topic/agriculture-and-rural-development?view=chart"
soup = BeautifulSoup(urllib.request.urlopen(page_url), 'lxml')

# The keyword is class_ (with a trailing underscore, since class is reserved in Python)
what_you_want = soup.find('a', class_="btn-item download")

This should give you the link you need.

Since I don't know what indicator_links is, I'm not sure what you are trying to do in your code.
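
As a side note on the class lookup used above, here is a minimal sketch of how a multi-word class_ string behaves. The markup is hypothetical and only for illustration; the real World Bank page may look different. The sketch assumes that passing a string with several classes to class_ compares it against the exact attribute value, while a CSS selector matches elements carrying both classes in any order.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; the real page may differ.
html = """
<div>
  <a class="btn-item download" href="/files/a.csv">CSV</a>
  <a class="download btn-item" href="/files/b.xml">XML</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A multi-word class_ string is compared against the exact attribute value,
# so only the first anchor matches here.
exact = soup.find_all("a", class_="btn-item download")

# A CSS selector matches any element that has both classes, in any order.
both = soup.select("a.btn-item.download")

exact_hrefs = [a["href"] for a in exact]
both_hrefs = [a["href"] for a in both]
print(exact_hrefs)
print(both_hrefs)
```

If the class order on the target page is not guaranteed, the selector form may be the more forgiving choice.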

Answer 2

Do you mean all the links inside the btn-item download HTML class?

Change your code to this:

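import requests
from bs4 import BeautifulSoup

link = 'https://data.worldbank.org/topic/agriculture-and-rural-development?view=chart'

page = requests.get(link)
indicator_soup = BeautifulSoup(page.text, 'html.parser')
# find() returns None when no element has this exact class string
download = indicator_soup.find(class_="btn-item download")
if download is not None:
    # Print the href of every anchor nested inside the matched element
    for lnk in download.find_all('a', href=True):
        print(lnk['href'])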

Answer 3

The problem was that I was creating the BeautifulSoup object from the wrong HTML argument. It should be:

indicator_soup = BeautifulSoup(indicator_page.text, 'html.parser')

instead of

indicator_soup = BeautifulSoup(page.text, 'html.parser')
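
The mix-up described in this answer can be reproduced without any network access. The following is a minimal sketch, assuming two hypothetical response bodies (page_text and indicator_page_text stand in for page.text and indicator_page.text; their contents are invented for illustration). It shows that find returns None simply because the parsed document is not the one that contains the element.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins for the text of two different HTTP responses.
page_text = "<html><body><p>topic overview</p></body></html>"
indicator_page_text = (
    '<html><body><a class="btn-item download" href="/d.csv">Download</a>'
    "</body></html>"
)

# Parsing the wrong response: the element is not in it, so find() returns None.
wrong_soup = BeautifulSoup(page_text, "html.parser")
missing = wrong_soup.find(class_="btn-item download")

# Parsing the response fetched for this indicator finds the element.
right_soup = BeautifulSoup(indicator_page_text, "html.parser")
found = right_soup.find(class_="btn-item download")

print(missing)
print(found["href"])
```

When find unexpectedly returns None, checking which response object was handed to BeautifulSoup is a quick first step.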

BeautifulSoup find on a class returns None