Question

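I'm trying to download all the reports from the following site: https://www.opec.org/opec_web/en/publications/4814.htm but I can't find the links automatically with Beautiful Soup and requests. Can anyone help me?

So far I have tried the following code: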

from bs4 import BeautifulSoup

from urllib.request import Request, urlopen
import re

req = Request("https://www.opec.org/opec_web/static_files_project/media")
html_page = urlopen(req)

soup = BeautifulSoup(html_page, "lxml")

links = []

for link in soup.findAll('a'):

    print(link.get('href'))

Answer 1

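The request should point at the correct URL, which is the publications page itself. Since that page is an HTML document, you can parse it with "html.parser".

Your code should look something like this: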

from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Request the publications page itself, not the media directory
req = Request("https://www.opec.org/opec_web/en/publications/4814.htm")
html_page = urlopen(req)

soup = BeautifulSoup(html_page, "html.parser")

links = []

for link in soup.find_all('a'):
    href = link.get('href')
    # <a> tags without an href return None, so guard before the substring test
    if href and 'pdf' in href:
        links.append(href)
        print(href)
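
The loop above only prints the href values, which may be relative paths, and the question asks for the reports to be downloaded. The following is a minimal sketch of that next step, not a tested solution for this site. It assumes the report links end in ".pdf" (a stricter test than 'pdf' in href), that they resolve against the page URL, and that the server accepts a browser-like User-Agent header. The helper names absolute_pdf_urls and download and the "reports" folder are made up for illustration.

```python
import os
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen

PAGE_URL = "https://www.opec.org/opec_web/en/publications/4814.htm"


def absolute_pdf_urls(hrefs, base_url=PAGE_URL):
    """Resolve hrefs against base_url; keep unique URLs whose path ends in .pdf."""
    urls = []
    for href in hrefs:
        if not href:  # skip None or empty href values
            continue
        url = urljoin(base_url, href)  # turns relative paths into absolute URLs
        if urlparse(url).path.lower().endswith(".pdf") and url not in urls:
            urls.append(url)
    return urls


def download(url, folder="reports"):
    """Save one file into folder, named after the last segment of the URL path."""
    os.makedirs(folder, exist_ok=True)
    target = os.path.join(folder, os.path.basename(urlparse(url).path))
    # Assumption: some servers reject urllib's default User-Agent header.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req) as response, open(target, "wb") as out:
        out.write(response.read())
    return target
```

To use it, collect every link.get('href') value from the loop into a list, pass that list to absolute_pdf_urls, and call download on each URL it returns.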
