Unable to get download links from a web page

Time: 2019-03-16 11:53:53

Tags: python python-3.x download

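I am trying to download all the reports from the following site: https://www.opec.org/opec_web/en/publications/4814.htm but I cannot find the links automatically with BeautifulSoup and requests. Can anyone help me?

So far I have tried the following code: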

from bs4 import BeautifulSoup

from urllib.request import Request, urlopen
import re

req = Request("https://www.opec.org/opec_web/static_files_project/media")
html_page = urlopen(req)

soup = BeautifulSoup(html_page, "lxml")

links = []

for link in soup.findAll('a'):

    print(link.get('href'))

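1 answer:

Answer 0: (score: 2)

If it is an HTML document, you should use "html.parser", and the request should point to the correct URL (the publications page itself, not the media directory).

Your code should look something like this: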

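from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Request the publications page itself, not the media directory
req = Request("https://www.opec.org/opec_web/en/publications/4814.htm")
html_page = urlopen(req)

# "html.parser" is the parser bundled with the Python standard library
soup = BeautifulSoup(html_page, "html.parser")

links = []

for link in soup.find_all('a'):
    href = link.get('href')
    # Some <a> tags have no href, so guard against None before the substring test
    if href and 'pdf' in href:
        links.append(href)
        print(href)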
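
To go from printed links to actual downloads, the `href` values usually need to be resolved against the page URL first, since they may be relative paths (an assumption here; the page's markup was not inspected). The following is a minimal sketch of that step, not a tested scraper for this site. It deliberately swaps BeautifulSoup for the standard library's `html.parser.HTMLParser` so that it runs without extra packages. The names `PdfLinkParser`, `extract_pdf_links`, `download`, and `download_all_reports` are hypothetical, and the `User-Agent` header is an assumption in case the server rejects urllib's default one.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen

PAGE_URL = "https://www.opec.org/opec_web/en/publications/4814.htm"
# Assumption: a browser-like User-Agent; the site may not require it.
HEADERS = {"User-Agent": "Mozilla/5.0"}


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a href="..."> links whose path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:  # skip anchors that have no href attribute
            return
        absolute = urljoin(self.base_url, href)  # resolve relative links
        if urlparse(absolute).path.lower().endswith(".pdf"):
            self.links.append(absolute)


def extract_pdf_links(html, base_url):
    """Return absolute PDF URLs found in an HTML string, in document order."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.links


def download(url, target_dir="."):
    """Save one URL into target_dir, named after the last path segment."""
    path = os.path.join(target_dir, os.path.basename(urlparse(url).path))
    with urlopen(Request(url, headers=HEADERS)) as response, open(path, "wb") as out:
        out.write(response.read())
    return path


def download_all_reports(page_url=PAGE_URL, target_dir="."):
    """Fetch the page, then download every PDF it links to."""
    with urlopen(Request(page_url, headers=HEADERS)) as response:
        html = response.read().decode("utf-8", errors="replace")
    return [download(link, target_dir) for link in extract_pdf_links(html, page_url)]
```

Calling `download_all_reports()` performs the network requests; `extract_pdf_links` can be tried offline on any HTML string. Matching on the URL path ending in `.pdf` is stricter than the substring test `'pdf' in href`, so it will not pick up unrelated links that merely contain those letters.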
