How can I get the HTML content of a 404 error page using Python?

Time: 2018-11-04 01:42:32

Tags: python python-3.x exception web-scraping beautifulsoup

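I am using Python to fetch HTML data from multiple pages of a URL. I found that urllib raises an exception when the URL does not exist. How can I retrieve the HTML of the site's custom 404 error page (the page that shows something like "Page not found")?

Current code: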

try:
    req = Request(URL, headers={'User-Agent': 'Mozilla/5.0'})
    client = urlopen(req)

    #downloading html data
    page_html = client.read()

    #closing connection
    client.close()
except:
    print("The following URL was not found. Program terminated.\n" + URL)
    break

1 answer:

Answer 0: (score: 1)

Have you tried the requests library?

Just install it with pip:

pip install requests

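and use it like this. Unlike urlopen, requests.get does not raise an exception for a 404 response by default, so the body of the error page is available as usual: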

import requests

response = requests.get('https://stackoverflow.com/nonexistent_path')
print(response.status_code) # 404
print(response.text) # Prints the raw HTML response
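
If you would rather stay with urllib, the error page can also be read from the exception itself: urllib.error.HTTPError behaves like a response object and exposes code and read(). Below is a minimal sketch along those lines. The helper name fetch_html and its opener parameter are hypothetical, introduced here only for illustration and to make the function testable without network access; they are not part of the original question or of urllib.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def fetch_html(url, opener=urlopen):
    """Return (status_code, body_bytes), including for 4xx/5xx responses.

    `fetch_html` and `opener` are hypothetical names used for this sketch.
    """
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    try:
        with opener(req) as client:
            # Normal case: the server answered with a success status.
            return client.status, client.read()
    except HTTPError as err:
        # HTTPError doubles as a response object, so the body of the
        # custom error page can be read from it directly.
        return err.code, err.read()
```

Calling fetch_html(URL) inside the existing loop would then give both the status code and the raw bytes of the page, so the program can decide whether to stop on a 404 while still keeping the HTML. Errors that carry no HTTP response (for example a DNS failure, raised as URLError) are deliberately not caught here.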