Question

I have the following code:

from urllib.request import urlopen
from urllib.error import HTTPError, URLError
from bs4 import BeautifulSoup

# target = "https://www.rolcruise.co.uk/cruise-detail/1158731-hawaii-round-trip-honolulu-2020-05-23"
target = "https://www.rolcruise.co.uk"

try:
    html = urlopen(target)
except HTTPError as e:
    print("You got a HTTP Error. Something wrong with the path.")
    print("Here is the error code: " + str(e.code))
    print("Here is the error reason: " + e.reason)
    print("Happy for the program to end here")
except URLError as e:
    print("You got a URL Error. Something wrong with the URL.")
    print("Here is the error reason: " + str(e.reason))
    print("Happy for the program to end here")
else:
    bs_obj = BeautifulSoup(html, features="lxml")
    print(bs_obj)

If I deliberately make a mistake in certain parts of the URL, the URLError handling works fine, e.g. if I type "htps" instead of "https", "ww" instead of "www", or "u" instead of "uk". For example:

target = "https://www.rolcruise.co.u"

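However, if I make a mistake in the host name ("rolcruise") or in the "co" part of the URL, the URLError handler does not fire, and I get an error message reporting ssl.CertificateError instead. For example: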

target = "https://www.rolcruise.c.uk"

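I don't understand why URLError doesn't cover every case where there is a typo in the URL.
Given that this happens, what is the best way to handle ssl.CertificateError?

Thanks for your help!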

Answer 1

Start by importing ssl into your namespace:

import ssl

Then you can catch this kind of exception:

try:
    html = urlopen(target)
except HTTPError as e:
    print("You got a HTTP Error. Something wrong with the path.")
    print("Here is the error code: " + str(e.code))
    print("Here is the error reason: " + e.reason)
    print("Happy for the program to end here")
except URLError as e:
    print("You got a URL Error. Something wrong with the URL.")
    print("Here is the error reason: " + str(e.reason))
    print("Happy for the program to end here")
except ssl.CertificateError as e:
    # Handle the certificate failure here, e.g. report it and stop.
    print("You got a certificate error: " + str(e))
else:
    bs_obj = BeautifulSoup(html, features="lxml")
    print(bs_obj)
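
As for why URLError does not catch it: as far as I know, before Python 3.7 ssl.CertificateError derived from ValueError, so urllib did not wrap it in a URLError and it propagated as-is. From 3.7 on it is a subclass of ssl.SSLError (and therefore OSError), and it usually arrives wrapped as the reason of a URLError. My guess (an assumption, I have not checked the DNS records) is that the mistyped host still resolves to some server whose certificate does not match the name, which is why you see a certificate error rather than a lookup failure. Here is a minimal sketch that handles both cases. The function name safe_open and its opener parameter are made up for illustration; opener only exists so the function can be exercised without a network connection.

```python
import ssl
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def safe_open(target, opener=urlopen):
    """Return (response, None) on success or (None, message) on failure.

    `opener` is a hypothetical hook, defaulting to urlopen, so the
    error handling can be tried out with a stand-in function.
    """
    try:
        return opener(target), None
    except HTTPError as e:
        # HTTPError is a subclass of URLError, so it must be caught first.
        return None, "HTTP error {}: {}".format(e.code, e.reason)
    except URLError as e:
        # Python 3.7+: a certificate failure is typically wrapped here.
        if isinstance(e.reason, ssl.CertificateError):
            return None, "Certificate error: {}".format(e.reason)
        return None, "URL error: {}".format(e.reason)
    except ssl.CertificateError as e:
        # Older Pythons: the certificate failure propagates unwrapped.
        return None, "Certificate error: {}".format(e)
```

You would call it as `html, error = safe_open(target)` and only pass html to BeautifulSoup when error is None.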
