Question

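I learned how web scraping works a few days ago, and today I have been experimenting with it. I would like to know how to test whether a page exists or not. I looked it up and found "Python check if website exists". I am using the requests module, and I got this code from the answers: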

import requests
request = requests.get('http://www.example.com')
if request.status_code == 200:
    print('Web site exists')
else:
    print('Web site does not exist')

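I tried it, and since example.com exists, it printed "Web site exists". However, when I tried something I was sure would not exist, such as examplewwwwwww.com, it gave me this error. Why does it do that? How can I stop it from raising an error (and have it say the website does not exist instead)?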

Answer 1

You can use try/except like this:

import requests
from requests.exceptions import ConnectionError

try:
    request = requests.get('http://www.example.com')
except ConnectionError:
    print('Web site does not exist')
else:
    print('Web site exists')

Answer 2

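You have to wrap the requests.get call in try/except and handle the various exceptions that can occur, one of which is ConnectionError.

You get this error because a response whose status_code is not 200 and a failure to connect to the requested HTTP address are two different things. In the first case the server answered; in the second case no response came back at all, so requests raises an exception instead of returning a response object.

Here are the exceptions you may encounter when making a request with the requests library.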
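
To make the difference concrete, here is a minimal sketch that reports the two cases separately. The function name check_site, the three return strings, and the 5-second timeout are assumptions chosen for illustration; they are not part of requests or of the original question.

```python
import requests


def check_site(url, timeout=5):
    """Return 'exists', 'http_error', or 'unreachable' for the given URL."""
    try:
        response = requests.get(url, timeout=timeout)
    except requests.exceptions.RequestException:
        # No HTTP response at all: DNS failure, refused connection,
        # timeout, malformed URL, and so on.
        return 'unreachable'
    if response.status_code == 200:
        return 'exists'
    # The server answered, but with a non-200 status (for example 404 or 500).
    return 'http_error'


if __name__ == '__main__':
    print(check_site('http://www.example.com'))
```

Catching requests.exceptions.RequestException covers ConnectionError as well as the other exceptions requests raises, since it is their common base class.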

Answer 3

You are getting the error because the URL you are trying to fetch is invalid, but you can easily check for this with a try/except block:

import requests
from requests.exceptions import MissingSchema

try:
    request = requests.get('examplewwwwwww.com')
except MissingSchema:
    print('The provided URL is invalid.')
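
Note that 'examplewwwwwww.com' has no http:// or https:// prefix, which is what triggers MissingSchema; with a prefix, a nonexistent host would typically surface as ConnectionError instead. A rough sketch that handles both cases follows. The helper name describe_url, its return strings, and the timeout value are hypothetical and only serve the illustration.

```python
import requests


def describe_url(url, timeout=5):
    """Return a short description of what happened when fetching url."""
    try:
        requests.get(url, timeout=timeout)
    except requests.exceptions.MissingSchema:
        # Raised while the request is being prepared, before any network access.
        return 'invalid URL (no scheme)'
    except requests.exceptions.ConnectionError:
        # The URL was well formed, but no connection could be made.
        return 'could not connect'
    except requests.exceptions.RequestException:
        # Any other failure reported by requests (for example a read timeout).
        return 'request failed'
    return 'got a response'


if __name__ == '__main__':
    print(describe_url('examplewwwwwww.com'))
    print(describe_url('http://examplewwwwwww.com'))
```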

Answer 4

Just listing how I do it; maybe it is useful to someone:

import requests

ready = 0
try:
    response = requests.get('https://github.com')
    # response.ok is True for any status code below 400
    if response.ok:
        ready = 1
except requests.exceptions.RequestException:
    print("Website not available...")
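
If the check has to be repeated until the site comes up, one possible shape is a bounded retry loop. This is only a sketch: the function name wait_until_available and the attempts, delay, and timeout parameters are assumptions introduced here, not something the answer specifies.

```python
import time

import requests


def wait_until_available(url, attempts=3, delay=1.0, timeout=5):
    """Return True as soon as url answers with a non-error status, else False."""
    for attempt in range(attempts):
        try:
            if requests.get(url, timeout=timeout).ok:
                return True
        except requests.exceptions.RequestException:
            print('Website not available...')
        # Pause between attempts, but not after the last one.
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


if __name__ == '__main__':
    print(wait_until_available('https://github.com'))
```

Bounding the number of attempts keeps the loop from running forever when the site never becomes reachable.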

Check whether a website exists and whether the request works correctly
