Saving a local requests.get() response for Beautiful Soup

Date: 2018-06-26 10:47:08

Tags: python web-scraping beautifulsoup python-requests

So I'm building a Python script to scrape some data (World Cup scores) from a URL using Requests and BeautifulSoup4, and while testing my code I made more requests than the website expected, so I periodically got this error:

 requests.exceptions.ConnectionError: Max retries exceeded with url

I don't actually need to keep calling the page; surely I only need to call it once, save the returned data locally, and feed that into Beautiful Soup. Surely I'm not the first person to do this, so is there a better way? This is probably trivial, but I'm pretty new to this - thanks.

Here's what I'm using:

import requests
from bs4 import BeautifulSoup

url = "https://www.telegraph.co.uk/world-cup/2018/06/26/world-cup-2018-fixtures-complete-schedule-match-results-far/"
response = requests.get(url)
html = response.content
soup = BeautifulSoup(html, "html.parser")

2 answers:

Answer 0 (score: 1)

Store the HTML to a file once:

response = requests.get(url)
with open('cache.html', 'wb') as f:
    f.write(response.content)

Then, the next time, just load it from the file:

with open('cache.html', 'rb') as f:
    soup = BeautifulSoup(f.read(), 'html.parser')
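
Putting the two halves together: a minimal load-or-fetch sketch of the same idea (the fetch_cached helper name and cache.html path are illustrative, not from the answer):

import os

import requests
from bs4 import BeautifulSoup

def fetch_cached(url, cache_path='cache.html'):
    # Download the page only if no cached copy exists yet.
    if not os.path.exists(cache_path):
        response = requests.get(url)
        with open(cache_path, 'wb') as f:
            f.write(response.content)
    # Every later run reads from disk instead of hitting the site again.
    with open(cache_path, 'rb') as f:
        return f.read()

url = "https://www.telegraph.co.uk/world-cup/2018/06/26/world-cup-2018-fixtures-complete-schedule-match-results-far/"
soup = BeautifulSoup(fetch_cached(url), 'html.parser')

Delete cache.html whenever you want a fresh copy of the page.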

Answer 1 (score: 1)

If you get an error, you can try waiting 1 or 2 seconds before retrying:

import time

import requests
from bs4 import BeautifulSoup

url = "https://www.telegraph.co.uk/world-cup/2018/06/26/world-cup-2018-fixtures-complete-schedule-match-results-far/"
while True:
    try:
        response = requests.get(url)
        soup = BeautifulSoup(response.content, "html.parser")
        break
    except requests.exceptions.ConnectionError:
        print("Connection refused by the server..")
        print("Let me sleep for 2 seconds")
        time.sleep(2)
        print("Continue...")

I can't test it, so it may not work exactly like this.
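
For what it's worth, requests can also retry failed connections automatically through urllib3's Retry class, so no hand-written sleep loop is needed; a minimal sketch, assuming the retry count and back-off factor below are tuned to taste:

import requests
from bs4 import BeautifulSoup
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

url = "https://www.telegraph.co.uk/world-cup/2018/06/26/world-cup-2018-fixtures-complete-schedule-match-results-far/"
session = requests.Session()
# Retry up to 3 times, with exponential back-off between attempts.
session.mount('https://', HTTPAdapter(max_retries=Retry(total=3, backoff_factor=2)))
response = session.get(url)
soup = BeautifulSoup(response.content, 'html.parser')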