So I'm building a Python script to scrape some data (World Cup scores) from a URL using Requests and BeautifulSoup4, and while testing my code I made more requests than the site expected, so I periodically got this error:
requests.exceptions.ConnectionError: Max retries exceeded with url
I don't actually need to keep calling the page; I only need to call it once, save the returned data locally, and then feed that into Beautiful Soup. Surely I'm not the first person to want this, so is there another way to do it? It's probably trivial, but I'm new to this. Thanks.
Here's what I'm using:
import requests
from bs4 import BeautifulSoup

url = "https://www.telegraph.co.uk/world-cup/2018/06/26/world-cup-2018-fixtures-complete-schedule-match-results-far/"
response = requests.get(url)  # fires a fresh request on every run
html = response.content
soup = BeautifulSoup(html, "html.parser")
Answer 0 (score: 1)
Store the HTML in a file once:
response = requests.get(url)
with open('cache.html', 'wb') as f:
    f.write(response.content)
Then, on subsequent runs, just load it from the file:
with open('cache.html', 'rb') as f:
    soup = BeautifulSoup(f.read(), 'html.parser')
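Putting the two steps together, here is a minimal fetch-or-load sketch; the os.path.exists check and the cache.html filename are my own additions, not part of the answer:

import os

import requests
from bs4 import BeautifulSoup

url = "https://www.telegraph.co.uk/world-cup/2018/06/26/world-cup-2018-fixtures-complete-schedule-match-results-far/"

# Hit the network only if there is no cached copy yet.
if not os.path.exists('cache.html'):
    response = requests.get(url)
    with open('cache.html', 'wb') as f:
        f.write(response.content)

# From here on, parse from the local file instead of re-requesting the page.
with open('cache.html', 'rb') as f:
    soup = BeautifulSoup(f.read(), 'html.parser')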
Answer 1 (score: 1)
If you get an error, you could try waiting a second or two before retrying:
import time

import requests
from bs4 import BeautifulSoup

url = "https://www.telegraph.co.uk/world-cup/2018/06/26/world-cup-2018-fixtures-complete-schedule-match-results-far/"

while True:
    try:
        response = requests.get(url)
        html = response.content
        soup = BeautifulSoup(html, "html.parser")
        break  # success, stop retrying
    except requests.exceptions.ConnectionError:
        print("Connection refused by the server..")
        print("Let me sleep for 2 seconds")
        time.sleep(2)
        print("Continue...")
I couldn't test this, so it may not work exactly as written.
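As an alternative to a hand-rolled retry loop, requests can retry failed connections automatically via urllib3's Retry class mounted on a Session. A sketch; the retry count, backoff factor, and status codes below are arbitrary choices of mine, not from the answer:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

url = "https://www.telegraph.co.uk/world-cup/2018/06/26/world-cup-2018-fixtures-complete-schedule-match-results-far/"

# Retry up to 5 times, backing off exponentially between attempts,
# and also retry on common transient HTTP status codes.
retries = Retry(total=5, backoff_factor=0.5,
                status_forcelist=[429, 500, 502, 503, 504])
session = requests.Session()
session.mount('https://', HTTPAdapter(max_retries=retries))

response = session.get(url)

This keeps the retry logic out of your own code, and the backoff spacing makes it less likely you hit the "Max retries exceeded" error from hammering the server.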