Saving a local requests.get() response for Beautiful Soup

Date: 2018-06-26 10:47:08

Tags: python web-scraping beautifulsoup python-requests

So I'm building a Python script to scrape some data (World Cup scores) from a URL using Requests and BeautifulSoup4, and while testing my code I made more requests than the website expected, so I periodically got this error:

 requests.exceptions.ConnectionError: Max retries exceeded with url

I don't actually need to keep calling the page; surely I only need to call it once, save the returned data locally, and feed that into Beautiful Soup. Surely I'm not the first person to do this, so is there a better way? This is probably trivial, but I'm pretty new to this - thanks.

Here's what I'm using:

import requests
from bs4 import BeautifulSoup

url = "https://www.telegraph.co.uk/world-cup/2018/06/26/world-cup-2018-fixtures-complete-schedule-match-results-far/"
response = requests.get(url)
html = response.content
soup = BeautifulSoup(html, "html.parser")

2 answers:

Answer 0 (score: 1)

Store the HTML to a file once:

response = requests.get(url)
with open('cache.html', 'wb') as f:
    f.write(response.content)

Then, the next time, just load it from the file:

with open('cache.html', 'rb') as f:
    soup = BeautifulSoup(f.read(), 'html.parser')
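
Putting the two halves together: a minimal load-or-fetch sketch of the same idea (the fetch_cached helper name and cache.html path are illustrative, not from the answer):

import os

import requests
from bs4 import BeautifulSoup

def fetch_cached(url, cache_path='cache.html'):
    # Download the page only if no cached copy exists yet.
    if not os.path.exists(cache_path):
        response = requests.get(url)
        with open(cache_path, 'wb') as f:
            f.write(response.content)
    # Every later run reads from disk instead of hitting the site again.
    with open(cache_path, 'rb') as f:
        return f.read()

url = "https://www.telegraph.co.uk/world-cup/2018/06/26/world-cup-2018-fixtures-complete-schedule-match-results-far/"
soup = BeautifulSoup(fetch_cached(url), 'html.parser')

Delete cache.html whenever you want a fresh copy of the page.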

Answer 1 (score: 1)

If you get an error, you can try waiting 1 or 2 seconds before retrying:

import time

import requests
from bs4 import BeautifulSoup

url = "https://www.telegraph.co.uk/world-cup/2018/06/26/world-cup-2018-fixtures-complete-schedule-match-results-far/"
while True:
    try:
        response = requests.get(url)
        soup = BeautifulSoup(response.content, "html.parser")
        break
    except requests.exceptions.ConnectionError:
        print("Connection refused by the server..")
        print("Let me sleep for 2 seconds")
        time.sleep(2)
        print("Continue...")

I can't test it, so it may not work exactly like this.
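
For what it's worth, requests can also retry failed connections automatically through urllib3's Retry class, so no hand-written sleep loop is needed; a minimal sketch, assuming the retry count and back-off factor below are tuned to taste:

import requests
from bs4 import BeautifulSoup
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

url = "https://www.telegraph.co.uk/world-cup/2018/06/26/world-cup-2018-fixtures-complete-schedule-match-results-far/"
session = requests.Session()
# Retry up to 3 times, with exponential back-off between attempts.
session.mount('https://', HTTPAdapter(max_retries=Retry(total=3, backoff_factor=2)))
response = session.get(url)
soup = BeautifulSoup(response.content, 'html.parser')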