Question

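I'm trying to open a page/link and capture its content. Sometimes it gives me the content I need, and sometimes it throws an error. I've noticed that if I refresh the page a few times, I get the content.

So I want to reload the page and grab it.

Here is my pseudocode: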

attempts = 0
while attempts:
    try:
        open_page = urllib2.Request(www.xyz.com)
        # Or I think we can also do urllib2.urlopen(www.xyz.com)
        break
    except: 
        # here I want to refresh/reload the page
        attempts += 1

My questions are:
1. How can I reload a page or request using urllib, urllib2, requests, or mechanize?
2. Can we loop over try/except this way?

Thanks!

Answer 1

If you run while attempts when attempts equals 0, the loop never starts. I'd do it the other way around and initialize attempts to the number of reloads you want:

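import urllib2

attempts = 10
while attempts:
    try:
        # urlopen actually sends the request; Request() alone only builds it
        open_page = urllib2.urlopen('http://www.xyz.com')
    except urllib2.URLError:
        # the request failed: use up one attempt and try again
        attempts -= 1
    else:
        # success: stop looping
        attempts = False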
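
For readers on Python 3, where urllib2 was folded into urllib.request, the same countdown idea might look like the sketch below. The function name fetch_with_retries and its opener parameter are made up for illustration (the opener is only there so the network call can be swapped out); this is one possible shape, not the only way to do it.

```python
import urllib.error
import urllib.request


def fetch_with_retries(url, attempts=10, opener=urllib.request.urlopen):
    """Return the page body, retrying up to `attempts` times on URLError."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            # Calling the opener again is the "reload": it sends a new request.
            with opener(url) as response:
                return response.read()
        except urllib.error.URLError as exc:
            # Remember the failure and go around the loop again.
            last_error = exc
    # Every attempt failed: surface the last error instead of returning None.
    raise last_error
```

Called as fetch_with_retries('http://www.xyz.com', attempts=5), it either returns the body as bytes or re-raises the last URLError once the attempts run out.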

Answer 2

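import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

attempts = 10

# Retry up to `attempts` times, backing off between tries,
# whenever the server answers with one of these 5xx status codes.
retries = Retry(total=attempts,
                backoff_factor=0.1,
                status_forcelist=[500, 502, 503, 504])

# Mount the retrying adapter for both schemes so that every request
# made through this session is retried automatically.
sess = requests.Session()
sess.mount('http://', HTTPAdapter(max_retries=retries))
sess.mount('https://', HTTPAdapter(max_retries=retries))
sess.get('http://www.google.co.nz/')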

Answer 3

The following function retries the request whenever an exception is raised or the HTTP response status is not a success code.

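import time
import traceback

import requests

def retrieve(url):
    while True:
        try:
            response = requests.get(url)
            if response.ok:
                return response
            # non-success status: report it, wait, then retry
            print(response.status_code)
            time.sleep(3)
        except requests.RequestException:
            # network-level failure: log the traceback, wait, then retry
            print(traceback.format_exc())
            time.sleep(3)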
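
One caveat with the function above is that it loops forever if the page never recovers. A capped variant could look like the sketch below. The name retrieve_with_limit and the fetch and sleep parameters are hypothetical, added here so the function is not tied to one HTTP library; in practice you would probably pass requests.get as fetch.

```python
import time


def retrieve_with_limit(url, fetch, max_attempts=5, delay=3, sleep=time.sleep):
    """Call fetch(url) until the response is ok, giving up after max_attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            response = fetch(url)
            if response.ok:
                return response
            print("attempt %d: status %s" % (attempt, response.status_code))
        except Exception as exc:
            # Broad on purpose in this sketch; narrow it to the library's
            # own exception type (e.g. requests.RequestException) in real code.
            print("attempt %d: %r" % (attempt, exc))
        if attempt < max_attempts:
            # Wait a little longer after each failed attempt.
            sleep(delay * attempt)
    # Every attempt failed; the caller decides what to do with None.
    return None
```

With requests installed, retrieve_with_limit('http://www.xyz.com', requests.get) would try at most five times and return None if none of the attempts succeeded.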

Can we reload a page/URL in Python using urllib, urllib2, requests, or mechanize?
