What is the right way to recover from requests.exceptions.ConnectionError?

Time: 2013-11-19 15:42:52

Tags: python web-scraping beautifulsoup python-requests

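I'm scraping a website, but sometimes my laptop loses its connection and I (obviously) get a requests.exceptions.ConnectionError. What is the right (or most elegant?) way to recover from this error? I mean: I don't want the program to stop, but to retry the connection, maybe after a few seconds? This is my code, but I have a feeling it isn't correct: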

from random import randint
from time import sleep

import requests
from bs4 import BeautifulSoup  # assuming BeautifulSoup 4

def make_soup(session, url):
    try:
        n = randint(1, MAX_NAPTIME)
        sleep(n)
        response = session.get(url)
    except requests.exceptions.ConnectionError as req_ce:
        error_msg = req_ce.args[0].reason.strerror
        print("Error: %s con la url %s" % (error_msg, url))
        # Log out, wait a while, log back in, then try the request once more
        session = logout(session)
        n = randint(MIN_SLEEPTIME, MAX_SLEEPTIME)
        sleep(n)
        session = login(session)
        response = session.get(url)
    soup = BeautifulSoup(response.text)
    return soup

Any ideas?

Note that I need a session to scrape these pages, so I think the login step (i.e. logging in to the site again after logging out) might cause trouble.

2 answers:

Answer 0: (score: 4)

So why not something like

import functools
import time

import requests

def retry(cooloff=5, exc_type=None):
    # By default, retry only on connection errors
    if not exc_type:
        exc_type = [requests.exceptions.ConnectionError]
    exc_type = tuple(exc_type)

    def real_decorator(function):
        @functools.wraps(function)
        def wrapper(*args, **kwargs):
            while True:
                try:
                    return function(*args, **kwargs)
                except exc_type:
                    # Subclasses of the listed exceptions are retried too;
                    # any other exception propagates unchanged
                    print("failed (?)")
                    time.sleep(cooloff)
        return wrapper
    return real_decorator

This decorator lets you call any function over and over until it succeeds. For example

@retry(exc_type=[ZeroDivisionError])
def test():
    return 1/0

print(test())

will just print "failed (?)" every 5 seconds until the end of time (or until the laws of math change).
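If retrying forever is not what you want, the same decorator idea can be capped. The following is only a sketch, not part of the answer above: the max_tries, backoff and sleep parameters are hypothetical names introduced here, and the built-in ConnectionError is used as a stand-in default so that the snippet runs without requests installed.

```python
import functools
import time


def retry(max_tries=5, cooloff=1.0, backoff=2.0,
          exc_type=(ConnectionError,), sleep=time.sleep):
    """Retry the wrapped function on exc_type, giving up after max_tries calls."""
    exc_type = tuple(exc_type)

    def real_decorator(function):
        @functools.wraps(function)
        def wrapper(*args, **kwargs):
            delay = cooloff
            for attempt in range(1, max_tries + 1):
                try:
                    return function(*args, **kwargs)
                except exc_type:
                    if attempt == max_tries:
                        raise  # out of attempts: let the caller see the error
                    sleep(delay)
                    delay *= backoff  # wait longer before each further attempt
        return wrapper
    return real_decorator
```

With requests you would pass exc_type=(requests.exceptions.ConnectionError,) explicitly, since that class does not derive from the built-in ConnectionError. The sleep argument exists only so the waiting can be swapped out, for example in tests.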

Answer 1: (score: 0)

Is it really necessary to log out and log back in to your session? I would just retry the connection the same way:

def make_soup(session, url):
    success = False
    response = None
    # range(1, MAXTRIES + 1) so that exactly MAXTRIES attempts are made
    for attempt in range(1, MAXTRIES + 1):
        try:
            response = session.get(url)
            # If session.get succeeded, we break out of the
            # for loop after setting a success flag
            success = True
            break
        except requests.exceptions.ConnectionError as req_ce:
            error_msg = req_ce.args[0].reason.strerror
            print("Error: %s con la url %s" % (error_msg, url))
            print(" Attempt %s of %s" % (attempt, MAXTRIES))
            sleep(randint(MIN_SLEEPTIME, MAX_SLEEPTIME))

    # Figure out if we were successful.
    # Note it may not be needed to have a flag, you can maybe just
    # check the value of response here.
    if not success:
        print("Couldn't get it after retrying many times")
        return None

    # Once we get here, we know we got a good response
    soup = BeautifulSoup(response.text)
    return soup
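As the comment in the code above hints, the success flag is probably not needed. Here is one possible flag-free variant, offered as a sketch only: get_with_retries and its parameters are hypothetical names, the fetch function is passed in as get, and the built-in ConnectionError is a stand-in default so the snippet runs without requests installed. It also prints the exception itself instead of digging into req_ce.args[0].reason.strerror, since that attribute chain may not exist for every connection failure.

```python
import time


def get_with_retries(get, url, max_tries=3, retry_on=(ConnectionError,),
                     wait=1.0, sleep=time.sleep):
    """Call get(url) up to max_tries times; return the response, or None if all tries fail."""
    for attempt in range(1, max_tries + 1):
        try:
            return get(url)  # success: return immediately, no flag needed
        except retry_on as err:
            print("Error: %s (url %s), attempt %d of %d"
                  % (err, url, attempt, max_tries))
            if attempt < max_tries:
                sleep(wait)  # no point in waiting after the last attempt
    return None  # reached only when every attempt raised
```

Inside make_soup this could be called as response = get_with_retries(session.get, url, retry_on=(requests.exceptions.ConnectionError,)). Check the result with "if response is None" rather than "if not response", because a requests Response object evaluates as false for 4xx/5xx status codes even though the request itself went through.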