Python: intermittent errors when downloading web pages with urllib2

Asked: 2013-12-18 14:08:08

Tags: python urllib2

I have a web scraping program that downloads pages several times an hour. In roughly one out of every 15 or 20 attempts, I get one of these errors:

[Errno 10054] An existing connection was forcibly closed by the remote host

[Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

Is there a better way to do this:

import time
import urllib2

def get_page(url):
    def get_page_once(url):
        try:
            page = opener.open(url).read()
        except Exception as e:
            print('Failed to download %s: %s' % (url, e))
            page = ''
        return page

    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0')]

    # Try once; on failure, wait briefly and retry a single time.
    page = get_page_once(url)
    if page == '':
        time.sleep(2)
        page = get_page_once(url)

    return page

I could add more retries, but I'm worried about spending too much time in this function.
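One common pattern for this is a bounded retry loop with exponential backoff and a per-attempt timeout, so the worst-case time spent in the function is known in advance. Below is a minimal sketch of that idea; the max_attempts, base_delay, and timeout values are illustrative assumptions, not anything from the original question:

import socket
import time
import urllib2

def get_page(url, max_attempts=3, base_delay=2, timeout=10):
    """Fetch a URL, retrying transient failures with exponential backoff.

    max_attempts, base_delay, and timeout are illustrative defaults.
    """
    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0')]

    for attempt in range(max_attempts):
        try:
            # The timeout bounds how long a single attempt can hang,
            # which addresses the Errno 10060 case directly.
            return opener.open(url, timeout=timeout).read()
        except (urllib2.URLError, socket.error) as e:
            print('Attempt %d failed for %s: %s' % (attempt + 1, url, e))
            if attempt + 1 < max_attempts:
                # Exponential backoff: sleep 2s, then 4s, ...
                time.sleep(base_delay * 2 ** attempt)
    return ''

With these values the function sleeps at most 2 + 4 = 6 seconds across all retries, so the total time spent here stays bounded, and the timeout keeps a single hung connection from blocking indefinitely.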

0 Answers:

There are no answers yet.