Python requests library fails after a while with "Bad file descriptor"

Time: 2018-09-19 01:00:31

Tags: python sockets python-requests

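I am using the requests library to scrape a website at regular intervals. I log in with Selenium to obtain the required cookies, then use requests to call the API directly. Everything works for a few hours (30-50 requests), and then I always get this exception: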

  File "attempt_enroll.py", line 104, in <module>
    resp = attemptEnroll(session)
  File "attempt_enroll.py", line 86, in attemptEnroll
    r = session.post(enroll_url, json=payload)
  File "/lib/python2.6/site-packages/requests/sessions.py", line 559, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/lib/python2.6/site-packages/requests/sessions.py", line 512, in request
    resp = self.send(prep, **send_kwargs)
  File "/lib/python2.6/site-packages/requests/sessions.py", line 662, in send
    r.content
  File "/lib/python2.6/site-packages/requests/models.py", line 827, in content
    self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
  File "/lib/python2.6/site-packages/requests/models.py", line 752, in generate
    raise ChunkedEncodingError(e)
ChunkedEncodingError: ("Connection broken: error(9, 'Bad file descriptor')", error(9, 'Bad file descriptor'))
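
For context, `error(9, 'Bad file descriptor')` is errno `EBADF`, which the operating system reports when a descriptor is used after it has been closed. The following is a minimal sketch (standard library only, Python 3 syntax, and not taken from the script in this question) that reproduces the same errno with a pipe. The function name `read_after_close` is made up for illustration. It only shows what the error means; it does not establish where the descriptor gets closed in the failing run.

```python
import errno
import os


def read_after_close():
    """Close a pipe's read end, then read from it; return the errno raised."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)
    try:
        os.read(read_fd, 1)  # the descriptor is no longer valid here
    except OSError as exc:
        return exc.errno
    finally:
        os.close(write_fd)
    return None


if __name__ == "__main__":
    code = read_after_close()
    print(code == errno.EBADF, os.strerror(code))
```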

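I suspected that I might be leaking sockets or file descriptors, but after running for an hour the process still has only 4 open fds. It is very hard to debug because it happens so intermittently. This is my first time using the requests library.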
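
One way to check the descriptor count from inside the process, rather than inspecting it externally, is sketched below. This is an assumption-laden helper, not part of the original script: `count_open_fds` and its `max_probe` parameter are hypothetical names, the `/proc/self/fd` path is Linux-specific, and the code is written for Python 3.

```python
import os


def count_open_fds(max_probe=4096):
    """Return the number of file descriptors currently open in this process."""
    proc_dir = "/proc/self/fd"
    if os.path.isdir(proc_dir):
        # Linux: one entry per open descriptor. Listing the directory
        # temporarily opens one extra descriptor, so subtract it.
        return len(os.listdir(proc_dir)) - 1
    # Fallback for other Unix-like systems: probe descriptor numbers
    # directly; fstat fails with EBADF for numbers that are not open.
    count = 0
    for fd in range(max_probe):
        try:
            os.fstat(fd)
        except OSError:
            continue
        count += 1
    return count


if __name__ == "__main__":
    print("open fds:", count_open_fds())
```

Logging this value just before and after each `session.post` call might show whether the count changes around the time the exception appears, although a stable count would not by itself rule out a descriptor being closed and reused.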

Here is a stripped-down version of the code, with all the requests calls consolidated in one place:

import traceback
from random import randint
from time import sleep

import requests

payload = {
    'some_stuff': True,
}
enroll_url = 'https://foo.ca'
expected = '''some string'''

#use selenium to login and (critically) run js on homepage to generate cookies
#then quit selenium and use the cookies to setup a requests session
def login(username, password):
    <selenium code to login snipped>
    #retrieve all the cookies and kill webdriver since we don't need it anymore
    cookies = driver.get_cookies()
    driver.quit()

    s = requests.Session()

    for cookie in cookies:
        s.cookies.set(cookie['name'], cookie['value'])
        if cookie['name'] == 'XSRF-TOKEN':
            s.headers.update({
                'X-XSRF-TOKEN': cookie['value'],
                'Connection':'close',
            })
    return s

def attemptEnroll(session):
    if session is None:
        return ""
    r = session.post(enroll_url, json=payload)
    return r.text

#number of failed attempts in a row
failed_count = 0
session = None
while True:
    worked = False
    errorMsg = "Unknown error"
    try:
        resp = attemptEnroll(session)
        worked = (resp == expected)
        errorMsg = resp
    except Exception as e:
        errorMsg = str(e) + traceback.format_exc()
    if worked:
        failed_count = 0
        #wait 2-7 minutes between requests
        wait = randint(2*60, 7*60)
        sleep(wait)
    else:
        sleep(failed_count*60)
        failed_count += 1
        #stop after 3 failures in a row
        if failed_count >= 3:
            break
        #otherwise create a new login and try again
        session = login("<snip>", "<snip>!")

0 answers:

No answers yet