Using grequests

Time: 2015-07-22 21:18:08

Tags: python redhat grequests

When I call the following function to process a long list of URLs (all on the same site, e.g. http://foo.bar.com/url1, http://foo.bar.com/url2, etc.):

import time
import grequests

def processUrls(block=2500, write=100000, timeout=0.5):
    urls = ...  ## generate long array of URLs
    chunks = [urls[i:i+block] for i in xrange(0, len(urls), block)] ## chunk 'em

    def callback(response, *args, **kwargs):
        txt = response.text
        ## do something with txt
        response.close()

    for i, chunk in enumerate(chunks):
        rs = [grequests.get(url, callback=callback) for url in chunk]
        grequests.map(rs, stream=False, size=block / 10)
        time.sleep(timeout)
        ## do stuff

I get a bunch of errors like this:

File "/.../python2.7/site-packages/gevent/greenlet.py", line 327, in run
result = self._run(*self.args, **self.kwargs)
File "/.../python2.7/site-packages/grequests.py", line 71, in send
self.url, **merged_kwargs)
File "/.../python2.7/site-packages/requests/sessions.py", line 465, in request
resp = self.send(prep, **send_kwargs)
File "/.../python2.7/site-packages/requests/sessions.py", line 573, in send
r = adapter.send(request, **kwargs)
File "/.../python2.7/site-packages/requests/adapters.py", line 415, in send
raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', error(97, 'Address family not supported by protocol'))
<Greenlet at 0x7f8ce2c0ec30: <bound method AsyncRequest.send of <grequests.AsyncRequest object at 0x7f8ce31e2890>>(stream=False)> failed with ConnectionError

The number of these error messages is far smaller than the number of URLs.

What could be causing these errors? I am running this on RedHat 6.6.
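The `error(97, ...)` in the traceback is an operating system errno, not an HTTP status. A minimal sketch, assuming a Linux host such as the RedHat machine above, of decoding it with the standard library (the helper name `describe_errno` is made up for illustration):

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, OS message) for a numeric errno value."""
    # errno.errorcode maps numbers to names; unknown codes fall back to "UNKNOWN".
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# On Linux, 97 is EAFNOSUPPORT; the number differs on other platforms,
# so look the constant up by name instead of hard-coding it.
name, message = describe_errno(errno.EAFNOSUPPORT)
print("%s: %s" % (name, message))
```

This only names the error; it does not by itself explain why the socket layer raised it for some requests and not others.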

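Update: I collected all the URLs that gave me errors from the full data set I had been using. They all look fine (well-formed, etc.), and when I paste one of them into a browser I get a meaningful result with no error message. I then reran the test with only a subset of the data. Again I got some errors and collected the list of bad URLs for the subset. It turns out that none of the bad URLs from the subset appear in the bad-URL list from the full set. This suggests the errors are not specific to particular URLs but are some kind of hiccup, either on my side or on the other end. Does this ring a bell?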
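The comparison described in the update comes down to a set intersection. A minimal sketch, with made-up URL lists standing in for the real failure logs (the names `shared_failures`, `full_run_failures`, and `subset_run_failures` are illustrative and not from the original code):

```python
def shared_failures(first_run, second_run):
    """Return the URLs that failed in both runs, sorted for stable output."""
    return sorted(set(first_run) & set(second_run))


# Hypothetical failure lists; the real ones would come from the two test runs.
full_run_failures = ["http://foo.bar.com/url3", "http://foo.bar.com/url17"]
subset_run_failures = ["http://foo.bar.com/url5", "http://foo.bar.com/url8"]

overlap = shared_failures(full_run_failures, subset_run_failures)
if overlap:
    print("URLs failing in both runs: %s" % overlap)
else:
    print("No overlap: the failures look transient rather than tied to specific URLs")
```

An empty overlap is consistent with an intermittent problem, though it does not say whether the client or the server side is responsible.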

0 answers:

No answers yet