How to send batches of URLs to grequests?

Time: 2018-08-22 17:42:47

Tags: python-3.x asynchronous batch-processing api-design grequests

I have a list of ~300K API URLs that I want to call and fetch data from:

lst = ['url.com','url2.com']

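If I take a small slice of the list (for example 5 URLs) and pass it to grequests, the requests are processed perfectly. However, when I pass the full list of ~300K URLs, I get this error: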

Problem: url.Iam.passing.in: HTTPSConnectionPool(host='url', port=xxx): Max retries exceeded with url: url.Iam.passing.in (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x552b17550>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known',))
Traceback (most recent call last):

Here is my code so far for making the asynchronous calls:

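import grequests

class Test:
    def __init__(self):
        self.urls = lst

    def exception(self, request, exception):
        print("Problem: {}: {}".format(request.url, exception))

    # `async` is a reserved keyword since Python 3.7, so the method needs another name
    def fetch_all(self):
        # size=5 limits the gevent pool to 5 concurrent requests
        return grequests.map((grequests.get(u, stream=False) for u in self.urls), exception_handler=self.exception, size=5)

    def collate_responses(self, results):
        # failed requests come back as None (the exception handler returns None)
        return [x.text for x in results if x is not None]

test = Test()
# here we collect the results returned by fetch_all()
results = test.fetch_all()
response_text = test.collate_responses(results)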

I'm not sure what I'm doing wrong, since I'm already passing stream=False.

Is there any way I can pass the list in batches?
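One way to batch the list is to cut it lazily into fixed-size chunks and call `grequests.map` once per chunk. Below is a minimal sketch of the chunking step using only the standard library. The helper name `chunked` and the chunk size are illustrative assumptions, not part of grequests.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        # islice pulls the next `size` items without copying the whole input
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


urls = ["url.com", "url2.com", "url3.com"]
batches = list(chunked(urls, 2))
# batches == [["url.com", "url2.com"], ["url3.com"]]
```

Each batch could then be sent with `grequests.map((grequests.get(u, stream=False) for u in batch), exception_handler=..., size=5)`. This way only one batch of request objects is built and in flight at a time, instead of all ~300K at once.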

1 answer:

Answer 0 (score: 1)

Try the following approach:

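import time
import grequests

url_list = ['url.com', 'url2.com']  # the URL list from the question

def exception(request, exception):
    print("Problem: {}: {}".format(request.url, exception))

def fetch(url):  # `async` is a reserved keyword in Python 3.7+
    # map() expects an iterable of requests, so wrap the single request in a list
    return grequests.map([grequests.get(url, stream=False)], exception_handler=exception)

for url in url_list:
    result = fetch(url)
    time.sleep(5)   # This will add a 5 second delay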
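The delay in the answer can also be applied per batch rather than per URL, which divides the total sleep time by the batch size. The following is a minimal sketch under stated assumptions: `run_in_batches` and its `send_batch` callback are hypothetical names. `send_batch` stands in for whatever actually issues the requests, for example a function that wraps `grequests.map` over one batch. The default 5-second pause mirrors the answer.

```python
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_in_batches(
    urls: Sequence[T],
    send_batch: Callable[[Sequence[T]], List[R]],
    batch_size: int = 5,
    pause: float = 5.0,
) -> List[R]:
    """Call send_batch on consecutive slices of urls, sleeping between slices."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results: List[R] = []
    for start in range(0, len(urls), batch_size):
        if start > 0 and pause > 0:
            time.sleep(pause)  # wait before every batch except the first
        results.extend(send_batch(urls[start:start + batch_size]))
    return results


# Stand-in for a real sender (hypothetical): records the batch sizes it receives.
seen_sizes: List[int] = []


def fake_send(batch: Sequence[str]) -> List[str]:
    seen_sizes.append(len(batch))
    return [u.upper() for u in batch]


out = run_in_batches(["a.com", "b.com", "c.com"], fake_send, batch_size=2, pause=0)
# out == ["A.COM", "B.COM", "C.COM"], seen_sizes == [2, 1]
```

Taking the callback as a parameter keeps the batching and pacing logic independent of the HTTP client. That means the same loop can be tried with grequests or any other sender.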