I want to use Tornado to fetch a batch of URLs, so my code looks like this:
    from tornado.concurrent import Future
    from tornado.httpclient import AsyncHTTPClient
    from tornado.ioloop import IOLoop

    class BatchHttpClient(object):
        def __init__(self, urls, timeout=20):
            self.async_http_client = AsyncHTTPClient()
            self.urls = urls
            self.timeout = 20

        def __mid(self):
            results = []
            for url in self.urls:
                future = Future()
                def f_callback(f1):
                    future.set_result(f1.result())
                f = self.async_http_client.fetch(url)
                f.add_done_callback(f_callback)
                results.append(future)
            return results

        def get_batch(self):
            results = IOLoop.current().run_sync(self.__mid)
            return results

    urls = ["http://www.baidu.com?v={}".format(i) for i in range(10)]
    batch_http_client = BatchHttpClient(urls)
    print batch_http_client.get_batch()
When I run the code, the following error occurs:
    ERROR:tornado.application:Exception in callback <function f_callback at 0x7f35458cae60> for <tornado.concurrent.Future object at 0x7f35458c9650>
    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/tornado/concurrent.py", line 317, in _set_done
        cb(self)
      File "/home/q/www/base_data_manager/utils/async_util.py", line 21, in f_callback
        future.set_result(f1.result())
      File "/usr/local/lib/python2.7/dist-packages/tornado/concurrent.py", line 271, in set_result
        self._set_done()
      File "/usr/local/lib/python2.7/dist-packages/tornado/concurrent.py", line 315, in _set_done
        for cb in self._callbacks:
    TypeError: 'NoneType' object is not iterable
But if I change the code as follows:
    class BatchHttpClient(object):
        def __init__(self, urls, timeout=20):
            self.async_http_client = AsyncHTTPClient()
            self.urls = urls
            self.timeout = 20

        def _get_batch(self, url):
            future = Future()
            f = self.async_http_client.fetch(url)
            def callback(f1):
                print future
                print f1.result()
                future.set_result(f1.result())
                print '---------'
            f.add_done_callback(callback)
            return future

        def __mid(self):
            results = []
            for url in self.urls:
                results.append(self._get_batch(url))
            return results

        def get_batch(self):
            results = IOLoop.current().run_sync(self.__mid)
            return results

    urls = ["http://www.baidu.com?v={}".format(i) for i in range(10)]
    batch_http_client = BatchHttpClient(urls)
    for result in batch_http_client.get_batch():
        print result.body
Then it works. All I did was add an intermediate function, so why is the result different?
Answer 0 (score: 1)
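In your first snippet, the problem is that by the time any callback runs, `future` holds the last value the loop assigned to it. A Python closure looks up the name when the function is called, not when it is defined, and the callbacks only run after the loop has finished. In other words, whenever this executes:

    def f_callback(f1):
        future.set_result(f1.result())

the value of `future` is always the same. You can see this by adding `print future` to the callback: the address of the object is identical every time. Every response therefore tries to resolve the same `Future`, which appears to be what triggers the `TypeError` in your traceback once `set_result` is called on it a second time, while the other futures in `results` are never resolved.

In your second snippet, each future and each callback is created inside a function that the loop calls. Each callback therefore picks up `future` from a fresh scope, which solves the problem.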
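The late-binding behaviour described above can be reproduced without Tornado. The following is a minimal sketch using only plain Python; `build_callbacks` and the sample names are made up for illustration and are not part of the original code.

```python
def build_callbacks(names):
    """Create one callback per name, the same way the first snippet does."""
    callbacks = []
    for name in names:
        def callback():
            # `name` is looked up when callback() runs, not when it is
            # defined, so every callback sees the loop's final value.
            return name
        callbacks.append(callback)
    return callbacks


# The callbacks are only called after the loop has finished.
late = [cb() for cb in build_callbacks(["a", "b", "c"])]
print(late)  # ['c', 'c', 'c']
```

All three callbacks return the last name because they share one variable, just as all the `f_callback` functions share one `future`.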
Another way to fix this is to modify `__mid` as follows:
    def __mid(self):
        results = []
        for url in self.urls:
            future = Future()
            def make_callback(future):
                def f_callback(f1):
                    future.set_result(f1.result())
                return f_callback
            f = self.async_http_client.fetch(url)
            f.add_done_callback(make_callback(future))
            results.append(future)
        return results
Because the callback is created inside `make_callback(future)`, the `future` that each callback sees comes from a separate scope, one per callback.
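To check the factory approach without making network requests, here is a small sketch that swaps in the standard library's `concurrent.futures.Future` for Tornado's `Future`. That substitution is an assumption made only so the example runs anywhere; `chain_all`, `sources` and `targets` are illustrative names.

```python
from concurrent.futures import Future


def chain_all(sources):
    """Return one new Future per source, resolved with that source's result."""
    targets = []
    for source in sources:
        target = Future()

        def make_callback(target):
            # `target` is now a parameter, so each callback gets its own binding.
            def on_done(done):
                target.set_result(done.result())
            return on_done

        source.add_done_callback(make_callback(target))
        targets.append(target)
    return targets


sources = [Future() for _ in range(3)]
targets = chain_all(sources)

# Resolve the sources only after the loop has finished, as an event loop would.
for index, source in enumerate(sources):
    source.set_result(index * 10)

results = [target.result() for target in targets]
print(results)  # [0, 10, 20]
```

Each target receives the result of its own source, which is the behaviour the first snippet was aiming for.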
Answer 1 (score: 0)
Louis's answer is correct, but I'd like to suggest some simpler alternatives.
First, you can use `functools.partial` instead of the `make_callback` wrapper function:
    import functools

    def __mid(self):
        results = []
        for url in self.urls:
            future = Future()
            def f_callback(output, f1):
                # f1 is the finished fetch Future passed in by add_done_callback.
                output.set_result(f1.result())
            f = self.async_http_client.fetch(url)
            # partial() binds the current value of future to
            # the output argument.
            f.add_done_callback(functools.partial(f_callback, future))
            results.append(future)
        return results
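Since `partial()` carries the target along as an argument, the callback no longer needs to be a closure and can live outside the loop. A minimal sketch of that variation follows, again using the standard library's `concurrent.futures.Future` as a stand-in for Tornado's `Future` (an assumption made so it runs without Tornado); `forward` and `chain_all` are hypothetical names.

```python
import functools
from concurrent.futures import Future


def forward(output, finished):
    """Copy the result of `finished` into `output`."""
    output.set_result(finished.result())


def chain_all(sources):
    targets = []
    for source in sources:
        target = Future()
        # partial() captures the current `target` object immediately,
        # so later iterations cannot change what this callback resolves.
        source.add_done_callback(functools.partial(forward, target))
        targets.append(target)
    return targets


sources = [Future() for _ in range(3)]
targets = chain_all(sources)
for label, source in zip(["first", "second", "third"], sources):
    source.set_result(label)

forwarded = [target.result() for target in targets]
print(forwarded)  # ['first', 'second', 'third']
```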
But the intermediate `Future` looks completely unnecessary. This is equivalent to:
    def __mid(self):
        return [self.async_http_client.fetch(url) for url in self.urls]
Personally, I would make `__mid` a coroutine:
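    from tornado import gen

    @gen.coroutine
    def __mid(self):
        # Yielding a list of futures waits for all of them in parallel.
        responses = yield [self.async_http_client.fetch(url) for url in self.urls]
        # Python 2 generators cannot `return` a value, so raise gen.Return.
        raise gen.Return(responses)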
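For comparison, the same "start everything, then wait for all results" shape can be sketched with the standard library's `asyncio` on Python 3. This is a different API from the `gen.coroutine` style above, not a drop-in replacement, and `fake_fetch` is a hypothetical stand-in for `AsyncHTTPClient.fetch` so that the example needs no network access.

```python
import asyncio


async def fake_fetch(url):
    """Pretend to fetch `url`; a real client would perform I/O here."""
    await asyncio.sleep(0)  # yield control to the event loop once
    return "body of " + url


async def fetch_all(urls):
    # gather() runs the coroutines concurrently and returns their
    # results in the same order as the inputs.
    return await asyncio.gather(*(fake_fetch(url) for url in urls))


urls = ["http://example.com/?v={}".format(i) for i in range(3)]
bodies = asyncio.run(fetch_all(urls))
for body in bodies:
    print(body)
```

No callbacks or intermediate futures are involved, so there is no loop variable for a closure to capture.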
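If you don't want to use coroutines, you may prefer to pass a callback to `AsyncHTTPClient.fetch` rather than calling `Future.add_done_callback` on its result. Note that this only applies to Tornado versions before 6.0, where the `callback` argument was removed.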