In Tornado we can use the coroutine decorator to write an asynchronous function neatly as a Python generator, where each yield statement hands a Future back to the scheduler and the final raise/return hands a single value back to the caller. But is there a way to return a series of values to the caller, interspersed with asynchronous calls?
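For reference, the single-value pattern described above looks roughly like this (a sketch reusing the same hypothetical fetch_page_async and extract_links as in the examples below):

from tornado import gen

@gen.coroutine
def get_links_async(uri):
    # The yield hands the Future to the scheduler; we resume when it resolves.
    page = yield fetch_page_async(uri)
    # raise gen.Return(...) hands exactly one final value back to the caller.
    raise gen.Return(extract_links(page))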
E.g., how could I turn this synchronous function:
def crawl_site_sync(rooturi):
    rootpage = fetch_page_sync(rooturi)
    links = extract_links(rootpage)
    for link in links:
        yield fetch_page_sync(link.uri)
...which I can call like this:
for page in crawl_site_sync("http://example.com/page.html"):
    show_summary(page)
...into a similar-looking asynchronous function in Tornado? E.g.:
@tornado.gen.coroutine
def crawl_site_async(rooturi):
    # Yield a future to the scheduler:
    rootpage = yield fetch_page_async(rooturi)
    links = extract_links(rootpage)
    for link in links:
        # Yield a future to the scheduler:
        sub_page = yield fetch_page_async(link.uri)
        # Yield a value to the caller:
        really_really_yield sub_page  # ???
And how would I call it?
for page in yield crawl_site_async("http://example.com/page.html"):
    # This won't work: the yield won't return until the entire
    # coroutine has finished, and it won't give us an iterable.
    show_summary(page)
I can think of ways to get it done, but they all involve changing the call site and the function to such a degree that it loses the whole benefit of the asynchronous version closely resembling the synchronous one, and it is no longer clean and neat. I feel like I must be missing a trick here. Is there a way to use a Python generator both as a series of lazily computed values and as a Tornado coroutine at the same time?
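For illustration, the most obvious workaround (sketched here with the same hypothetical functions) is to push each page into a callback instead of yielding it, but that changes both the function and every call site:

@tornado.gen.coroutine
def crawl_site_async(rooturi, on_page):
    rootpage = yield fetch_page_async(rooturi)
    for link in extract_links(rootpage):
        sub_page = yield fetch_page_async(link.uri)
        # The caller no longer gets a plain for-loop over pages:
        on_page(sub_page)

# The call site then becomes (inside another coroutine):
yield crawl_site_async("http://example.com/page.html", show_summary)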
Answer 0 (score: 2)
I use a Queue from Toro, which is designed for coroutines to cooperate in just this way. Here is a simple example:
from tornado.ioloop import IOLoop
from tornado import gen
from tornado.httpclient import AsyncHTTPClient
from toro import Queue

q = Queue(maxsize=1)

@gen.coroutine
def consumer():
    # Pull items off the queue until the None sentinel arrives.
    item = yield q.get()
    while item:
        print item
        item = yield q.get()

@gen.coroutine
def producer():
    try:
        client = AsyncHTTPClient()
        for url in [
                'http://tornadoweb.org',
                'http://python.org',
                'http://readthedocs.org']:
            response = yield client.fetch(url)
            item = (url, len(response.body))
            yield q.put(item)

        # Done: signal the consumer; no need to wait for this put.
        q.put(None)
    except Exception:
        IOLoop.current().stop()
        raise

future = producer()
IOLoop.current().run_sync(consumer, timeout=20)
A more detailed web-crawler example is in Toro's documentation, here:
https://toro.readthedocs.org/en/stable/examples/web_spider_example.html
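As a rough sketch (not taken from the Toro docs), the same queue pattern could be mapped onto the question's crawl_site example, with the crawler putting pages on the queue and a consumer calling show_summary; fetch_page_async, extract_links and show_summary are the question's hypothetical functions:

from tornado import gen
from tornado.ioloop import IOLoop
from toro import Queue

page_q = Queue(maxsize=1)

@gen.coroutine
def crawl_site_async(rooturi):
    rootpage = yield fetch_page_async(rooturi)
    for link in extract_links(rootpage):
        sub_page = yield fetch_page_async(link.uri)
        yield page_q.put(sub_page)   # hand a value to the consumer
    yield page_q.put(None)           # sentinel: the crawl is finished

@gen.coroutine
def consume_pages():
    page = yield page_q.get()
    while page is not None:
        show_summary(page)
        page = yield page_q.get()

crawl_site_async("http://example.com/page.html")
IOLoop.current().run_sync(consume_pages, timeout=60)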