Writing a Tornado coroutine that also yields normal values

Time: 2014-03-14 14:44:58

Tags: python asynchronous tornado

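In Tornado, we can use the coroutine decorator to write asynchronous functions neatly as Python generators, where each yield statement returns control to the scheduler and a final raise/return delivers a single value to the caller. But is there a way to return a sequence of values to the caller, interspersed with asynchronous calls?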
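The single-return-value shape described here can be sketched with a native coroutine (Python 3.5+ syntax; Tornado 5 and later run on asyncio, so this style is usable there too). This minimal sketch uses plain asyncio so it runs without Tornado; `fetch_length` is a hypothetical stand-in for a real non-blocking fetch. Each `await` hands control back to the event loop, and `return` delivers exactly one value, which is the limitation the question is about.

```python
import asyncio

async def fetch_length(url):
    # Hypothetical stand-in for a real non-blocking fetch.
    await asyncio.sleep(0.01)  # control returns to the event loop here
    return len(url)            # exactly one value goes back to the caller

async def main():
    # The caller receives nothing until the whole coroutine has finished.
    return await fetch_length("http://example.com/page.html")

length = asyncio.run(main())  # asyncio.run requires Python 3.7+
print(length)
```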

E.g., how could I turn this synchronous function:

def crawl_site_sync(rooturi):
    rootpage = fetch_page_sync(rooturi)
    links = extract_links(rootpage)
    for link in links:
        yield fetch_page_sync(link.uri)

...which I can call like this:

for page in crawl_site_sync("http://example.com/page.html"):
    show_summary(page)

...into a similar-looking asynchronous function in Tornado? E.g.:

@tornado.gen.coroutine
def crawl_site_async(rooturi):
    # Yield a future to the scheduler:
    rootpage = yield fetch_page_async(rooturi)
    links = extract_links(rootpage)
    for link in links:
        # Yield a future to the scheduler:
        sub_page = yield fetch_page_async(link.uri)
        # Yield a value to the caller:
        really_really_yield sub_page # ???

And how would I call it?

for page in yield crawl_site_async("http://example.com/page.html"):
    # This won't work, the yield won't return until the entire
    # coroutine has finished, and it won't give us an iterable.
    show_summary(page)

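I can think of ways to make it work, but all of them involve changing both the call site and the function so much that the asynchronous version loses the benefit of closely resembling the synchronous one, and it is no longer clean and tidy. I feel like I must be missing a trick here. Is there a way to use a Python generator as a lazily computed sequence of values and as a Tornado coroutine at the same time?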

1 answer:

Answer 0 (score: 2)

I use a queue from Toro, which is designed to let coroutines cooperate like this. Here is a simple example:

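from tornado.ioloop import IOLoop
from tornado import gen
from tornado.httpclient import AsyncHTTPClient
from toro import Queue

# maxsize=1: the producer waits until the consumer has taken each item.
q = Queue(maxsize=1)


@gen.coroutine
def consumer():
    # Take items until the producer sends the None sentinel.
    item = yield q.get()
    while item is not None:
        print(item)
        item = yield q.get()


@gen.coroutine
def producer():
    try:
        client = AsyncHTTPClient()
        for url in [
                'http://tornadoweb.org',
                'http://python.org',
                'http://readthedocs.org']:
            response = yield client.fetch(url)
            item = (url, len(response.body))
            # Waits (without blocking the IOLoop) while the queue is full.
            yield q.put(item)

        # Done: tell the consumer there are no more items.
        yield q.put(None)
    except Exception:
        # Stop the IOLoop rather than leave the consumer waiting forever.
        IOLoop.current().stop()
        raise

# Start the producer; it runs alongside the consumer once the IOLoop starts.
future = producer()
IOLoop.current().run_sync(consumer, timeout=20)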
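Toro was later merged into Tornado itself (as `tornado.queues`, starting with Tornado 4.2), and Tornado 5+ runs on asyncio. Here is a minimal sketch of the same producer/consumer pattern with native coroutines. It swaps in the standard library's `asyncio.Queue` so it runs without Tornado or network access. `fake_fetch` is a hypothetical stand-in for `AsyncHTTPClient.fetch`. With Tornado you would likely use `tornado.queues.Queue`, whose `put()` and `get()` are also awaited, but check that against your Tornado version.

```python
import asyncio

async def fake_fetch(url):
    # Hypothetical stand-in for AsyncHTTPClient.fetch: no network access.
    await asyncio.sleep(0.01)
    return "body of " + url

async def producer(q, urls):
    try:
        for url in urls:
            body = await fake_fetch(url)
            # With maxsize=1 this waits until the consumer has taken the previous item.
            await q.put((url, len(body)))
    finally:
        # Always send the sentinel so the consumer stops, even on error.
        await q.put(None)

async def consumer(q):
    results = []
    item = await q.get()
    while item is not None:
        results.append(item)  # print(item) in the answer above
        item = await q.get()
    return results

async def main(urls):
    q = asyncio.Queue(maxsize=1)
    producer_task = asyncio.ensure_future(producer(q, urls))
    results = await consumer(q)
    await producer_task  # re-raises any exception the producer hit
    return results

urls = ["http://tornadoweb.org", "http://python.org", "http://readthedocs.org"]
results = asyncio.run(main(urls))
print(results)
```

Putting the sentinel in a `finally` block plays the same role as the `IOLoop.current().stop()` call in the original: a failing producer cannot leave the consumer waiting indefinitely.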

A more detailed web-crawler example is in Toro's documentation, here:

https://toro.readthedocs.org/en/stable/examples/web_spider_example.html
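On Python 3.6 and later, which postdates this question, native asynchronous generators (PEP 525) provide almost exactly the syntax the question asks for. Inside an `async def`, `await` hands control to the event loop, `yield` hands a value to the caller, and the caller consumes the values with `async for`. The sketch below uses plain asyncio. `fetch_page_async` and `extract_links` are hypothetical stubs mirroring the names in the question, not real library calls. Since Tornado 5+ is built on asyncio, a native coroutine running under Tornado should be able to iterate such a generator the same way, but treat that as an assumption for your setup.

```python
import asyncio
from collections import namedtuple

Link = namedtuple("Link", ["uri"])

async def fetch_page_async(uri):
    # Hypothetical stand-in for a real non-blocking fetch.
    await asyncio.sleep(0.01)
    return "<page " + uri + ">"

def extract_links(page):
    # Hypothetical: pretend every root page links to two sub-pages.
    return [Link("/a"), Link("/b")]

async def crawl_site_async(rooturi):
    # "await" yields control to the event loop; "yield" gives a value to the caller.
    rootpage = await fetch_page_async(rooturi)
    for link in extract_links(rootpage):
        sub_page = await fetch_page_async(link.uri)
        yield sub_page

async def main():
    pages = []
    async for page in crawl_site_async("/root"):
        pages.append(page)  # show_summary(page) in the question
    return pages

pages = asyncio.run(main())
print(pages)
```

Each sub-page is fetched only when the `async for` loop asks for the next item, so the asynchronous version keeps the lazy, one-page-at-a-time behavior of the synchronous generator.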