Question

I have three tasks in a chain: fetch_page, check_source and store_info.

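def update_page_info(**headers):
    # Build the chain; each task's return value is passed to the next task.
    workflow = fetch_page.s(headers['key']) | check_source.s(headers['key_1']) | store_info.s()
    # Enqueue the chain once. Calling chain() already runs it and returns an
    # AsyncResult, which has no apply_async() method.
    workflow.apply_async()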

fetch_page fetches the page and gathers whatever needs to be collected:

@app.task(bind=True)
def fetch_page(self, url):
    # Fetch the page here and return a (page, url) tuple so that it can be unpacked.
    ...  # do something

Once the page has been fetched, the next task, check_source, checks its source.

@app.task(bind=True)
def check_source(self, page_and_url, handle):
    try:
        # Unpack the tuple returned by fetch_page.
        page, url = page_and_url
        get_result = {}

        if handle == 'first_option':
            get_result = select_first_option(one, two)
            return get_result

        elif handle == 'second_option':
            get_result = select_second_option(one, two)
            return get_result

        elif handle == 'third_option':
            get_result = select_third_option(one, two)
            return get_result
        else:
            return "IGNORE FOR NOW"
    except Exception as exc:
        pass

So my confusion is: can I call some other tasks from here? Will there be any inconsistency, or could the worker end up in a deadlock?

Finally, it should execute store_info(), which just stores whatever check_source() returned.

@app.task(bind=True)
def store_info(self, result):
    print("store_info ")
    try:
        # Store the fetched pages.
        pass
    except Exception as exc:
        # Do something.
        pass
    finally:
        pass

I am following this approach, with only a few modifications: http://docs.celeryproject.org/en/latest/userguide/tasks.html#avoid-launching-synchronous-subtasks.

Can someone tell me how this should be done, and what I need to be more careful about?

Answer 1

This should all work the way you are reading it (and describing it). The three tasks will execute in order, without any "inconsistency".

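If you call update_page_info once, the three chained subtasks will run exclusively of one another. That said, this setup still leaves room for deadlocks. If you call update_page_info again while tasks from a previous call are still running, more than one task can be running at the same time. Depending on how your tasks share resources, that introduces the potential for deadlocks.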

If your tasks share resources, I would suggest using something like redis or memcached as a locking system across workers.
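As a rough sketch of such a cross-worker lock, assuming a Redis-style client: the pattern relies on `SET key value NX EX ttl`, which redis-py exposes as `client.set(name, value, nx=True, ex=ttl)`. The `InMemoryClient` class, the `worker_lock` helper and the key name are hypothetical and exist only so the example runs without a server; with a real deployment you would pass a `redis.Redis` instance instead.

```python
import time
import uuid
from contextlib import contextmanager


class InMemoryClient:
    """Hypothetical stand-in for the small part of a Redis client used below."""

    def __init__(self):
        self._data = {}  # key -> (value, expiry timestamp)

    def set(self, name, value, nx=False, ex=None):
        now = time.monotonic()
        current = self._data.get(name)
        if current is not None and current[1] <= now:
            current = None  # the previous holder's lock has expired
        if nx and current is not None:
            return None  # NX refuses to overwrite an existing key
        expiry = now + ex if ex is not None else float("inf")
        self._data[name] = (value, expiry)
        return True

    def get(self, name):
        current = self._data.get(name)
        if current is None or current[1] <= time.monotonic():
            return None
        return current[0]

    def delete(self, name):
        return 1 if self._data.pop(name, None) is not None else 0


@contextmanager
def worker_lock(client, name, ttl=30):
    """Yield True if this caller obtained the lock, False otherwise."""
    # Bytes token: assumes a client that returns bytes from get().
    token = uuid.uuid4().hex.encode()
    acquired = bool(client.set(name, token, nx=True, ex=ttl))
    try:
        yield acquired
    finally:
        # Only remove the lock if we still own it. This check-then-delete is
        # not atomic; a production version would need an atomic release.
        if acquired and client.get(name) == token:
            client.delete(name)


client = InMemoryClient()
with worker_lock(client, "lock:store_info") as first:
    # A second attempt while the lock is held is refused.
    with worker_lock(client, "lock:store_info") as second:
        pass
# Once released, the lock can be taken again.
with worker_lock(client, "lock:store_info") as third:
    pass
```

A task such as store_info could wrap its critical section in `with worker_lock(...) as got_it:` and retry or skip when `got_it` is false. The TTL is there so that a crashed worker cannot hold the lock forever.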

Edit: the code I see now is totally fine, since the results are passed along as parameters to the next task.
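To illustrate why that hand-off is safe, here is a plain-Python model of it (not Celery itself): in a chain, each task's return value becomes the first argument of the next task, ahead of any arguments already bound in its signature. The `run_chain` helper, the stub task bodies and the sample URL are hypothetical.

```python
def fetch_page(url):
    # Stub: pretend the page was downloaded.
    return ("<html>stub</html>", url)


def check_source(page_and_url, handle):
    # The previous result arrives first, the pre-bound argument second.
    page, url = page_and_url
    if handle == "first_option":
        return {"url": url, "length": len(page)}
    return "IGNORE FOR NOW"


def store_info(result):
    # Stub: pretend the result was persisted.
    return ("stored", result)


def run_chain(first, *rest):
    """Call `first`, then feed each result into the next step."""
    result = first()
    for step in rest:
        result = step(result)
    return result


outcome = run_chain(
    lambda: fetch_page("http://example.com"),
    lambda previous: check_source(previous, "first_option"),
    store_info,
)
print(outcome)
```

Because every step only sees the value returned by the step before it, no state is shared between the three tasks within a single run of the chain.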
