How to track the progress of the individual tasks in a group that forms the header of a chord in Celery?

Time: 2013-03-15 19:57:20

Tags: python django celery

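import time

import celery
from celery import current_task, task  # Celery 3.x-era API


def temptask(n):
    # Chord header: one immutable signature per subtask
    header = [tempsubtask.si(i) for i in range(n)]
    # Callback that runs once every header task has finished
    callback = templink.si('printed at last?')
    r = celery.chord(celery.group(header))(callback)
    return r

@task()
def tempsubtask(i):
    print(i)
    for x in range(i):
        time.sleep(2)
        # Publish custom progress metadata for this subtask
        current_task.update_state(
            state='PROGRESS', meta={'completed': x, 'total': i})

@task()
def templink(x):
    print('this should be run at last %s' % x)

# executing temptask
r = temptask(100)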

I want to access the progress state reported by tempsubtask. How can I achieve that?

3 answers:

Answer 0 (score: 4)

After hours of googling, I stumbled upon http://www.manasupo.com/2012/03/chord-progress-in-celery.html. The solution there did not work for me out of the box, but it inspired me to try something similar.

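from celery.utils import uuid
from celery import chord

class ProgressChord(chord):
    # Same as chord.__call__ (Celery 3.0), but also returns the header's result

    def __call__(self, body=None, **kwargs):
        _chord = self.Chord
        body = (body or self.kwargs['body']).clone()
        kwargs = dict(self.kwargs, body=body, **kwargs)
        if _chord.app.conf.CELERY_ALWAYS_EAGER:
            # Eager mode returns a single result, not a tuple
            return self.apply((), kwargs)
        # Fix the callback's task id up front so its AsyncResult can be returned
        callback_id = body.options.setdefault('task_id', uuid())
        r = _chord(**kwargs)  # result of applying the header group
        return _chord.AsyncResult(callback_id), r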

Instead of calling celery.chord, I use ProgressChord as follows:

def temptask(n):
    header = [tempsubtask.si(i) for i in range(n)]
    callback = templink.si('printed at last?')
    # ProgressChord is the subclass defined above, not an attribute of the celery module
    r = ProgressChord(celery.group(header))(callback)
    return r

The return value r is a tuple containing the AsyncResult of the callback and a GroupResult for the header. A successful call looks like this:

In [3]: r
Out[3]: 
(<AsyncResult: bf87507c-14cb-4ac4-8070-d32e4ff326a6>,
 <GroupResult: af69e131-5a93-492d-b985-267484651d95 [4672cbbb-8ec3-4a9e-971a-275807124fae, a236e55f-b312-485c-a816-499d39d7de41, e825a072-b23c-43f2-b920-350413fd5c9e, e3f8378d-fd02-4a34-934b-39a5a735871d, c4f7093b-9f1a-4e5e-b90d-66f83b9c97c4, d5c7dc2c-4e10-4e71-ba2b-055a33e15f02, 07b1c6f7-fe95-4c1f-b0ba-6bc82bceaa4e, 00966cb8-41c2-4e95-b5e7-d8604c000927, e039c78e-6647-4c8d-b59b-e9baf73171a0, 6cfdef0a-25a2-4905-a40e-fea9c7940044]>)
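With the GroupResult in hand, the progress of each header task can be read from its state and metadata. The following is a minimal sketch, assuming each subtask reports `PROGRESS` with the `completed`/`total` meta used in the question; the helper name `chord_header_progress` is hypothetical. It relies only on `GroupResult.results` and each `AsyncResult`'s `id`, `state` and `info` attributes.

```python
def chord_header_progress(group_result):
    """Summarise progress for every task in a chord header.

    `group_result` is expected to be a Celery GroupResult, or any object with
    a `.results` list whose items expose `.id`, `.state` and `.info`.
    """
    finished = 0
    in_progress = {}
    for res in group_result.results:
        state = res.state
        if state == 'SUCCESS':
            finished += 1
        elif state == 'PROGRESS':
            # `info` holds the meta passed to update_state(state='PROGRESS', meta=...)
            meta = res.info or {}
            in_progress[res.id] = (meta.get('completed', 0), meta.get('total', 0))
    return {
        'finished': finished,
        'total': len(group_result.results),
        'in_progress': in_progress,
    }
```

Called as `callback, header = temptask(10)` followed by `chord_header_progress(header)`, and polled periodically, this gives a live view of the header while the callback result is still pending.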

I subclassed and overrode celery.chord rather than celery.task.chords.Chord, because I could not find the latter anywhere.

Answer 1 (score: 0)

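I had a similar problem. Most examples online are outdated and the documentation was not much help, but it links to further resources, and reading those did help. My goal was to organize parallel tasks into groups, with the groups executed sequentially, one after another. So I decided to generate the task IDs before starting any tasks, and only then assign them. I am using Celery 4.3.0.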

Here is a short example.

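First, I need a dummy task to make the execution sequential and to be able to check the status of a specific group. Because it is used as the chord callback, it only completes after all the other tasks in the group have finished.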

@celery.task(bind=True, name="app.tasks.dummy_task")
def dummy_task( self, results=None, *args, **kwargs ):
    return results

The comments below explain how I assign the IDs.

from celery.utils import uuid
from celery import group, chord, chain


# Generating task ids, 
# which can be saved to a db, sent to the client and so on
#
# This is done before executing any tasks

task_id_1 = uuid()
task_id_2 = uuid()

chord_callback_id_1 = uuid()
chord_callback_id_2 = uuid()

workflow_id = None


# Generating groups, using signatures
# the group may contain any number of tasks
# (some_email and some_data are placeholders for your own values)
group_1 = group(
        [
            celery.signature(
                    'app.tasks.real_task', 
                    args=(), 
                    kwargs = { 'email': some_email, 'data':some_data },
                    options = ( {'task_id': task_id_1 } )
                )
        ]
    )

group_2 = group(
        [
            celery.signature(
                    'app.tasks.real_task', 
                    args=(), 
                    kwargs = { 'email': some_email, 'data':some_data },
                    options = ( {'task_id': task_id_2 } )
                )
        ]
    )

# Creating a callback task which will simply relay the result
# Using the task id, which has been generated before
# 
# The dummy task start after all tasks in this group are completed
# This way we know that the group is completed

chord_callback = celery.signature( 
        'app.tasks.dummy_task',
        options=( {'task_id': chord_callback_id_1 } )
    ) 

chord_callback_2 = celery.signature( 
        'app.tasks.dummy_task',
        options=( {'task_id': chord_callback_id_2 } )
    ) 


# we can monitor each step status
# by its chord callback id

# step 1: run group_1, then its chord callback (chord_callback_id_1)
step1 = chord( group_1, body=chord_callback )
# step 2: run group_2, then its chord callback (chord_callback_id_2)
step2 = chord( group_2, body=chord_callback_2 )

# start the workflow execution
# the steps will execute sequentially 
workflow = chain( step1, step2 )()


# the id of the last chord callback
workflow_id = workflow.id

# return any ids you need
print( workflow_id )
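
Because every chord callback id is generated before the workflow starts, the step the chain has reached can be found by checking those ids in order. A minimal sketch; `current_step` and its `get_state` parameter are hypothetical names. By default it looks states up with `celery.result.AsyncResult(task_id).state` against the default app; pass your own lookup (for example one built on `AsyncResult(task_id, app=your_app)`) if that is not how your app is configured.

```python
def current_step(callback_ids, get_state=None):
    """Return (index, state) of the first step whose chord callback has not
    succeeded, or (len(callback_ids), 'SUCCESS') once every step is done."""
    if get_state is None:
        # Imported lazily so the helper can be exercised without a broker
        from celery.result import AsyncResult

        def get_state(task_id):
            return AsyncResult(task_id).state

    for index, task_id in enumerate(callback_ids):
        state = get_state(task_id)
        if state != 'SUCCESS':
            # This step's group has not fully completed yet
            return index, state
    return len(callback_ids), 'SUCCESS'
```

For the workflow above the call would be `current_step([chord_callback_id_1, chord_callback_id_2])`.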

This is how I check the status of any task in my application.

# This is a simplified example
# some code is omitted
from celery.result import AsyncResult
from flask import jsonify  # assumption: this runs inside a Flask view


def task_status( task_id=None ):

    # PENDING
    # RECEIVED
    # STARTED
    # SUCCESS
    # FAILURE
    # REVOKED
    # RETRY

    task = AsyncResult(task_id)

    response = {
      'state': task.state,
    }

    return jsonify(response), 200

Answer 2 (score: 0)

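Old question, but I wasted several days looking for a better, modern solution. In my current project I have to track the group's progress separately and release a lock in the final callback.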

The current solution is much simpler (but harder to guess); the key lines are commented at the end:

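from celery import chord, group

# `cache` is assumed to be a Redis-backed cache whose lock() returns a redis-py Lock;
# `calculator` and `_get_results` are tasks defined elsewhere
@celery_app.task(name="_scheduler", track_started=True, ignore_result=False)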
def _scheduler():
    lock = cache.lock("test_lock")
    if not lock.acquire(blocking=False):
        return {"Error": "Job already in progress"}

    lock_code = lock.local.token.decode("utf-8")

    tasks = []
    for x in range(100):
        tasks.append(calculator.s())
    
    _group = group(*tasks)
    _chord = chord(_group)(_get_results.s(token=lock_code))
    
    group_results = _chord.parent # This is actual group inside chord
    group_results.save() # I am saving it to usual results backend, and can track progress inside.
    
    return _chord # can return anything, I need only chord.

I am working with Celery 5.1.
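
Since the header's GroupResult is saved to the result backend, another process (for example a web view) can restore it by id and count the finished tasks. A minimal sketch, assuming the caller has `group_results.id` from the code above; `group_progress` and `group_progress_by_id` are hypothetical helper names, while `GroupResult.restore` and `completed_count()` are Celery APIs. Note that `completed_count()` counts only tasks that finished successfully.

```python
def group_progress(group_result):
    """Return completed/total counts for a GroupResult-like object."""
    total = len(group_result.results)
    # completed_count() counts tasks that finished successfully
    done = group_result.completed_count()
    percent = 100.0 * done / total if total else 100.0
    return {'completed': done, 'total': total, 'percent': percent}


def group_progress_by_id(group_id, app):
    """Restore a group saved with GroupResult.save() and report its progress."""
    # Imported here so group_progress() can be used without Celery installed
    from celery.result import GroupResult

    saved = GroupResult.restore(group_id, app=app)
    if saved is None:
        # Nothing is stored under this id (never saved, or expired)
        raise LookupError('no saved group with id %r' % group_id)
    return group_progress(saved)
```

Restoring by id keeps the polling side independent of the task that created the chord; the only thing that has to be handed over is the group id.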