Some Celery tasks start but hang and never execute

Date: 2021-01-06 16:10:04

Tags: python django asynchronous celery

I'm running into an issue with Django and Celery where some registered tasks never execute.

I have three tasks in my tasks.py file. Two of them, schedule_notification() and schedule_archive(), work fine: they execute at their predefined ETA without issue.

With the schedule_monitoring() function, I can see the job start in Celery Flower, but it never actually executes. It just sits there.

I've confirmed that I can run the command locally on the worker, so I'm not sure where the problem is.

tasks.py (the failing function)

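# NOTE: imports implied by this excerpt; the exact module paths are
# assumptions and are not shown in the original question.
# from celery import shared_task as task
# from .models import Job
# from .api import OSApi              # the REST API wrapper used below
# from .utils import create_activity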
@task
def schedule_monitoring(job_id: str, action: str) -> str:
    salt = OSApi() # This is a wrapper around a REST API.
    job = Job.objects.get(pk=job_id)
    target = ('compound', f"G@hostname:{ job.network.gateway.host_name } and G@serial:{ job.network.gateway.serial_number }")

    policies = [
        'foo',
        'bar',
        'foobar',
        'barfoo'
    ]

    if action == 'start':
        salt.run(target, 'spectrum.add_to_collection', fun_args=['foo'])  
        for policy in policies:
            salt.run(target, 'spectrum.refresh_policy', fun_args=[policy])

        create_activity("Informational", "MONITORING", "Started proactive monitoring for job.", job)
    elif action == 'stop':
        salt.run(target, 'spectrum.remove_from_collection', fun_args=['bar'])
        for policy in policies:
            salt.run(target, 'spectrum.refresh_policy', fun_args=[policy])

        create_activity("Informational", "MONITORING", "Stopped proactive monitoring for job.", job)
    else:
        raise NotImplementedError

    return f"Applying monitoring action: {action.upper()} to Job: {job.job_code}"

Celery Flower output (screenshot omitted)

Celery configuration

# Async
CELERY_BROKER_URL = os.environ.get('BROKER_URL', 'redis://localhost:6379')
CELERY_RESULT_BACKEND = os.environ.get('RESULT_BACKEND', 'redis://localhost:6379')
CELERY_ACCEPT_CONTENT = ['application/json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TIMEZONE = 'UTC'
CELERY_ENABLE_UTC = True
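
For reference, the tasks are enqueued with a predefined ETA. The actual call site isn't shown in the question; a minimal sketch of what it might look like (the variable job and the 10-minute delay are illustrative only):

from datetime import datetime, timedelta, timezone

# Hypothetical call site: enqueue the task to run at a fixed time in the future.
schedule_monitoring.apply_async(
    args=[str(job.pk), 'start'],
    eta=datetime.now(timezone.utc) + timedelta(minutes=10),
)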

Below is a successful run of the command on the worker that should be executing it:

>>> schedule_monitoring(job.pk, 'start')
'Applying monitoring action: START to Job: Test 1'
>>> schedule_monitoring(job.pk, 'stop')
'Applying monitoring action: STOP to Job: Test 1'
>>> exit()
Waiting up to 5 seconds.
Sent all pending logs.
root@9d045ff7dfc1:/app#

Debugging the worker, I see only the following when the job starts; nothing interesting:

[2021-01-06 17:08:00,001: DEBUG/MainProcess] TaskPool: Apply <function _trace_task_ret at 0x7f6adbc29680> (args:('Operations.tasks.schedule_monitoring', '407e8a87-b3bf-4e8f-8a17-776a33ae5fea', {'lang': 'py', 'task': 'Operations.tasks.schedule_monitoring', 'id': '407e8a87-b3bf-4e8f-8a17-776a33ae5fea', 'shadow': None, 'eta': '2021-01-06T17:08:00+00:00', 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None], 'root_id': '407e8a87-b3bf-4e8f-8a17-776a33ae5fea', 'parent_id': None, 'argsrepr': "(UUID('11118a85-20f2-488d-9a12-b8d200ea7a74'), 'start')", 'kwargsrepr': '{}', 'origin': 'gen442@31a9de56d061', 'reply_to': '24a8dc4c-2e5c-32ce-aa3d-84392d7cbf41', 'correlation_id': '407e8a87-b3bf-4e8f-8a17-776a33ae5fea', 'hostname': 'celery@bc4bb7af894f', 'delivery_info': {'exchange': '', 'routing_key': 'celery', 'priority': 0, 'redelivered': None}, 'args': ['11118a85-20f2-488d-9a12-b8d200ea7a74', 'start'], 'kwargs': {}}, b'[["11118a85-20f2-488d-9a12-b8d200ea7a74", "start"], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]', 'application/json', 'utf-8') kwargs:{})
[2021-01-06 17:08:00,303: DEBUG/MainProcess] basic.qos: prefetch_count->32
[2021-01-06 17:08:00,305: DEBUG/MainProcess] Task accepted: Operations.tasks.schedule_monitoring[407e8a87-b3bf-4e8f-8a17-776a33ae5fea] pid:44
[2021-01-06 17:08:00,311: DEBUG/ForkPoolWorker-3] Resetting dropped connection: storage.googleapis.com
[2021-01-06 17:08:00,383: DEBUG/ForkPoolWorker-3] https://storage.googleapis.com:443 "GET /download/storage/v1/b/foo/o/bar?alt=media HTTP/1.1" 200 96
[2021-01-06 17:08:01,228: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2021-01-06 17:08:06,228: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2021-01-06 17:08:11,227: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2021-01-06 17:08:16,228: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2021-01-06 17:08:21,227: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2021-01-06 17:08:26,229: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2021-01-06 17:08:31,231: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]

1 Answer:

Answer 0 (score: 0)

The solution I found was to create two queues in Celery: one for scheduled tasks managed via Celery Beat, and another, higher-priority one.

After creating the separate queues, tasks started flowing and completing correctly; my guess is that the message bus or the worker was congested.

To create the additional queues, add the following to settings.py:

from kombu import Queue, Exchange

CELERYD_MAX_TASKS_PER_CHILD = 4

CELERY_DEFAULT_QUEUE = 'scheduled'
CELERY_QUEUES = (
    Queue('scheduled', Exchange('scheduled'), routing_key='sched'),
    Queue('proactive_monitoring', Exchange('proactive_monitoring'), routing_key='prmon'),
)
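
As an alternative to tagging each task at registration (shown next), routing can also be centralized in settings; a sketch using the old-style CELERY_ROUTES setting, with the task path taken from the worker log above:

CELERY_ROUTES = {
    'Operations.tasks.schedule_monitoring': {'queue': 'proactive_monitoring'},
}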

Then, when registering your task functions, pass the queue they should be assigned to:

tasks.py:

@task(queue='proactive_monitoring')
def schedule_monitoring(job_id: str, action: str) -> str:
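
The queue can also be overridden per call without changing the registration, for example:

schedule_monitoring.apply_async(args=[str(job.pk), 'start'], queue='proactive_monitoring')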

Finally, make sure at least one worker is started for each queue. You can do this by passing the queue when starting a worker:

celery -A proj worker -l INFO -Q proactive_monitoring

If you start multiple workers on localhost, you should distinguish them, at least any two serving the same queue, by specifying a name with -n:

celery -A proj worker -l INFO -Q proactive_monitoring -n prmon_first_worker
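
So with the two queues defined above, a complete setup might run one named worker per queue, for example:

celery -A proj worker -l INFO -Q scheduled -n sched_first_worker
celery -A proj worker -l INFO -Q proactive_monitoring -n prmon_first_worker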