Question

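My project has a requirement that the customer can pause or resume processes that are pending, not the one currently being processed. I am using a WebSocket to display Celery task results, but I don't understand how to design the code for pause/resume. The only approach I can think of is to revoke the task on a pause request while keeping the revoked process's data in a cache, and later use that cache in the resume API to start the Celery task again. With this approach my WebSocket flow breaks: I poll the task processing status over the WebSocket, and when no process is left I send a completion flag set to true to close the connection. To know which tasks are processing or pending, I added a separate table for task mapping, and I flush that table when the last task has executed. Please help me make this design flawless, and point out anything I have missed.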

Answer 1

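Wrong approach. You should never manually pause or revoke a process just to capture its current state. Pause and revoke states are equivalent to broker errors.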

Try redesigning the code.

The key requirement to implement is this sentence:

customer can pause or resume process which are pending not the process one

Design your code following the workflow pattern: https://en.wikipedia.org/wiki/Workflow_pattern

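Split the code into steps or states. One Celery process can do the whole workflow, but you don't need that if, for example, you make many requests to many external providers (one request = one state). If the customer pauses a state, stop the Celery process. Add an event that detects when the state changes back to active, and run a new Celery process for this task again.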
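This answer gives no code. Below is a minimal, framework-free sketch of the idea, under stated assumptions. A plain loop stands in for a Celery worker, and the workflow keeps its own position and state in memory (in practice that would live in the database). Changing the state fires an event that relaunches the remaining steps. `Workflow`, `run` and `relaunch_when_active` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A step maps the previous step's output to its own output (one request = one state)
Step = Callable[[int], int]


@dataclass
class Workflow:
    steps: List[Step]
    state: str = "ACTIVE"  # hypothetical states: "ACTIVE" or "PAUSED"
    position: int = 0      # index of the next step to run
    value: int = 0         # output of the last finished step
    listeners: List[Callable[["Workflow"], None]] = field(default_factory=list)

    def set_state(self, new_state: str) -> None:
        """Change the state and fire a 'state changed' event to all listeners."""
        old, self.state = self.state, new_state
        if old != new_state:
            for listener in self.listeners:
                listener(self)


def run(wf: Workflow) -> None:
    """Stand-in for one worker process: run steps until paused or finished."""
    while wf.position < len(wf.steps) and wf.state == "ACTIVE":
        wf.value = wf.steps[wf.position](wf.value)
        wf.position += 1


def relaunch_when_active(wf: Workflow) -> None:
    """Event handler: start a new 'process' from the saved position on resume."""
    if wf.state == "ACTIVE":
        run(wf)
```

Suppose a step calls `wf.set_state("PAUSED")`, simulating a customer who clicks pause while it runs. `run(wf)` then stops after that step. A later `wf.set_state("ACTIVE")` fires `relaunch_when_active`, which continues from the saved `position` instead of starting over.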

Answer 2

I'd like to demonstrate a general approach to making in-progress Celery tasks pausable (and resumable) through the workflow pattern.

Concept

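Use Celery workflows: design the whole operation so that it is split into a `chain` of tasks. It doesn't have to be a pure chain, but it should follow the general idea of one task running after another task (or task `group`) finishes.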

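Once you have such a workflow, you can define points throughout it at which the operation may pause. At each of these points, you check whether the frontend user has requested a pause and proceed accordingly. The concept looks like this: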

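A complex, time-consuming operation O is split into 5 Celery tasks: T1, T2, T3, T4 and T5. Each of these tasks (except the first) depends on the return value of the previous one.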

Suppose we define a pause point after every task, so the workflow looks like this:

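- T1 executes
- T1 completes; check whether the user has requested a pause
  - If the user has not requested a pause: continue
  - If the user has requested a pause: serialize the remaining workflow chain and store it somewhere so it can be continued later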

...and so on. Since there is a pause point after every task, the check runs after every task (except the last one, of course).
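The control flow above can be simulated without Celery. In this sketch, plain functions stand in for T1 to T5, and `pause_requested` stands in for the user's pause flag. The function checks for a pause after every task except the last. On a pause it returns the last result plus the names of the remaining tasks as JSON. `run_with_pause_points` is a hypothetical name, and storing only task names is a simplification.

```python
import json
from typing import Any, Callable, List, Optional, Tuple

Task = Tuple[str, Callable[[Any], Any]]  # (name, function of the previous return value)


def run_with_pause_points(
    tasks: List[Task], pause_requested: Callable[[], bool], first_arg: Any = None
) -> Tuple[Any, Optional[str]]:
    """Run tasks in order, checking for a pause after every task except the last.

    Returns (last return value, JSON list of remaining task names); the JSON
    part is None if the whole workflow finished.
    """
    retval = first_arg
    for i, (_name, fn) in enumerate(tasks):
        retval = fn(retval)
        is_last = i == len(tasks) - 1
        if not is_last and pause_requested():
            # Serialize what is left so it can be continued later
            remaining = [name for name, _ in tasks[i + 1:]]
            return retval, json.dumps(remaining)
    return retval, None
```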

But that's just theory, and I had a hard time finding an implementation of it anywhere online, so here's what I came up with:

Implementation

from typing import Any, Optional

from celery import shared_task
from celery.canvas import Signature, chain, signature

@shared_task(bind=True)
def pause_or_continue(
    self, retval: Optional[Any] = None, clause: dict = None, callback: dict = None
):
    # Task to use for deciding whether to pause the operation chain
    if signature(clause)(retval):
        # Pause requested, call given callback with retval and remaining chain
        # chain should be reversed as the order of execution follows from end to start
        signature(callback)(retval, self.request.chain[::-1])
        self.request.chain = None
    else:
        # Continue to the next task in chain
        return retval


def tappable(ch: chain, clause: Signature, callback: Signature, nth: int = 1):
    '''
    Make an operation workflow chain pause-able/resume-able by inserting
    the pause_or_continue task for every nth task in given chain

    ch: chain
        The workflow chain

    clause: Signature
        Signature of a task that takes one argument - return value of
        last executed task in workflow (if any - otherwise `None` is passed)
        - and returns a boolean, indicating whether or not the operation should pause

        Should return True if the operation should be paused, False if it should continue normally

    callback: Signature
        Signature of a task that takes 2 arguments - return value of
        last executed task in workflow (if any - otherwise `None` is passed) and
        remaining chain of the operation workflow as a list of json dict objects
        No return value is expected

        This task will be called when `clause` returns `True` (i.e task is pausing)
        The return value and the remaining chain can be handled accordingly by this task

    nth: int
        Check `clause` after every nth task in the chain
        Default value is 1, i.e check `clause` after every task
        Hence, by default, user given `clause` is called and checked
        after every task

    NOTE: The passed in chain is mutated in place
    Returns the mutated chain
    '''
    newch = []
    for n, sig in enumerate(ch.tasks):
        # n tasks run before this one, so insert a pause point after the nth, 2nth, ... task
        if n != 0 and n % nth == 0:
            newch.append(pause_or_continue.s(clause=clause, callback=callback))
        newch.append(sig)
    ch.tasks = tuple(newch)
    return ch

Explanation: `pause_or_continue`

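`pause_or_continue` is the pause point mentioned above. It is called at certain intervals, measured in tasks rather than in time. It then calls a user-supplied function (actually a task), `clause`, to check whether the operation should continue.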

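If the `clause` function (actually a task) returns `True`, the user-supplied `callback` function is called. It receives the latest return value (if any, otherwise `None`) together with the remaining chain of tasks. `callback` does whatever it needs to, and `pause_or_continue` sets `self.request.chain` to `None`, which tells Celery "the task chain is now empty, everything is done".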

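If the `clause` function (actually a task) returns `False`, the return value of the previous task (if any, otherwise `None`) is returned so that the next task receives it, and the chain carries on. The workflow thus continues.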

Why are `clause` and `callback` task signatures instead of regular functions?

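Both `clause` and `callback` are called directly, without `delay` or `apply_async`, so they execute in the current process and the current context. They therefore behave exactly like normal functions, so why use signatures?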

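The answer is serialization. You can't conveniently pass a regular function object to a Celery task, but you can pass a task signature, and that is exactly what I do here. Both `clause` and `callback` should be regular `signature` objects of Celery tasks.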
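The serialization point can be shown with a stdlib-only analogy. This is not Celery's implementation. A function object cannot be JSON-encoded, but a task name plus its arguments can, and the receiving side can look the function up again by name. `REGISTRY`, `task`, `describe` and `call_described` are hypothetical names.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry standing in for the tasks a worker has loaded
REGISTRY: Dict[str, Callable[..., Any]] = {}


def task(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its name, loosely like a task decorator."""
    REGISTRY[fn.__name__] = fn
    return fn


@task
def add(x: int, y: int) -> int:
    return x + y


def describe(name: str, *args: Any) -> Dict[str, Any]:
    """A JSON-friendly description of a call: just a task name plus arguments."""
    return {"task": name, "args": list(args)}


def call_described(desc: Dict[str, Any]) -> Any:
    """Look the function up by name and call it directly, in this process."""
    return REGISTRY[desc["task"]](*desc["args"])
```

`json.dumps(describe("add", 2, 3))` produces a plain string that survives a trip through a message broker. By contrast, `json.dumps(add)` raises `TypeError`.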

What is `self.request.chain`?

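`self.request.chain` stores a list of dicts, each representing a task signature. The entries are JSON because Celery's task serializer is JSON by default. The tasks in this list are executed in reverse order. That is why the list is reversed before it is passed to the user-supplied `callback` function (actually a task): the user will most likely expect the tasks in left-to-right order.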
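The ordering is easier to see on a toy list. This sketch assumes the stored list is ordered as described above, with the next task to run at the end. The dicts are trimmed to a `task` key (real serialized signatures also carry keys such as `args`, `kwargs` and `options`), and the helper names are hypothetical.

```python
from typing import Any, Dict, List

Sig = Dict[str, Any]  # simplified stand-in for a serialized signature


def left_to_right(request_chain: List[Sig]) -> List[str]:
    """Task names in the order they will actually run."""
    # The next task sits at the END of the list, so reverse it
    return [sig["task"] for sig in reversed(request_chain)]


def take_next(request_chain: List[Sig]) -> str:
    """Remove and return the task that runs next (popping from the end is O(1))."""
    return request_chain.pop()["task"]
```

With `pending = [{"task": "T5"}, {"task": "T4"}, {"task": "T3"}]`, `left_to_right(pending)` gives the order a user expects, `["T3", "T4", "T5"]`, which is the order `pause_or_continue` hands to `callback` via `[::-1]`.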

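Quick note: this is not relevant to the discussion here, but if you build the chain with the `link` parameter of `apply_async` instead of the `chain` primitive itself, the attribute to modify is `self.request.callbacks` rather than `self.request.chain`. Set it to `None` to remove the callbacks and stop the chain.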

Explanation: `tappable`

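`tappable` is just a basic function that takes a chain (the only workflow primitive covered here, for brevity) and inserts `pause_or_continue` after every `nth` task. You can insert pause points anywhere you like; it's up to you to define them for your operation. This is just an example!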

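For every `chain` object, the actual task signatures (in left-to-right order) are stored in the `.tasks` attribute as a tuple. So all we have to do is build a new list from this tuple with the pause points inserted, convert it back to a tuple, and assign it to the chain. The modified chain object is then returned.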

`clause` and `callback` are passed into the `pause_or_continue` signature as keyword arguments. Plain Celery.
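The placement rule can be checked on plain strings. This sketch mirrors the loop in `tappable` without Celery and uses hypothetical names. It places a pause marker after the nth, 2nth, ... task, matching the docstring's "after every nth task". A marker never appears before the first task or after the last one.

```python
from typing import List

PAUSE = "pause_or_continue"


def insert_pause_points(tasks: List[str], nth: int = 1) -> List[str]:
    """Put a pause marker after every nth task (never first or last)."""
    out: List[str] = []
    for n, name in enumerate(tasks):
        # n tasks have finished before this one, so pause after the nth, 2nth, ... task
        if n != 0 and n % nth == 0:
            out.append(PAUSE)
        out.append(name)
    return out
```

For five tasks with `nth=2`, the result is `["T1", "T2", PAUSE, "T3", "T4", PAUSE, "T5"]`.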

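That covers the main concept. To show this pattern in a realistic project, including resuming a paused task, here is a small sample of all the necessary pieces: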

Usage

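This example assumes a basic web server backed by a database. Whenever an operation (i.e. a workflow chain) is started, it is assigned an ID and stored in the database. The table schema looks like this: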

-- Create operations table
-- Keeps track of operations and the users that started them
CREATE TABLE operations (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  requester_id INTEGER NOT NULL,
  completion TEXT NOT NULL,
  workflow_store TEXT,
  result TEXT,
  FOREIGN KEY (requester_id) REFERENCES user (id)
);

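The only field you need to know about for now is `completion`. It simply stores the status of the operation:

- When the operation starts and its database entry is created, it is set to `IN PROGRESS`
- When the user requests a pause, the route controller (i.e. the view) changes it to `REQUESTING PAUSE`
- When the operation actually pauses and `callback` is called (by `pause_or_continue`, which `tappable` inserted), `callback` should change it to `PAUSED`
- When the task completes, it should be changed to `COMPLETED`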
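The status lifecycle above can be written down as an explicit transition table, which a view could consult before writing a new status. The table follows the list above, with one assumption: the `REQUESTING PAUSE` → `COMPLETED` edge. It covers a pause request that arrives after the last pause point, when the final task simply finishes. `TRANSITIONS` and `can_move` are hypothetical names.

```python
from typing import Dict, Set

# Allowed changes of operations.completion, following the list above.
# REQUESTING PAUSE -> COMPLETED is an assumption: a pause request that arrives
# after the last pause point cannot be honoured, so the operation just finishes.
TRANSITIONS: Dict[str, Set[str]] = {
    "IN PROGRESS": {"REQUESTING PAUSE", "COMPLETED"},
    "REQUESTING PAUSE": {"PAUSED", "COMPLETED"},
    "PAUSED": {"IN PROGRESS"},
    "COMPLETED": set(),
}


def can_move(current: str, new: str) -> bool:
    """True if the status may change from `current` to `new`."""
    return new in TRANSITIONS.get(current, set())
```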

Example of `clause`

@celery.task()
def should_pause(_, operation_id: int):
    # This is the `clause` to be used for `tappable`
    # i.e it lets celery know whether to pause or continue
    db = get_db()

    # Check the database to see if user has requested pause on the operation
    operation = db.execute(
        "SELECT * FROM operations WHERE id = ?", (operation_id,)
    ).fetchone()
    return operation["completion"] == "REQUESTING PAUSE"

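This is the task called at the pause points to decide whether to pause. It takes 2 arguments... well, sort of. The first is mandatory: `tappable` requires `clause` to take one (and only one) argument, so it can pass in the previous task's return value (even if that value is `None`). This example doesn't need the return value, so we can ignore it.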

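The second argument is the operation ID. All `clause` does is look up the operation (workflow) entry in the database and check whether its status is `REQUESTING PAUSE`. To do that, it needs the operation ID. But `clause` is supposed to be a task with only one argument, so what gives?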

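Luckily, signatures can be partial. The operation ID is already known when the task is first started and the `tappable` chain is created. So we can call `should_pause.s(operation_id)` to get the signature of a task that takes one remaining argument: the return value of the previous task. That's the `clause`!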
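One detail makes this work: when a partial Celery signature is called, the new arguments are prepended to the ones already stored. So the previous task's return value lands in the first parameter, and `operation_id` stays second. The stdlib sketch below reproduces that ordering with a hypothetical `prepend_partial` helper and a stand-in `should_pause`. It also shows that `functools.partial` behaves differently.

```python
from functools import partial
from typing import Any, Callable


def prepend_partial(fn: Callable[..., Any], *stored: Any) -> Callable[..., Any]:
    """Freeze trailing arguments; call-time arguments go FIRST, like a partial
    Celery signature such as should_pause.s(operation_id)."""
    def call(*new: Any) -> Any:
        return fn(*new, *stored)
    return call


def should_pause(_retval: Any, operation_id: int) -> str:
    # Hypothetical stand-in: reports which operation it would look up
    return f"checking operation {operation_id}"


clause = prepend_partial(should_pause, 42)
# functools.partial appends call-time arguments instead, so the order flips
wrong_order = partial(should_pause, 42)
```

`clause("previous result")` checks operation 42 as intended. `wrong_order("previous result")` would treat the previous result as the operation ID.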

Example of `callback`

import os
import json
from typing import Any, List

@celery.task()
def save_state(retval: Any, chains: List[dict], operation_id: int):
    # This is the `callback` to be used for `tappable`
    # i.e this is called when an operation is pausing
    db = get_db()

    # Prepare directories to store the workflow
    operation_dir = os.path.join(app.config["OPERATIONS"], f"{operation_id}")
    workflow_file = os.path.join(operation_dir, "workflow.json")
    os.makedirs(operation_dir, exist_ok=True)

    # Store the remaining workflow chain (a list of signature dicts), serialized into json
    with open(workflow_file, "w") as f:
        json.dump(chains, f)

    # Store the result from the last task and the workflow json path
    # Note: f"{retval}" stores the text form of the result, so resume() passes a string on
    db.execute(
        """
        UPDATE operations
        SET completion = ?,
            workflow_store = ?,
            result = ?
        WHERE id = ?
        """,
        ("PAUSED", workflow_file, f"{retval}", operation_id),
    )
    db.commit()

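This is the task called when the operation is pausing. Remember, it should take the return value of the last executed task and the list of remaining signatures (in left-to-right order). Once again there's an extra argument, `operation_id`, for the same reason as with `clause`.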

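This function stores the remaining chain in a JSON file, since it's a list of dicts. You can use a different serializer; I use JSON because it's Celery's default task serializer.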

After storing the remaining chain, it sets the `completion` status to `PAUSED` and records the path of the JSON file in the database.
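The file-handling half of `save_state` can be tried on its own with the standard library. This sketch writes and reads back the remaining chain, using a temporary directory in place of `app.config["OPERATIONS"]`. It also shows a design choice worth considering. Encoding the result with `json.dumps` preserves its type on the way back, while `f"{retval}"` turns `42` into `"42"`. The helper names are hypothetical.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def save_remaining(base_dir: str, operation_id: int, chain: List[Dict[str, Any]]) -> str:
    """Write the remaining signature dicts to <base_dir>/<id>/workflow.json."""
    operation_dir = os.path.join(base_dir, str(operation_id))
    os.makedirs(operation_dir, exist_ok=True)  # no separate isdir check needed
    path = os.path.join(operation_dir, "workflow.json")
    with open(path, "w") as f:
        json.dump(chain, f)
    return path


def load_remaining(path: str) -> List[Dict[str, Any]]:
    with open(path) as f:
        return json.load(f)


def encode_result(retval: Any) -> str:
    # json.dumps keeps the type recoverable; f"{retval}" would turn 42 into "42"
    return json.dumps(retval)


def decode_result(text: str) -> Any:
    return json.loads(text)


# Usage: round-trip a remaining chain through a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    stored_path = save_remaining(tmp, 7, [{"task": "T3"}, {"task": "pause_or_continue"}])
    restored = load_remaining(stored_path)
```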

Now let's see them in action:

Example of starting the workflow

def start_operation(user_id, *operation_args, **operation_kwargs):
    db = get_db()
    operation_id: int = db.execute(
        "INSERT INTO operations (requester_id, completion) VALUES (?, ?)",
        (user_id, "IN PROGRESS"),
    ).lastrowid
    # Convert a regular workflow chain to a tappable one
    tappable_workflow = tappable(
        (T1.s() | T2.s() | T3.s() | T4.s() | T5.s(operation_id)),
        should_pause.s(operation_id),
        save_state.s(operation_id),
    )
    # Commit first, so the row is already visible when should_pause runs in a worker
    db.commit()
    # Start the chain (i.e send task to celery to run asynchronously)
    tappable_workflow(*operation_args, **operation_kwargs)
    return operation_id

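This function takes a user ID and starts an operation workflow. It's more or less a dummy function modeled on a view/route controller, not a practical one, but I think it gets the general idea across.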

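Assume `T[1-4]` are the unit tasks of the operation, each taking the return value of the previous task as its argument. This is just an example of a regular Celery chain; feel free to go wild with your chains.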

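`T5` is a task that saves the final result (the result of `T4`) to the database. So, besides the return value of `T4`, it needs the `operation_id`, which is passed into its signature.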

Example of pausing the workflow

def pause(operation_id):
    db = get_db()

    operation = db.execute(
        "SELECT * FROM operations WHERE id = ?", (operation_id,)
    ).fetchone()

    if operation and operation["completion"] == "IN PROGRESS":
        # Pause only if the operation is in progress
        db.execute(
            """
            UPDATE operations
            SET completion = ?
            WHERE id = ?
            """,
            ("REQUESTING PAUSE", operation_id),
        )
        db.commit()
        return 'success'

    return 'invalid id'

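This applies the concept mentioned earlier: it modifies the database entry to change `completion` to `REQUESTING PAUSE`. Once that is committed, the next time `pause_or_continue` calls `should_pause`, it will see that the user has requested a pause and will pause the operation accordingly.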
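`pause()` reads the row and then updates it in two separate statements, so the operation could complete in between. One alternative is a single conditional `UPDATE` that only flips `IN PROGRESS` to `REQUESTING PAUSE`. The updated row count then tells you whether the pause was accepted. This sketch runs against an in-memory SQLite table trimmed to the two columns it needs. `request_pause` is a hypothetical name.

```python
import sqlite3


def request_pause(db: sqlite3.Connection, operation_id: int) -> bool:
    """Flip IN PROGRESS -> REQUESTING PAUSE in a single UPDATE.

    Checking and writing in one statement avoids the gap between a separate
    SELECT and UPDATE (e.g. the operation completing in between).
    """
    cur = db.execute(
        "UPDATE operations SET completion = 'REQUESTING PAUSE' "
        "WHERE id = ? AND completion = 'IN PROGRESS'",
        (operation_id,),
    )
    db.commit()
    return cur.rowcount == 1


# Usage against a trimmed-down, in-memory copy of the operations table
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE operations (id INTEGER PRIMARY KEY, completion TEXT NOT NULL)")
db.execute("INSERT INTO operations (id, completion) VALUES (1, 'IN PROGRESS')")
db.commit()
```

The first `request_pause(db, 1)` returns `True`. A second call returns `False` because the status is no longer `IN PROGRESS`, and so does a call with an unknown ID.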

Example of resuming the workflow

def resume(operation_id):
    db = get_db()

    operation = db.execute(
        "SELECT * FROM operations WHERE id = ?", (operation_id,)
    ).fetchone()

    if operation and operation["completion"] == "PAUSED":
        # Resume only if the operation is paused
        with open(operation["workflow_store"]) as f:
            # Load the remaining workflow from the json
            workflow_json = json.load(f)
        # Load the chain from the json (i.e deserialize each signature dict)
        workflow_chain = chain([signature(x) for x in workflow_json])

        # Mark the operation as running before starting it, so that a pause
        # request arriving right after the restart is accepted by pause()
        db.execute(
            """
            UPDATE operations
            SET completion = ?
            WHERE id = ?
            """,
            ("IN PROGRESS", operation_id),
        )
        db.commit()

        # Start the chain and feed in the last executed task result
        workflow_chain(operation["result"])
        return 'success'

    return 'invalid id'

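Recall that when the operation pauses, the remaining workflow is stored as JSON. Since we currently restrict workflows to `chain` objects, we know this JSON is a list of signatures that should be turned into a `chain`. So we deserialize it and send it to the Celery workers.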

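Note that this remaining workflow still contains the `pause_or_continue` tasks, so it can still be paused and resumed. On the next pause, `workflow.json` is simply overwritten.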
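A stdlib-only simulation can illustrate this claim. The stored remainder still contains `pause_or_continue` markers, so a resumed run can pause again. Each pause yields a new, shorter workflow JSON, which is what would overwrite `workflow.json`. The task names, their arithmetic and the `run` helper are hypothetical stand-ins for the Celery tasks.

```python
import json
from typing import Any, Callable, Dict, List

PAUSE = "pause_or_continue"

# Hypothetical tasks: each takes the previous task's return value
TASKS: Dict[str, Callable[[Any], Any]] = {
    "T3": lambda x: x + 3,
    "T4": lambda x: x * 2,
    "T5": lambda x: x - 1,
}


def run(steps: List[str], value: Any, pause_now: Callable[[], bool]) -> Dict[str, Any]:
    """Run a stored workflow; at each pause marker, ask whether to stop."""
    for i, step in enumerate(steps):
        if step == PAUSE:
            if pause_now():
                # What save_state would persist: the rest (markers included) + result
                return {
                    "paused": True,
                    "workflow_json": json.dumps(steps[i + 1:]),
                    "result": value,
                }
            continue
        value = TASKS[step](value)
    return {"paused": False, "result": value}
```

Resuming `["T3", PAUSE, "T4", PAUSE, "T5"]` with result `2` and pausing again leaves `["T4", PAUSE, "T5"]` to store. Resuming that without a pause finishes with `9`.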

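How to pause or resume Celery tasks?

2 answers:

Concept

Implementation

Explanation: `pause_or_continue`

Why are `clause` and `callback` task signatures instead of regular functions?

What is `self.request.chain`?

Explanation: `tappable`

Usage

Example of `clause`

Example of `callback`

Example of starting the workflow

Example of pausing the workflow

Example of resuming the workflow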
