
Question

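I am writing an application in Flask, which works really well except that WSGI is synchronous and blocking. I have one task in particular that calls out to a third-party API, and that task can take several minutes to complete. I would like to make that call (it's actually a series of calls) and let it run while control is returned to Flask.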

My view looks like this:

@app.route('/render/<id>', methods=['POST'])
def render_script(id=None):
    ...
    data = json.loads(request.data)
    text_list = data.get('text_list')
    final_file = audio_class.render_audio(data=text_list)
    # do stuff
    return Response(
        mimetype='application/json',
        status=200
    )

Now, what I want is to have the line

final_file = audio_class.render_audio()

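run and provide a callback to be executed when the method returns, while Flask continues to process requests. This is the only task I need Flask to run asynchronously, and I would like some advice on how best to implement it.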

I have looked at Twisted and Klein, but I'm not sure they aren't overkill, as threading might suffice. Or maybe Celery is a good choice for this?

Answer 1

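I would use Celery to handle the asynchronous task for you. You'll need to install a broker to serve as your task queue (RabbitMQ and Redis are recommended).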

app.py:

import json

from flask import Flask, Response, request
from celery import Celery

broker_url = 'amqp://guest@localhost'          # Broker URL for RabbitMQ task queue

app = Flask(__name__)    
celery = Celery(app.name, broker=broker_url)
celery.config_from_object('celeryconfig')      # Your celery configurations in a celeryconfig.py

@celery.task(bind=True)
def some_long_task(self, x, y):
    # Do some long task
    ...

@app.route('/render/<id>', methods=['POST'])
def render_script(id=None):
    ...
    data = json.loads(request.data)
    text_list = data.get('text_list')
    final_file = audio_class.render_audio(data=text_list)
    some_long_task.delay(x, y)                 # Call your async task and pass whatever necessary variables
    return Response(
        mimetype='application/json',
        status=200
    )

Run your Flask app, and start another process to run your Celery worker.

$ celery -A app.celery worker --loglevel=debug

I would also refer to Miguel Grinberg's write-up for a more in-depth guide to using Celery with Flask.
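
The app.py above loads its settings from a celeryconfig module that the answer does not show. As a rough sketch of what such a file might contain, assuming the RabbitMQ broker from app.py and JSON-only messages (every value here is a placeholder, not something taken from the answer):

```python
# celeryconfig.py: hypothetical values, adjust to your own deployment.

# Where tasks are queued. This repeats the RabbitMQ URL used in app.py.
broker_url = 'amqp://guest@localhost'

# Where task state and return values are stored. 'rpc://' sends results
# back over AMQP; leave this out if you never read task results.
result_backend = 'rpc://'

# Exchange task arguments and results as JSON only.
task_serializer = 'json'
result_serializer = 'json'
accept_content = ['json']

# Keep timestamps in UTC.
enable_utc = True
timezone = 'UTC'
```

With JSON serialization, whatever you pass to some_long_task.delay(...) has to be JSON-serializable, so pass plain data such as text_list rather than request objects.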

Answer 2

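You can also try using multiprocessing.Process with daemon=True; the process.start() method does not block, so you can return a response/status to the caller immediately while your expensive function runs in the background.

I ran into a similar problem while working with the Falcon framework, and using a daemon process helped.

You'd need to do the following: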

import time
from multiprocessing import Process

@app.route('/render/<id>', methods=['POST'])
def render_script(id=None):
    ...
    heavy_process = Process(  # Create a daemonic process with heavy "my_func"
        target=my_func,
        daemon=True
    )
    heavy_process.start()
    return Response(
        mimetype='application/json',
        status=200
    )

# Define some heavy function
def my_func():
    time.sleep(10)
    print("Process finished")

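You should get a response immediately, and after 10 seconds you should see the message printed in the console.

NOTE: Keep in mind that daemonic processes are not allowed to spawn any child processes.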
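
The example above runs a function that takes no arguments, whereas the view in the question needs to hand text_list to the slow job. A minimal sketch of passing data through Process's args parameter follows; render_in_background and make_render_process are hypothetical names, and the sleep stands in for the real rendering call:

```python
import time
from multiprocessing import Process


def render_in_background(text_list):
    """Stand-in for the slow job; replace the sleep with the real call."""
    time.sleep(10)
    print(f"Rendered {len(text_list)} item(s)")


def make_render_process(text_list):
    """Build, but do not start, a daemonic process for the slow job."""
    return Process(
        target=render_in_background,
        args=(text_list,),  # must be picklable under the "spawn" start method
        daemon=True,
    )
```

Inside the view you would call make_render_process(text_list).start() just before returning the response. Because the work happens in a separate process, the view cannot read the job's return value directly; the job would have to write its result somewhere both sides can reach, such as a file or a database.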

Answer 3

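Threading is another possible solution. Although the Celery-based solution is better for applications at scale, threading is a viable alternative if you are not expecting too much traffic on the endpoint in question.

This solution is based on Miguel Grinberg's PyCon 2016 Flask at Scale presentation, specifically slide 41 in his slide deck. His code is also available on GitHub for those interested in the original source.

From a user's perspective, the code works as follows:

1. You make a call to the endpoint that performs the long-running task.
2. This endpoint returns 202 Accepted with a link to check on the task status.
3. Calls to the status link return 202 while the task is still running, and 200 (with the result) once the task is complete.

To convert an API call to a background task, simply add the @async_api decorator.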

Here is a fully contained example:

from flask import Flask, g, abort, current_app, request, url_for
from werkzeug.exceptions import HTTPException, InternalServerError
from flask_restful import Resource, Api
from datetime import datetime
from functools import wraps
import threading
import time
import uuid

tasks = {}

app = Flask(__name__)
api = Api(app)


# Note: before_first_request was deprecated in Flask 2.2 and removed in 2.3.
@app.before_first_request
def before_first_request():
    """Start a background thread that cleans up old tasks."""
    def clean_old_tasks():
        """
        This function cleans up old tasks from our in-memory data structure.
        """
        global tasks
        while True:
            # Only keep tasks that are running or that finished less than 5
            # minutes ago.
            five_min_ago = datetime.timestamp(datetime.utcnow()) - 5 * 60
            tasks = {task_id: task for task_id, task in tasks.items()
                     if 't' not in task or task['t'] > five_min_ago}
            time.sleep(60)

    if not current_app.config['TESTING']:
        thread = threading.Thread(target=clean_old_tasks)
        thread.start()


def async_api(f):
    @wraps(f)
    def wrapped(*args, **kwargs):
        def task(flask_app, environ):
            # Create a request context similar to that of the original request
            # so that the task can have access to flask.g, flask.request, etc.
            with flask_app.request_context(environ):
                try:
                    tasks[task_id]['rv'] = f(*args, **kwargs)
                except HTTPException as e:
                    tasks[task_id]['rv'] = current_app.handle_http_exception(e)
                except Exception as e:
                    # The function raised an exception, so we set a 500 error
                    tasks[task_id]['rv'] = InternalServerError()
                    if current_app.debug:
                        # We want to find out if something happened so reraise
                        raise
                finally:
                    # We record the time of the response, to help in garbage
                    # collecting old tasks
                    tasks[task_id]['t'] = datetime.timestamp(datetime.utcnow())

                    # close the database session (if any)

        # Assign an id to the asynchronous task
        task_id = uuid.uuid4().hex

        # Record the task, and then launch it
        tasks[task_id] = {'task': threading.Thread(
            target=task, args=(current_app._get_current_object(),
                               request.environ))}
        tasks[task_id]['task'].start()

        # Return a 202 response, with a link that the client can use to
        # obtain task status
        print(url_for('gettaskstatus', task_id=task_id))
        return 'accepted', 202, {'Location': url_for('gettaskstatus', task_id=task_id)}
    return wrapped


class GetTaskStatus(Resource):
    def get(self, task_id):
        """
        Return status about an asynchronous task. If this request returns a 202
        status code, it means that task hasn't finished yet. Else, the response
        from the task is returned.
        """
        task = tasks.get(task_id)
        if task is None:
            abort(404)
        if 'rv' not in task:
            return '', 202, {'Location': url_for('gettaskstatus', task_id=task_id)}
        return task['rv']


class CatchAll(Resource):
    @async_api
    def get(self, path=''):
        # perform some intensive processing
        print("starting processing task")
        time.sleep(10)
        print("completed processing task")
        return f'The answer is: {path}'


api.add_resource(CatchAll, '/<path:path>', '/')
api.add_resource(GetTaskStatus, '/status/<task_id>')


if __name__ == '__main__':
    app.run(debug=True)
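
Stripped of Flask, the "202 while running, 200 when done" flow comes down to a registry of task ids that a status check can look up. The sketch below is not the code from the presentation; it swaps in concurrent.futures.ThreadPoolExecutor from the standard library, and submit, status and slow_add are hypothetical names chosen for illustration:

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)
futures = {}  # task_id -> Future; in-memory only, lost on restart


def submit(func, *args, **kwargs):
    """Start func in a worker thread and return an id for polling."""
    task_id = uuid.uuid4().hex
    futures[task_id] = executor.submit(func, *args, **kwargs)
    return task_id


def status(task_id):
    """Return (http_status, body) the way a status endpoint might."""
    future = futures.get(task_id)
    if future is None:
        return 404, None
    if not future.done():
        return 202, None
    if future.exception() is not None:
        return 500, None
    return 200, future.result()


def slow_add(a, b):
    """Stand-in for a long-running job."""
    time.sleep(0.2)
    return a + b


if __name__ == '__main__':
    task_id = submit(slow_add, 1, 2)
    print(status(task_id))  # polled right away, while the job is likely still running
    time.sleep(0.5)
    print(status(task_id))  # polled again after the job has had time to finish
```

Like the tasks dictionary in the full example, this registry lives in a single process, so it only works when every request is served by the same worker process, and finished entries still need to be cleaned up at some point.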

Answer 4

Flask 2.0

Flask 2.0 now supports async routes. You can use the httpx library together with asyncio coroutines for this. You can change your code as shown below:

@app.route('/render/<id>', methods=['POST'])
async def render_script(id=None):
    ...
    data = json.loads(request.data)
    text_list = data.get('text_list')
    # Assumes render_audio and do_other_stuff_function are coroutine functions.
    final_file = await asyncio.gather(
        audio_class.render_audio(data=text_list),
        do_other_stuff_function()
    )
    # Make sure the coroutines do not make any blocking calls.
    return Response(
        mimetype='application/json',
        status=200
    )

The above is just pseudocode, but you can check how asyncio works with Flask 2.0; for HTTP calls you can use httpx. Also make sure the coroutines only perform I/O-bound work.
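
To see what asyncio.gather does without Flask or httpx installed, here is a small standard-library sketch. fetch_part, blocking_render and handle are hypothetical stand-ins, and asyncio.to_thread (Python 3.9+) is one possible way to keep a blocking call such as render_audio off the event loop:

```python
import asyncio
import time


async def fetch_part(name, delay):
    """Stand-in for a non-blocking API call (e.g. one made with httpx)."""
    await asyncio.sleep(delay)
    return f'{name} done'


def blocking_render(text_list):
    """Stand-in for a blocking call such as render_audio."""
    time.sleep(0.1)
    return len(text_list)


async def handle(text_list):
    # Run both awaitables concurrently; results come back in argument order.
    # asyncio.to_thread moves the blocking function onto a worker thread.
    return await asyncio.gather(
        fetch_part('api', 0.1),
        asyncio.to_thread(blocking_render, text_list),
    )


if __name__ == '__main__':
    print(asyncio.run(handle(['a', 'b'])))
```

Keep in mind that an async view in Flask 2.0 still has to finish before the response is sent. This overlaps the waiting inside one request; it does not leave a job running in the background after the response has gone out, which is what the question asks for.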

Making an asynchronous task in Flask