Cleaning up long-running child processes started from a Flask MethodView API

Time: 2019-10-08 22:33:52

Tags: python flask multiprocessing

I'm building a Flask MethodView-driven API. For one particular endpoint, I use the request data to kick off a potentially long-running command. Rather than wait for the command to finish, I wrap it in a multiprocessing.Process, call start, and return HTTP 202 to the user along with a URL they can use to monitor the status of the process.

from flask import request, abort, jsonify
from flask.views import MethodView
from multiprocessing import Process


class EndPointAPI(MethodView):

    def __init__(self):
        """ On init, filter requests missing JSON body."""

        # Methods exempt from the JSON check ("except" is a reserved word,
        # hence "exempt")
        self.exempt = ["GET", "PUT", "DELETE"]
        if (request.method not in self.exempt) and not request.json:
            abort(400)

    def _long_running_function(self, json_data):
        """
        In this function, I use the input JSON data
        to write a script to the file system, then
        use subprocess.run to execute it.
        """
        return

    def post(self):
        """ Accept the job, run it in a background process, return 202. """

        # Get input data
        json_data = request.json

        # Kick off the long running function
        p = Process(target=self._long_running_function, args=(json_data,))
        p.start()

        response = {
            "result": "job accepted",
            "links": {
                "href": "/monitor_job/",
            }
        }

        return jsonify(response), 202

It appears that the processes started from the post method become zombies once they finish, but I don't know how to track and clean them up properly without blocking execution of the parent method. I tried implementing the monitoring thread suggested in Python join a process without blocking parent. As I understand it, the idea is to run a separate thread that watches a FIFO queue, and to put the process handle on the queue before returning from the parent function. I tried an implementation (below), but it seems you cannot pass a Process object into the thread this way, because it carries a protected AuthenticationString attribute.

Traceback (most recent call last):
|   File "/opt/miniconda3/envs/m137p3/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
|     obj = _ForkingPickler.dumps(obj)
|   File "/opt/miniconda3/envs/m137p3/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
|     cls(buf, protocol).dump(obj)
|   File "/opt/miniconda3/envs/m137p3/lib/python3.6/multiprocessing/process.py", line 291, in __reduce__
|     'Pickling an AuthenticationString object is '
| TypeError: Pickling an AuthenticationString object is disallowed for security reasons
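
For what it's worth, the same failure can be reproduced outside Flask with nothing but the standard library. A minimal sketch (do_nothing is just a placeholder target; the put itself does not raise, the traceback is printed by the queue's feeder thread):

import time
from multiprocessing import Process, Queue

def do_nothing():
    time.sleep(1)

if __name__ == '__main__':
    q = Queue()
    p = Process(target=do_nothing)
    p.start()
    q.put(p)           # feeder thread tries to pickle the Process and fails
    time.sleep(2)      # give the feeder thread time to print its traceback
    p.join()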

Here is my implementation of the approach from Python join a process without blocking parent. I don't know whether it would even work, because the error above shuts the whole thing down from the start. Any thoughts or suggestions on how to launch these processes responsibly without blocking the calling method would be greatly appreciated.

from threading import Thread
from multiprocessing import Process, Queue

from flask import request, abort, jsonify
from flask.views import MethodView


class Joiner(Thread):

    def __init__(self, q):
        super().__init__()
        self.__q = q

    def run(self):
        # Join finished children as their handles arrive; None is the
        # sentinel telling the thread to exit.
        while True:
            child = self.__q.get()
            if child is None:
                return
            child.join()


class EndPointAPI(MethodView):

    def __init__(self):
        """ On init, filter requests missing JSON body."""
        self._jobs = Queue()
        self._babysitter = Joiner(self._jobs)
        self._babysitter.start()

        # Check for json payload
        self.exempt = ["GET", "PUT", "DELETE"]
        if (request.method not in self.exempt) and not request.json:
            abort(400)

    def _long_running_function(self, json_data):
        """
        In this function, I use the input JSON data
        to write a script to the file system, then
        use subprocess.run to execute it.
        """
        return

    def post(self):
        """ Accept the job, run it in a background process, return 202. """

        # Get input data
        json_data = request.json

        # Kick off the long running function and hand it to the babysitter
        p = Process(target=self._long_running_function, args=(json_data,))
        p.start()
        self._jobs.put(p)

        response = {
            "result": "job accepted",
            "links": {
                "href": "/monitor_job/",
            }
        }

        return jsonify(response), 202

1 Answer:

Answer 0 (score: 1)

You were so close :) Everything looks fine except for one thing: you are using a multiprocessing.Queue to store the running processes so that the Joiner instance can join them later. In the docs you will find the following

Note: When an object is put on a queue, the object is pickled and a background thread later flushes the pickled data to an underlying pipe.

That is, when it is put on the queue, the process gets serialized, which produces the following error

TypeError: Pickling an AuthenticationString object is disallowed for security reasons

This happens because every process has a unique authentication key. The key is a byte string that can be thought of as a password; it is of type multiprocessing.process.AuthenticationString and cannot be pickled.
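
To illustrate (a minimal sketch added here, not part of the original answer): the key is exposed as the authkey attribute, and pickling a Process drags it along, which reproduces the error directly:

import pickle
from multiprocessing import Process
from multiprocessing.process import AuthenticationString

p = Process(target=print)

print(type(p.authkey))                              # <class 'multiprocessing.process.AuthenticationString'>
print(isinstance(p.authkey, AuthenticationString))  # True

try:
    pickle.dumps(p)    # the Process's state includes its authkey
except TypeError as err:
    print(err)         # Pickling an AuthenticationString object is disallowed for security reasons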

The solution is simple: just use a queue.Queue instance to store the long-running processes. Here is a working example:

#!/usr/bin/env python3
import os
import time
from queue import Queue
from threading import Thread
from multiprocessing import Process


class Joiner(Thread):
    """Background thread that joins worker processes as they are queued."""

    def __init__(self):
        super().__init__()
        self.workers = Queue()

    def run(self):

        while True:
            worker = self.workers.get()

            # None is the sentinel telling the joiner to shut down
            if worker is None:
                break

            worker.join()


def do_work(t):
    pid = os.getpid()
    print('Process', pid, 'STARTED')
    time.sleep(t)
    print('Process', pid, 'FINISHED')


if __name__ == '__main__':
    joiner = Joiner()
    joiner.start()

    for t in range(1, 6, 2):
        p = Process(target=do_work, args=(t,))
        p.start()
        joiner.workers.put(p)

    joiner.workers.put(None)  # sentinel: tell the joiner to exit
    joiner.join()

Output:

Process 14498 STARTED
Process 14500 STARTED
Process 14499 STARTED
Process 14498 FINISHED
Process 14499 FINISHED
Process 14500 FINISHED
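
Applied back to the question's view, the only substantive change is swapping the multiprocessing.Queue for a queue.Queue; the rest of the structure can stay as it is. A rough sketch under that assumption (the Joiner is the same thread class as in the question, _long_running_function is stubbed, and URL registration is omitted):

from queue import Queue                        # thread-safe, nothing gets pickled
from threading import Thread
from multiprocessing import Process

from flask import request, jsonify
from flask.views import MethodView


class Joiner(Thread):
    # Same joiner thread as in the question, unchanged.

    def __init__(self, q):
        super().__init__()
        self.__q = q

    def run(self):
        while True:
            child = self.__q.get()
            if child is None:
                return
            child.join()


class EndPointAPI(MethodView):

    def __init__(self):
        self._jobs = Queue()                   # queue.Queue instead of multiprocessing.Queue
        self._babysitter = Joiner(self._jobs)
        self._babysitter.start()

    def _long_running_function(self, json_data):
        # Writes and runs the script, exactly as in the question.
        return

    def post(self):
        json_data = request.json

        p = Process(target=self._long_running_function, args=(json_data,))
        p.start()
        self._jobs.put(p)                      # handed over in memory, no pickling involved

        response = {
            "result": "job accepted",
            "links": {"href": "/monitor_job/"},
        }
        return jsonify(response), 202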