Question

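I have a Flask app and use gunicorn (sync mode) as the web server. To push messages asynchronously, I use a gunicorn server hook to start a maintenance process (multiprocessing.Process()) when gunicorn starts, and I send messages to it through a multiprocessing.Queue() (in practice via logging.handlers.QueueHandler(queue), so that it is compatible with Python logging). However, I found that once a gunicorn worker has been restarted after "[CRITICAL] WORKER TIMEOUT", the maintenance process can no longer get messages that gunicorn workers put on the queue: queue.qsize() is not 0 and, according to the log, the message was put on the queue successfully, yet Queue.get(timeout) raises the Empty exception. Messages sent from the gunicorn master process can still be received. My log: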

  34 pid:24831 wechatlog : 2017-10-10 06:32:43,552 wechat_middle.py[line:34] DEBUG recive <LogRecord: wechat, 40, /www_upload/src/api_server.py, 543, "{'tag_list': 1, 'msg': 'company_test sid:8607550100000080 id: 8607550100000080 his: 1', 'lastsend': 'serial_error'}">
  35 pid:23930 wechat    : 2017-10-10 06:38:56,805 api_server.py[line:543] ERROR {'tag_list': 1, 'msg': 'company_test sid:8607550100000080 id: 8607550100000080 his: 1', 'lastsend': 'serial_error'}
  36 pid:24831 wechatlog : 2017-10-10 06:38:56,807 wechat_middle.py[line:34] DEBUG recive <LogRecord: wechat, 40, /www_upload/src/api_server.py, 543, "{'tag_list': 1, 'msg': 'company_test sid:8607550100000080 id: 8607550100000080 his: 1', 'lastsend': 'serial_error'}">
  37 pid:24887 wechat    : 2017-10-10 07:07:50,904 api_server.py[line:543] ERROR {'tag_list': 1, 'msg': 'company_test sid:8607550100000080 id: 8607550100000080 his: 1', 'lastsend': 'serial_error'}
  38 pid:24831 wechatlog : 2017-10-10 07:07:51,810 maintain_task.py[line:274] INFO current qsize: 1, debug_size: 0
  39 pid:24831 wechatlog : 2017-10-10 07:07:55,813 maintain_task.py[line:274] INFO current qsize: 1, debug_size: 1
  40 pid:24831 wechatlog : 2017-10-10 07:07:57,813 wechat_middle.py[line:25] INFO in debug mode, queue id 139972199063056, size 1
  41 pid:24831 wechatlog : 2017-10-10 07:07:59,816 wechat_middle.py[line:31] ERROR in debug mode, queue get nothing.
  42 pid:24831 wechatlog : 2017-10-10 07:07:59,816 maintain_task.py[line:274] INFO current qsize: 1, debug_size: 1
  43 pid:24831 wechatlog : 2017-10-10 07:08:00,817 maintain_task.py[line:281] ERROR queue is empty
  44 pid:24831 wechatlog : 2017-10-10 07:08:00,818 maintain_task.py[line:283] ERROR the message block the queue: None

Between 2017-10-10 06:38:56 and 2017-10-10 07:07:50, the gunicorn log reports this:

 [2017-10-10 06:41:08 +0800] [23906] [CRITICAL] WORKER TIMEOUT (pid:24838)

My code:

maintain_task.py
def wechat_push_thread(queue):
    we = wechat_middler_ware(queue=queue)
    wechat_log_logger = configs.make_logger_handler('wechatlog', filename='wechat')
    wechat_log_logger.info(f'queue id: {id(queue)}')
    debug_size = 0
    while True:
        try:
            we.listen(2)
        except Exception as e:
            wechat_log_logger.exception(e)
        # for debug
        if queue.qsize() > 0:
            wechat_log_logger.info(f'current qsize: {queue.qsize()}, debug_size: {debug_size}')
            if debug_size == queue.qsize():
                if we.debug_flag:
                    try:
                        msg = queue.get(timeout=1)
                    except Empty:
                        msg = None
                        wechat_log_logger.error(f'queue is empty')
                    wechat_log_logger.error(f'the message block the queue: {msg}')
                we.debug_flag = True
            debug_size = queue.qsize()
        else:
            we.debug_flag = False
            debug_size = 0
        # endfor debug
        if quit_event.wait(timeout=2):
            break
    logger.info('wechat_push_thread clean env')

wechat_middle.py
class wechat_middler_ware:
    def __init__(self, queue):
        self.q = queue
        self.logger = configs.make_logger_handler('wechatlog', filename='wechat')
        self.push_api = Push_Server(logger=self.logger)
        self.debug_flag = False

    def listen(self, timeout):
        while True:
            if self.debug_flag:
                self.logger.info(f'in debug mode, queue id {id(self.q)}, size {self.q.qsize()}')
            try:
                msg = self.q.get(timeout=timeout)
                self.logger.debug(f'recive {msg}')
            except Empty:
                if self.debug_flag:
                    self.logger.error(f'in debug mode, queue get nothing.')
                break
            else:
                ...

Answer 1

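According to the Python docs (the warning on multiprocessing.Process.terminate()):

Warning: If this method is used when the associated process is using a pipe or queue, then the pipe or queue is liable to become corrupted and may become unusable by other processes. Similarly, if the process has acquired a lock or semaphore etc., then terminating it is liable to cause other processes to deadlock.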

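The gunicorn master killed the timed-out worker, so the queue became unusable by the other processes. I now use multiprocessing.Manager().Queue() instead of multiprocessing.Queue(). It works well even when a worker is killed by the master.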
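
For illustration, here is a minimal sketch of the Manager-based setup. It is not the original application: worker() and run_demo() are hypothetical names, the parent process plays the role of the maintenance process, and the "fork" start method is assumed (gunicorn itself only runs on Unix). A worker that never returns is killed with Process.kill() as a stand-in for the gunicorn master killing a timed-out worker. The sketch does not reproduce the corruption of multiprocessing.Queue(); it only shows that a queue owned by the manager process stays readable after a producer is killed.

```python
import logging
import logging.handlers
import multiprocessing
import time


def worker(queue, text, hang):
    """Stand-in for a gunicorn worker that logs through a QueueHandler."""
    logger = logging.getLogger("wechat")
    logger.addHandler(logging.handlers.QueueHandler(queue))
    logger.error(text)
    while hang:  # simulate a request that never returns
        time.sleep(0.1)


def run_demo():
    # Assumption: "fork" start method, as on the Unix hosts gunicorn runs on.
    ctx = multiprocessing.get_context("fork")
    received = []
    with ctx.Manager() as manager:
        # The queue lives in the manager process; workers only hold proxies.
        queue = manager.Queue()

        stuck = ctx.Process(target=worker, args=(queue, "from stuck worker", True))
        stuck.start()
        received.append(queue.get(timeout=10).getMessage())
        stuck.kill()  # SIGKILL, standing in for the master killing the worker
        stuck.join()

        # A replacement worker can still deliver messages afterwards.
        healthy = ctx.Process(target=worker, args=(queue, "from new worker", False))
        healthy.start()
        healthy.join()
        received.append(queue.get(timeout=10).getMessage())
    return received
```

The trade-off is that every put() and get() becomes a round trip to the manager process, which is slower than a plain multiprocessing.Queue() but means no worker ever holds the queue's underlying pipe or lock when it is killed.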

multiprocessing.Queue cannot get data after a gunicorn worker timeout
