Django: Cleaning up Redis connections after the client disconnects from a stream

Date: 2012-10-12 05:56:17

Tags: python django redis gevent

I've implemented a Server-Sent Events API in my Django application to stream realtime updates from my backend out to the browser. The backend is a Redis pubsub. My Django view looks like this:

def event_stream(request):
    """
    Stream worker events out to browser.
    """

    listener = events.Listener(
        settings.EVENTS_PUBSUB_URL,
        channels=[settings.EVENTS_PUBSUB_CHANNEL],
        buffer_key=settings.EVENTS_BUFFER_KEY,
        last_event_id=request.META.get('HTTP_LAST_EVENT_ID')
    )

    return http.HttpResponse(listener, mimetype='text/event-stream')

The events.Listener class that I'm returning as an iterator looks like this:

class Listener(object):
    def __init__(self, rcon_or_url, channels, buffer_key=None,
                 last_event_id=None):
        if isinstance(rcon_or_url, redis.StrictRedis):
            self.rcon = rcon_or_url
        elif isinstance(rcon_or_url, basestring):
            self.rcon = redis.StrictRedis(**utils.parse_redis_url(rcon_or_url))
        self.channels = channels
        self.buffer_key = buffer_key
        self.last_event_id = last_event_id
        self.pubsub = self.rcon.pubsub()
        self.pubsub.subscribe(channels)

    def __iter__(self):
        # If we've been initted with a buffer key, then get all the events off
        # that and spew them out before blocking on the pubsub.
        if self.buffer_key:
            buffered_events = self.rcon.lrange(self.buffer_key, 0, -1)

            # check whether msg with last_event_id is still in buffer.  If so,
            # trim buffered_events to have only newer messages.
            if self.last_event_id:
                # Note that we're looping through most recent messages first,
                # here
                counter = 0
                for msg in buffered_events:
                    if (json.loads(msg)['id'] == self.last_event_id):
                        break
                    counter += 1
                buffered_events = buffered_events[:counter]

            for msg in reversed(list(buffered_events)):
                # Stream out oldest messages first
                yield to_sse({'data': msg})
        try:
            for msg in self.pubsub.listen():
                if msg['type'] == 'message':
                    yield to_sse(msg)
        finally:
            logging.info('Closing pubsub')
            self.pubsub.close()
            self.rcon.connection_pool.disconnect()

I'm able to successfully stream events out to the browser with this setup. However, it seems that the disconnect calls in the listener's finally block never actually get called. I assume those listeners are still camped out waiting for messages from the pubsub. As clients disconnect and reconnect, I can watch the number of connections to my Redis instance climb and never go down. Once it gets to around 1000, Redis starts freaking out and eating all the available CPU.
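This behavior can be reproduced in plain Python, with no Django or Redis involved: a generator's finally block does not run just because the consumer stops iterating; it runs only when the generator is explicitly closed or garbage-collected. A minimal sketch (hypothetical names):

```python
cleanup_calls = []

def stream():
    """Stand-in for Listener.__iter__: an endless event generator."""
    try:
        while True:
            yield 'data\n'
    finally:
        # This is where the Redis disconnect would happen.
        cleanup_calls.append('closed')

it = stream()
next(it)  # the client starts consuming the stream

# The client walks away without exhausting the generator; the
# finally block still has not run:
assert cleanup_calls == []

# Only an explicit close() (or garbage collection) triggers the cleanup:
it.close()
assert cleanup_calls == ['closed']
```

This is consistent with the symptom above: nothing ever calls close() on the abandoned iterator, and as long as the blocked pubsub read keeps a reference to it alive, garbage collection can't reclaim it either.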

I'd like to be able to detect when the client is no longer listening and close the Redis connection(s) at that point.

Things I've tried or thought about:

  1. A connection pool. But as the redis-py README states, "It is not safe to pass PubSub or Pipeline objects between threads."
  2. Middleware to handle the connections, or maybe just the disconnections. This won't work because a middleware's process_response() method gets called too early (before the HTTP headers have even been sent to the client). I need something that gets called when the client disconnects while I'm in the middle of streaming content to them.
  3. The request_finished and got_request_exception signals. The first, like process_response() in a middleware, seems to fire too soon. The second doesn't get called when a client disconnects mid-stream.
  4. One final wrinkle: in production I'm using Gevent, so I can get away with keeping a lot of connections open at once. However, this connection-leak problem is present whether I'm using plain old "manage.py runserver", a Gevent-monkeypatched runserver, or Gunicorn's gevent workers.

1 Answer:

Answer 0 (score: 0)

Update: As of Django 1.5, you need to return a StreamingHttpResponse instance if you want to lazily stream things out as in this question/answer.

Original answer below

After a lot of flailing around on different things and reading framework code, I've found what I think is the right answer to this question.

  1. According to the WSGI PEP, if your application returns an iterator with a close() method, the WSGI server should call it once the response has finished. Django supports this too. That's the natural place to do my Redis connection cleanup.
  2. There's a bug in Python's wsgiref implementation, and by extension in Django's 'runserver', that causes close() to be skipped if the client disconnects from the server mid-stream. I've submitted a patch.
  3. Even if the server honors close(), it won't be called until a write to the client actually fails. If your iterator is blocked waiting on the pubsub and not sending anything, close() will never be called. I've worked around this by sending a no-op message into the pubsub each time a client connects. That way, when a browser does a normal reconnect, the now-defunct threads will try to write to their closed connections, throw an exception, and then get cleaned up when the server calls close(). The SSE spec says that any line beginning with a colon is a comment that should be ignored, so I'm just sending ":\n" as my no-op message to flush out stale clients.
  4. Here's the new code. First the Django view:

    def event_stream(request):
        """
        Stream worker events out to browser.
        """
        return events.SSEResponse(
            settings.EVENTS_PUBSUB_URL,
            channels=[settings.EVENTS_PUBSUB_CHANNEL],
            buffer_key=settings.EVENTS_BUFFER_KEY,
            last_event_id=request.META.get('HTTP_LAST_EVENT_ID')
        )
    

    The Listener class that does the work, along with a helper function to format the SSE messages and an HttpResponse subclass that lets the view be a little cleaner:

    class Listener(object):
        def __init__(self,
                     rcon_or_url=settings.EVENTS_PUBSUB_URL,
                     channels=None,
                     buffer_key=settings.EVENTS_BUFFER_KEY,
                     last_event_id=None):
            if isinstance(rcon_or_url, redis.StrictRedis):
                self.rcon = rcon_or_url
            elif isinstance(rcon_or_url, basestring):
                self.rcon = redis.StrictRedis(**utils.parse_redis_url(rcon_or_url))
            if channels is None:
                channels = [settings.EVENTS_PUBSUB_CHANNEL]
            self.channels = channels
            self.buffer_key = buffer_key
            self.last_event_id = last_event_id
            self.pubsub = self.rcon.pubsub()
            self.pubsub.subscribe(channels)
    
            # Send a superfluous message down the pubsub to flush out stale
            # connections.
            for channel in self.channels:
                # Use buffer_key=None since these pings never need to be remembered
                # and replayed.
                sender = Sender(self.rcon, channel, None)
                sender.publish('_flush', tags=['hidden'])
    
        def __iter__(self):
            # If we've been initted with a buffer key, then get all the events off
            # that and spew them out before blocking on the pubsub.
            if self.buffer_key:
                buffered_events = self.rcon.lrange(self.buffer_key, 0, -1)
    
                # check whether msg with last_event_id is still in buffer.  If so,
                # trim buffered_events to have only newer messages.
                if self.last_event_id:
                    # Note that we're looping through most recent messages first,
                    # here
                    counter = 0
                    for msg in buffered_events:
                        if (json.loads(msg)['id'] == self.last_event_id):
                            break
                        counter += 1
                    buffered_events = buffered_events[:counter]
    
                for msg in reversed(list(buffered_events)):
                    # Stream out oldest messages first
                    yield to_sse({'data': msg})
    
            for msg in self.pubsub.listen():
                if msg['type'] == 'message':
                    yield to_sse(msg)
    
        def close(self):
            self.pubsub.close()
            self.rcon.connection_pool.disconnect()
    
    
    class SSEResponse(HttpResponse):
        def __init__(self, rcon_or_url, channels, buffer_key=None,
                     last_event_id=None, *args, **kwargs):
            self.listener = Listener(rcon_or_url, channels, buffer_key,
                                     last_event_id)
            super(SSEResponse, self).__init__(self.listener,
                                              mimetype='text/event-stream',
                                              *args, **kwargs)
    
        def close(self):
            """
            This will be called by the WSGI server at the end of the request, even
            if the client disconnects midstream.  Unless you're using Django's
            runserver, in which case you should expect to see Redis connections
            build up until http://bugs.python.org/issue16220 is fixed.
            """
            self.listener.close()
    
    
    def to_sse(msg):
        """
        Given a Redis pubsub message that was published by a Sender (ie, has a JSON
        body with time, message, title, tags, and id), return a properly-formatted
        SSE string.
        """
        data = json.loads(msg['data'])
    
        # According to the SSE spec, lines beginning with a colon should be
        # ignored.  We can use that as a way to force zombie listeners to try
        # pushing something down the socket and clean up their redis connections
        # when they get an error.
        # See http://dev.w3.org/html5/eventsource/#event-stream-interpretation
        if data['message'] == '_flush':
            return ":\n"  # Administering colonic!
    
        if 'id' in data:
            out = "id: " + data['id'] + '\n'
        else:
            out = ''
        if 'name' in data:
            out += 'name: ' + data['name'] + '\n'
    
        payload = json.dumps({
            'time': data['time'],
            'message': data['message'],
            'tags': data['tags'],
            'title': data['title'],
        })
        out += 'data: ' + payload + '\n\n'
        return out
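The close() contract from point 1 can be demonstrated without Django at all. Below is a sketch (hypothetical names) of a WSGI-style server loop honoring close() on the iterable the application returns:

```python
# Minimal demonstration of the WSGI close() contract (PEP 333/3333): if
# the application returns an iterable with a close() method, the server
# must call it when it finishes the response, even on error.

closed = []

class StreamingBody(object):
    """Stand-in for the Listener above: an iterable with cleanup in close()."""
    def __iter__(self):
        yield b'id: 1\n'
        yield b'data: {"message": "hello"}\n\n'

    def close(self):
        closed.append(True)  # where the Redis disconnect would happen

def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/event-stream')])
    return StreamingBody()

def serve_once(app):
    """Toy server loop: iterate the body, then always call close()."""
    body = app({}, lambda status, headers: None)
    try:
        return [chunk for chunk in body]
    finally:
        if hasattr(body, 'close'):
            body.close()

chunks = serve_once(app)
assert len(chunks) == 2
assert closed == [True]
```

A server that skips that finally-style cleanup when the client disconnects mid-stream (the wsgiref bug from point 2) is exactly what leaves the Redis connections dangling.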