redis.exceptions.ConnectionError after Celery has been running for about a day

Date: 2016-09-22 12:16:59

Tags: python django redis celery redis-py

Here is my full traceback:

Traceback (most recent call last):
  File "/home/server/backend/venv/lib/python3.4/site-packages/celery/app/trace.py", line 283, in trace_task
    uuid, retval, SUCCESS, request=task_request,
  File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/base.py", line 256, in store_result
    request=request, **kwargs)
  File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/base.py", line 490, in _store_result
    self.set(self.get_key_for_task(task_id), self.encode(meta))
  File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/redis.py", line 160, in set
    return self.ensure(self._set, (key, value), **retry_policy)
  File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/redis.py", line 149, in ensure
    **retry_policy
  File "/home/server/backend/venv/lib/python3.4/site-packages/kombu/utils/__init__.py", line 243, in retry_over_time
    return fun(*args, **kwargs)
  File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/redis.py", line 169, in _set
    pipe.execute()
  File "/home/server/backend/venv/lib/python3.4/site-packages/redis/client.py", line 2593, in execute
    return execute(conn, stack, raise_on_error)
  File "/home/server/backend/venv/lib/python3.4/site-packages/redis/client.py", line 2447, in _execute_transaction
    connection.send_packed_command(all_cmds)
  File "/home/server/backend/venv/lib/python3.4/site-packages/redis/connection.py", line 532, in send_packed_command
    self.connect()
  File "/home/pserver/backend/venv/lib/python3.4/site-packages/redis/connection.py", line 436, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 0 connecting to localhost:6379. Error.
[2016-09-21 10:47:18,814: WARNING/Worker-747] Data collector is not contactable. This can be because of a network issue or because of the data collector being restarted. In the event that contact cannot be made after a period of time then please report this problem to New Relic support for further investigation. The error raised was ConnectionError(ProtocolError('Connection aborted.', BlockingIOError(11, 'Resource temporarily unavailable')),).

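I searched thoroughly for this ConnectionError but could not find a matching problem.

My platform is Ubuntu 14.04. This is part of my redis configuration. (I can share the whole redis.conf file if you need it. By the way, all parameters in the LIMITS section are disabled.)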

# By default Redis listens for connections from all the network interfaces
# available on the server. It is possible to listen to just one or multiple
# interfaces using the "bind" configuration directive, followed by one or
# more IP addresses.
#
# Examples:
#
# bind 192.168.1.100 10.0.0.1
bind 127.0.0.1

# Specify the path for the unix socket that will be used to listen for
# incoming connections. There is no default, so Redis will not listen
# on a unix socket when not specified.
#
# unixsocket /var/run/redis/redis.sock
# unixsocketperm 755

# Close the connection after a client is idle for N seconds (0 to disable)
timeout 0

# TCP keepalive.
#
# If non-zero, use SO_KEEPALIVE to send TCP ACKs to clients in absence
# of communication. This is useful for two reasons:
#
# 1) Detect dead peers.
# 2) Take the connection alive from the point of view of network
#    equipment in the middle.
#
# On Linux, the specified value (in seconds) is the period used to send ACKs.
# Note that to close the connection the double of the time is needed.
# On other kernels the period depends on the kernel configuration.
#
# A reasonable value for this option is 60 seconds.
tcp-keepalive 60

Here is my mini redis wrapper:

import redis

from django.conf import settings


REDIS_POOL = redis.ConnectionPool(host=settings.REDIS_HOST, port=settings.REDIS_PORT)


def get_redis_server():
    return redis.Redis(connection_pool=REDIS_POOL)

And this is how I use it:

from celery import shared_task

from redis_wrapper import get_redis_server

# The view and the task run in different, independent processes.

def sample_view(request):
    rs = get_redis_server()
    # some get-set stuff with redis


@shared_task
def sample_celery_task():
    rs = get_redis_server()
    # some get-set stuff with redis

Package versions:

celery==3.1.18
django-celery==3.1.16
kombu==3.0.26
redis==2.10.3

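So the problem is this: the connection error appears some time after the Celery workers are started. After its first occurrence, every task ends with this error until I restart all of the Celery workers. (Interestingly, Celery Flower also fails during this problematic period.)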

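I suspect the way I use the redis connection pool, or the redis configuration, or, less probably, a network issue. Any ideas about the cause? What am I doing wrong?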

(PS: I will add the redis-cli info output when I see this error again today.)

UPDATE

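I temporarily solved this problem by adding the --maxtasksperchild parameter to the worker start command, set to 200. Of course, this is not the proper way to solve the problem; it only treats the symptom. It periodically refreshes the worker processes (each old process is shut down after it has handled 200 tasks and a new one is created), which also refreshes my global redis pool and its connections. So I think I should focus on how the global redis connection pool is used, and I am still waiting for new ideas and comments.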
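For reference, the same limit can also be expressed in configuration rather than on the command line. In Celery 3.1 the corresponding setting is `CELERYD_MAX_TASKS_PER_CHILD`. This is a sketch that assumes the Celery configuration is loaded from the Django settings module, which is not shown in the question:

```python
# settings.py (assumed location of the Celery configuration)

# Recycle each worker process after it has executed 200 tasks.
# Same intent as passing --maxtasksperchild=200 to the worker.
CELERYD_MAX_TASKS_PER_CHILD = 200
```

Like the command-line flag, this only masks the problem by recycling processes before the error shows up.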

Sorry for my bad English, and thanks in advance.

1 answer:

Answer 0 (score: 0):

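Have you enabled the RDB background save method in redis? If so, check the size of the dump.rdb file in /var/lib/redis.
Sometimes the file grows until it fills the root filesystem, and the redis instance can no longer save to it.

You can keep redis accepting writes after a failed background save by issuing the config set stop-writes-on-bgsave-error no command in redis-cli. Note that this does not stop the background save itself; it only turns off the write rejection that follows a failed save.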
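The check described above can be sketched in Python. `rdb_disk_report` and `allow_writes_after_failed_bgsave` are hypothetical helper names; the second one expects an already connected redis-py client and relies on its `config_set` method. The path /var/lib/redis/dump.rdb is taken from this answer and may differ depending on the `dir` and `dbfilename` values in redis.conf.

```python
import os
import shutil


def rdb_disk_report(rdb_path):
    """Return the dump file size and the free space on its filesystem, in bytes."""
    directory = os.path.dirname(os.path.abspath(rdb_path))
    return {
        "rdb_bytes": os.path.getsize(rdb_path),
        "free_bytes": shutil.disk_usage(directory).free,
    }


def allow_writes_after_failed_bgsave(client):
    """Send CONFIG SET stop-writes-on-bgsave-error no through a redis-py client."""
    return client.config_set("stop-writes-on-bgsave-error", "no")
```

For example, `rdb_disk_report("/var/lib/redis/dump.rdb")` shows whether the dump file is large compared with the remaining free space. Freeing disk space is the actual fix; changing the setting only hides a failing save.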