Question

At irregular intervals (once every few hours), a gunicorn worker fails with the following error:

[2014-10-29 10:21:54 +0000] [4902] [INFO] Booting worker with pid: 4902
[2014-10-29 13:15:24 +0000] [4902] [ERROR] Exception in worker process:
Traceback (most recent call last):
  File "/opt/test/env/local/lib/python2.7/site-packages/gunicorn/arbiter.py", line 507, in spawn_worker
    worker.init_process()
  File "/opt/test/env/local/lib/python2.7/site-packages/gunicorn/workers/gthread.py", line 109, in init_process
    super(ThreadWorker, self).init_process()
  File "/opt/test/env/local/lib/python2.7/site-packages/gunicorn/workers/base.py", line 120, in init_process
    self.run()
  File "/opt/test/env/local/lib/python2.7/site-packages/gunicorn/workers/gthread.py", line 177, in run
    self.murder_keepalived()
  File "/opt/test/env/local/lib/python2.7/site-packages/gunicorn/workers/gthread.py", line 149, in murder_keepalived
    self.poller.unregister(conn.sock)
  File "/opt/test/env/local/lib/python2.7/site-packages/trollius/selectors.py", line 408, in unregister
    key = super(EpollSelector, self).unregister(fileobj)
  File "/opt/test/env/local/lib/python2.7/site-packages/trollius/selectors.py", line 243, in unregister
    raise KeyError("{0!r} is not registered".format(fileobj))
KeyError: '<socket._socketobject object at 0x7f823f454d70> is not registered'
...
...
[2014-10-29 13:15:24 +0000] [4902] [INFO] Worker exiting (pid: 4902)
[2014-10-29 13:15:24 +0000] [5809] [INFO] Booting worker with pid: 5809
 ...

Configuration:

bind = '0.0.0.0:80'
workers = 1
threads = 4
debug = True
reload = True
daemon = True

I am using:

Python 2.7.6
gunicorn==19.1.1
trollius==1.0.2
futures==2.2.0

Any ideas what might be causing this and how to fix it?

Thanks!

Answer 1

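I ran into a similar problem: I was getting timeout errors from gunicorn workers. I am using sync workers and had the default timeout and keepalive settings. In my use case the HTTP requests take a very long time to complete, so the sync workers were timing out. My HTTP client was curl, which sends HTTP/1.1 requests. I increased the timeout to a crazy high value of 3600, i.e. 1 hour, and that worked. But in the server error log I then saw the same error as yours.

Here is my hypothesis for this error. By default all HTTP/1.1 requests are persistent, so the server tries to reuse a connection by putting it back into the queue, but for no longer than the keepalive timeout. When the keepalive timeout expires, the server unregisters the socket so that it cannot be reused, and closes it. Since my timeout value was very high while keepalive was still at its short default, the server tried to unregister an already unregistered socket more than once, hence the error. So I increased keepalive to 3600 as well. So far it works.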

# http://gunicorn-docs.readthedocs.org/en/latest/settings.html
timeout = 3600 # one hour timeout for long running jobs
keepalive = 3600
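
To illustrate the hypothesis above, here is a minimal sketch of how unregistering the same socket twice produces this KeyError. It assumes Python 3's standard selectors module (trollius.selectors, seen in the traceback, is a backport of it), and the function name unregister_twice is made up for the example. It only shows how the exception arises; it is not gunicorn's actual code.

```python
import selectors
import socket


def unregister_twice():
    """Register a socket, unregister it twice, and return the error text."""
    sel = selectors.DefaultSelector()
    # A connected pair of local sockets, so no network access is needed.
    a, b = socket.socketpair()
    try:
        sel.register(a, selectors.EVENT_READ)
        sel.unregister(a)  # first unregister succeeds
        try:
            sel.unregister(a)  # second unregister: socket is no longer known
        except KeyError as exc:
            return str(exc)
        return None
    finally:
        sel.close()
        a.close()
        b.close()


if __name__ == "__main__":
    # Prints a message ending in "is not registered", as in the traceback.
    print(unregister_twice())
```

If this hypothesis is right, anything that makes the worker unregister a connection a second time would trigger the same exception, whatever the timeout values are.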

Answer 2

I reported this gunicorn bug about a year ago, and the fix should be in gunicorn 19.6.0 and later: https://github.com/benoitc/gunicorn/issues/1258
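
A quick way to check whether a given version is at or above the release mentioned here is sketched below. The helper names parse_version and has_fix are hypothetical, and the sketch assumes a plain dotted numeric version string such as the one in the question.

```python
FIXED_IN = (19, 6, 0)  # release named in this answer as containing the fix


def parse_version(version):
    """Turn a dotted version string such as '19.1.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split(".")[:3])


def has_fix(version):
    """Return True if the version is 19.6.0 or later."""
    return parse_version(version) >= FIXED_IN


if __name__ == "__main__":
    # Version from the question:
    print(has_fix("19.1.1"))  # False
    # With gunicorn installed, one could pass gunicorn.__version__ instead.
```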

Gunicorn workers periodically crash: 'socket is not registered'
