InterfaceError: connection already closed (using django + celery + Scrapy)

Date: 2015-07-19 18:37:39

Tags: python django scrapy celery

I get this error when I run a Scrapy parse function (which can sometimes take up to 10 minutes) inside a Celery task.

I'm using:

- Django == 1.6.5
- django-celery == 3.1.16
- celery == 3.1.16
- psycopg2 == 2.5.5 (I also tried psycopg2 == 2.5.4)

[2015-07-19 11:27:49,488: CRITICAL/MainProcess] Task myapp.parse_items[63fc40eb-c0d6-46f4-a64e-acce8301d29a] INTERNAL ERROR: InterfaceError('connection already closed',)
Traceback (most recent call last):
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/celery/app/trace.py", line 284, in trace_task
    uuid, retval, SUCCESS, request=task_request,
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/celery/backends/base.py", line 248, in store_result
    request=request, **kwargs)
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/djcelery/backends/database.py", line 29, in _store_result
    traceback=traceback, children=self.current_task_children(request),
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/djcelery/managers.py", line 42, in _inner
    return fun(*args, **kwargs)
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/djcelery/managers.py", line 181, in store_result
    'meta': {'children': children}})
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/djcelery/managers.py", line 87, in update_or_create
    return get_queryset(self).update_or_create(**kwargs)
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/djcelery/managers.py", line 70, in update_or_create
    obj, created = self.get_or_create(**kwargs)
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/models/query.py", line 376, in get_or_create
    return self.get(**lookup), False
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/models/query.py", line 304, in get
    num = len(clone)
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/models/query.py", line 77, in __len__
    self._fetch_all()
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/models/query.py", line 857, in _fetch_all
    self._result_cache = list(self.iterator())
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/models/query.py", line 220, in iterator
    for row in compiler.results_iter():
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 713, in results_iter
    for rows in self.execute_sql(MULTI):
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 785, in execute_sql
    cursor = self.connection.cursor()
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/backends/__init__.py", line 160, in cursor
    cursor = self.make_debug_cursor(self._cursor())
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/backends/__init__.py", line 134, in _cursor
    return self.create_cursor()
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/utils.py", line 99, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/backends/__init__.py", line 134, in _cursor
    return self.create_cursor()
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/backends/postgresql_psycopg2/base.py", line 137, in create_cursor
    cursor = self.connection.cursor()
InterfaceError: connection already closed

2 answers:

Answer 0 (score: 12)

Unfortunately, this is a problem with the django + psycopg2 + celery combination. It's an old and still unresolved issue.

Take a look at this thread to understand it: https://github.com/celery/django-celery/issues/121

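Basically, when celery starts a worker, it forks a database connection from the django.db framework. If that connection drops for some reason, no new one is created. Celery itself can't fix this, because it has no way to detect that a connection managed by django.db has been dropped. Django doesn't signal it either: it only manages connection lifetimes around incoming WSGI requests (there is no connection pool), and a Celery worker never goes through that request cycle. I had the same problem in a huge production environment with many worker machines; from time to time, those machines lost their connection to the postgres server.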
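The failure mode described above can be reproduced in miniature. The sketch below is only an analogy: it uses the standard-library sqlite3 module instead of psycopg2 so it runs anywhere, and the `CachedConnection` class is hypothetical. It shows a long-lived process that caches one connection and never checks or reopens it, so once the handle is closed underneath it, every later use fails.

```python
import sqlite3


class CachedConnection:
    """Hypothetical stand-in for the connection a long-lived worker process
    keeps between tasks (sqlite3 is used only so the sketch runs anywhere)."""

    def __init__(self, database=":memory:"):
        self.conn = sqlite3.connect(database)

    def cursor(self):
        # No liveness check and no reconnect: whatever handle was opened at
        # start-up is reused for every task, dead or alive.
        return self.conn.cursor()


holder = CachedConnection()
holder.cursor().execute("SELECT 1")  # works while the connection is alive

holder.conn.close()  # simulate the database dropping the connection
try:
    holder.cursor()
except sqlite3.ProgrammingError as exc:
    print("stale handle:", exc)
```

With psycopg2 the equivalent symptom is the `InterfaceError: connection already closed` from the traceback in the question.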

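I solved it by running each celery main process under Linux's supervisord with a watcher, and by implementing a decorator that handles psycopg2.InterfaceError: when that error occurs, the decorator makes a system call that forces supervisord to restart the celery process with SIGINT.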
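A minimal sketch of such a decorator, assuming a prefork worker (so the task's parent PID is the Celery main process) and a supervisord program configured with `autorestart=true`. The name `restart_worker_on` and the choice of `os.kill` for the "system call" are hypothetical; the answer doesn't show its actual code. The restart action is injectable so it can be tested without killing anything.

```python
import functools
import os
import signal


def restart_worker_on(exc_type, restart=None):
    """Hypothetical decorator: if the wrapped function raises `exc_type`
    (in the answer's case psycopg2.InterfaceError), trigger a restart of the
    worker and re-raise so the failure is still recorded."""

    def default_restart():
        # Assumption: the task runs in a prefork child, so the parent PID is
        # the Celery main process. SIGINT asks it to shut down; supervisord
        # (assumed autorestart=true) then starts a fresh worker with fresh
        # database connections.
        os.kill(os.getppid(), signal.SIGINT)

    action = restart if restart is not None else default_restart

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except exc_type:
                action()
                raise
        return wrapper

    return decorator
```

In a worker you would stack it under the task decorator, e.g. `@restart_worker_on(psycopg2.InterfaceError)` directly above `def parse_items(...)`. Re-raising keeps the task's failure visible instead of silently swallowing it.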

Edit:

I found a better solution. I implemented a celery task base class like this:

from django.db import connection
from celery import Task, shared_task


class FaultTolerantTask(Task):
    """Implements the after_return hook to close the (possibly invalid)
    connection. This way, Django is forced to open a new connection for
    the next task.
    """
    abstract = True  # base class only, not registered as a task itself

    def after_return(self, *args, **kwargs):
        # Called after every task, whatever its final state.
        connection.close()


@shared_task(base=FaultTolerantTask)
def my_task():
    # my database-dependent code here
    pass
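To see why closing the connection in an after-return hook helps, here is a toy model in plain Python. `LazyConnection` and `ToyTask` are hypothetical stand-ins (not Django's or Celery's real classes), and sqlite3 replaces PostgreSQL: the connection is opened lazily on first use, and the hook closes it after each task, so every task begins with a freshly opened connection rather than inheriting one that may have died in the meantime.

```python
import sqlite3


class LazyConnection:
    """Hypothetical stand-in for django.db.connection: a real connection is
    opened lazily on first use and can be closed explicitly."""

    def __init__(self, database=":memory:"):
        self.database = database
        self.conn = None
        self.opens = 0  # how many real connections have been opened

    def cursor(self):
        if self.conn is None:
            self.conn = sqlite3.connect(self.database)
            self.opens += 1
        return self.conn.cursor()

    def close(self):
        if self.conn is not None:
            self.conn.close()
            self.conn = None


connection = LazyConnection()


class ToyTask:
    """Toy model of a task base class; not Celery's actual Task class."""

    def run(self):
        raise NotImplementedError

    def after_return(self):
        # Same idea as FaultTolerantTask.after_return: drop the connection
        # so the next task starts from a fresh one.
        connection.close()

    def __call__(self):
        try:
            return self.run()
        finally:
            self.after_return()  # runs on success and on failure


class QueryTask(ToyTask):
    def run(self):
        return connection.cursor().execute("SELECT 1").fetchone()[0]
```

Calling `QueryTask()()` twice opens two separate connections, one per task. The trade-off is one reconnect per task, which is usually negligible next to a task that runs for minutes.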

I believe it will solve your problem too.

Answer 1 (score: 5)

Guys, and emanuelcds:

I had the same problem. I've now updated my code and created a new loader for celery:

from djcelery.loaders import DjangoLoader
from django import db

class CustomDjangoLoader(DjangoLoader):
    def on_task_init(self, task_id, task):
        """Called before every task."""
        for conn in db.connections.all():
            conn.close_if_unusable_or_obsolete()
        super(CustomDjangoLoader, self).on_task_init(task_id, task)
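The loader's per-task check can be sketched outside Django as follows. This is a hypothetical, simplified re-implementation over sqlite3, loosely modelled on `close_if_unusable_or_obsolete()`: it pings the connection and discards it if the ping fails or the connection is older than `max_age`, so the next query reconnects. `ManagedConnection` and `on_task_init` here are illustrative names only. Django's own method is more conservative: it only runs the liveness check after a database error has been flagged on the connection, and uses `CONN_MAX_AGE` for the age limit.

```python
import sqlite3
import time


class ManagedConnection:
    """Hypothetical sqlite3-based sketch of a per-task connection check,
    loosely modelled on Django's close_if_unusable_or_obsolete()."""

    def __init__(self, database=":memory:", max_age=None):
        self.database = database
        self.max_age = max_age  # seconds; None means never obsolete
        self.conn = None
        self.opened_at = None

    def cursor(self):
        # Reconnect lazily, the way Django opens a connection on the next query.
        if self.conn is None:
            self.conn = sqlite3.connect(self.database)
            self.opened_at = time.monotonic()
        return self.conn.cursor()

    def is_usable(self):
        try:
            self.conn.execute("SELECT 1")
        except sqlite3.Error:
            return False
        return True

    def close_if_unusable_or_obsolete(self):
        if self.conn is None:
            return
        obsolete = (self.max_age is not None
                    and time.monotonic() - self.opened_at >= self.max_age)
        if obsolete or not self.is_usable():
            try:
                self.conn.close()
            except sqlite3.Error:
                pass  # ignore errors from an already-broken handle
            self.conn = None


def on_task_init(connections):
    """Hypothetical equivalent of the loader hook: check every connection."""
    for conn in connections:
        conn.close_if_unusable_or_obsolete()
```

A healthy connection survives the check untouched; a dropped one is discarded before the task runs, so the task's first query opens a new connection instead of failing.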

Of course, if you're using djcelery, you also need something like this in your settings:

import os

CELERY_LOADER = 'myproject.loaders.CustomDjangoLoader'
os.environ['CELERY_LOADER'] = CELERY_LOADER

I still have to test it; I'll post an update.