Question

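My Django project has a view that fires off a Celery task. The Celery task itself triggers some map/reduce jobs via subprocess/Fabric, and the results of the Hadoop jobs are stored on disk; nothing is actually stored in the database. After the Hadoop job completes, the Celery task sends a Django signal to indicate that it is done, like this: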

# tasks.py
from models import MyModel
import signals

from fabric.operations import local

from celery.task import Task

class Hadoopification(Task):
    def run(self, my_model_id, other_args):
        my_model = MyModel.objects.get(pk=my_model_id)
        self.hadoopify_function(my_model, other_args)
        signals.complete_signal.send(
            sender=self,
            my_model_id=my_model_id,
            complete=True,
        )

    def hadoopify_function(self, my_model, other_args):
        local("""hadoop jar /usr/lib/hadoop/hadoop-streaming.jar -D mapred.reduce.tasks=0 -file hadoopify.py -mapper "parse_mapper.py 0 0" -input /user/me/input.csv -output /user/me/output.csv""")

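What baffles me is that when the Celery task runs, the Django runserver reloads, as if I had changed some code somewhere in the Django project (I haven't, I can assure you!). Occasionally this even causes errors in the runserver command: I see output like the following before runserver reloads and is fine again (note: this error message is very similar to the problem described here).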

Unhandled exception in thread started by <function inner_run at 0xa18cd14>
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python2.6/dist-packages/apport_python_hook.py", line 48, in apport_excepthook
    if not enabled():
TypeError: 'NoneType' object is not callable

Original exception was:
Traceback (most recent call last):
  File "/home/rdm/Biz/Projects/Daegis/Server_Development/tar/env/lib/python2.6/site-packages/django/core/management/commands/runserver.py", line 60, in inner_run
    run(addr, int(port), handler)
  File "/home/rdm/Biz/Projects/Daegis/Server_Development/tar/env/lib/python2.6/site-packages/django/core/servers/basehttp.py", line 721, in run
    httpd.serve_forever()
  File "/usr/lib/python2.6/SocketServer.py", line 224, in serve_forever
    r, w, e = select.select([self], [], [], poll_interval)
AttributeError: 'NoneType' object has no attribute 'select'

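I have narrowed the problem down to the call to hadoop: replacing local("""hadoop ...""") with local("ls") does not cause any problems with the Django runserver reloading. There is no bug in the Hadoop code; it runs just fine on its own when it is not called by Celery.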

Any ideas what might be causing this?

Answer 1

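After digging through the Fabric source, I learned that Django was reloading because my Celery task was failing inside the fabric.operations.local command (which was very hard to spot in the mammoth Hadoop output). When the fabric.operations.local command fails, Fabric issues a sys.exit, which causes Celery to die and Django to attempt a reload. The error can be detected by catching SystemExit within the Hadoop task, like so: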

class Hadoopification(Task):
    def run(self, my_model_id, other_args):
        my_model = MyModel.objects.get(pk=my_model_id)
        self.hadoopify_function(my_model, other_args)
        signals.complete_signal.send(
            sender=self,
            my_model_id=my_model_id,
            complete=True,
        )

    def hadoopify_function(self, my_model, other_args):
        try:
            local("""hadoop jar /usr/lib/hadoop/hadoop-streaming.jar -D mapred.reduce.tasks=0 -file hadoopify.py -mapper "parse_mapper.py 0 0" -input /user/me/input.csv -output /user/me/output.csv""")
        except SystemExit as e:
            # print some useful debugging information about exception e here!
            raise
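
Part of why this failure is easy to miss: SystemExit derives from BaseException rather than Exception, so a generic `except Exception` handler never sees it. Below is a minimal sketch of that behaviour using only the standard library. The names failing_command and run_guarded are hypothetical, and failing_command merely stands in for a failing local() call; it is not Fabric code.

```python
import sys


def failing_command():
    # Hypothetical stand-in for a failing fabric.operations.local call,
    # which (per the answer above) ends in sys.exit.
    sys.exit(1)


def run_guarded():
    """Return a label describing which handler saw the failure."""
    try:
        failing_command()
    except Exception:
        # Not reached: SystemExit is not a subclass of Exception.
        return "caught as Exception"
    except SystemExit as e:
        # e.code carries the exit status passed to sys.exit.
        return "caught SystemExit with code %r" % (e.code,)
    return "no failure"
```

Calling run_guarded() takes the SystemExit branch, which is why the task above has to name SystemExit explicitly in order to log anything before re-raising.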

Answer 2

There is some discussion of this on the Fabric GitHub page here, here, and here. Another option for raising an error is to use the settings context manager:

from fabric.api import settings

class Hadoopification(Task):
    ...
    def hadoopify_function(self, my_model, other_args):
        with settings(warn_only=True):
            result = local(...)
        if result.failed:
            # access result.return_code, result.stdout, result.stderr
            raise UsefulException(...)

The advantage of this is that you have access to the return code and all the other attributes on the result.

Answer 3

My guess is that there is some conflict between the Task names in Celery and Fabric. I would suggest using something more like:

import celery
class Hadoopification(celery.task.Task):
    ...

If that hunch is any good, this should avoid any further collisions.

Really, though, Fabric's local is fairly trivial, basically just a subprocess.Popen. You could try calling that directly to rule out everything except the Python stdlib.
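
One way the direct subprocess.Popen call suggested above might look is sketched below. It uses only the standard library; run_command and CommandFailed are hypothetical names introduced for illustration, and the sketch makes no claim to match what Fabric's local does internally. A non-zero exit status becomes an ordinary exception instead of a process exit.

```python
import subprocess
import sys


class CommandFailed(Exception):
    """Hypothetical exception type, named here for illustration only."""


def run_command(args):
    # Run the command without a shell and collect both output streams.
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    if proc.returncode != 0:
        # Raise an ordinary exception rather than exiting the process,
        # so the calling task can log the failure and carry on.
        raise CommandFailed("exit %d: %r" % (proc.returncode, err))
    return out
```

In the task from the question, the hadoop invocation would be passed as an argument list, for example run_command(["hadoop", "jar", ...]) with the remaining arguments filled in from the original command.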

'./manage.py runserver' restarts while a Celery map/reduce task is running; sometimes raises an error with inner_run
