Question

I have a Celery task named simple_theano_tasks:

@app.task(bind=True, queue='test')
def simple_theano_tasks(self):
  import theano, numpy as np
  my_array = np.zeros((0,), dtype=theano.config.floatX)
  shared = theano.shared(my_array, name='my_variable', borrow=True)
  print 'Done. Shared value is {}'.format(shared.get_value())

When Theano is configured to use the CPU, everything works as expected (no errors):

$ THEANO_FLAGS=device=cpu celery -A my_project worker -c1 -l info -Q test

[INFO/MainProcess] Received task: my_project.tasks.simple_theano_tasks[xxxx]
[WARNING/Worker-1] Done. Shared value is []
[INFO/MainProcess] Task my_project.tasks.simple_theano_tasks[xxxx] succeeded in 0.00407959899985s

Now, when I do exactly the same thing with the GPU enabled, Theano (or CUDA) raises an error:

$ THEANO_FLAGS=device=gpu celery -A my_project worker -c1 -l info -Q test

...

Using gpu device 0: GeForce GTX 670M (CNMeM is enabled)

...

[INFO/MainProcess] Received task: my_project.tasks.simple_theano_tasks[xxx]
[ERROR/MainProcess] Task my_project.tasks.simple_theano_tasks[xxx] raised unexpected: RuntimeError("Cuda error 'initialization error' while copying %lli data element to device memory",)
Traceback (most recent call last):
  File "/.../local/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/.../local/lib/python2.7/site-packages/celery/app/trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)
  File "/.../my_project/tasks.py", line 362, in simple_theano_tasks
    shared = theano.shared(my_array, name='my_variable', borrow=True)
  File "/.../local/lib/python2.7/site-packages/theano/compile/sharedvalue.py", line 247, in shared
    allow_downcast=allow_downcast, **kwargs)
  File "/.../local/lib/python2.7/site-packages/theano/sandbox/cuda/var.py", line 229, in float32_shared_constructor
    deviceval = type_support_filter(value, type.broadcastable, False, None)
RuntimeError: Cuda error 'initialization error' while copying %lli data element to device memory

Finally, when I run exactly the same code in a Python shell, I get no error:

$ THEANO_FLAGS=device=gpu python
Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import theano, numpy as np
Using gpu device 0: GeForce GTX 670M (CNMeM is enabled)
>>> my_array = np.zeros((0,), dtype=theano.config.floatX)
>>> shared = theano.shared(my_array, name='my_variable', borrow=True)
>>> print 'Done. Shared value is {}'.format(shared.get_value())
Done. Shared value is []

Does anyone know:

Why does Theano behave differently in a Celery worker?
How can I fix this?

Some additional context:

I am using theano@0.7.0 and Celery@3.1.18
My "~/.theanorc" file:

[global]
floatX=float32
device=gpu

[mode]=FAST_RUN

[nvcc]
fastmath=True

[lib]
cnmem=0.1

[cuda]
root=/usr/local/cuda

Answer 1

The workaround is:

Specify the CPU as the target device (in ".theanorc" or with "THEANO_FLAGS=device=cpu")
Later, inside the task, override the configured device with a specific GPU
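
The first step can also be done from Python instead of the shell, provided it runs before the first `import theano`. Here is a minimal sketch, assuming a helper named `force_cpu_default` (a hypothetical name, not part of Theano). It only edits the `THEANO_FLAGS` environment variable, which Theano reads at import time and which takes precedence over ".theanorc":

```python
import os

def force_cpu_default(environ=None):
    """Set device=cpu in THEANO_FLAGS, keeping any other flags already present.

    Hypothetical helper; call it before the first `import theano`.
    """
    if environ is None:
        environ = os.environ
    # THEANO_FLAGS is a comma-separated list of key=value pairs.
    flags = [f for f in environ.get('THEANO_FLAGS', '').split(',') if f]
    # Drop any existing device=... entry so the CPU setting wins.
    flags = [f for f in flags if not f.strip().startswith('device=')]
    flags.append('device=cpu')
    environ['THEANO_FLAGS'] = ','.join(flags)
    return environ['THEANO_FLAGS']
```

Calling `force_cpu_default()` at the top of the tasks module, above any Theano import, should have the same effect as starting the worker with `THEANO_FLAGS=device=cpu`.
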

The Celery task is now:

@app.task(bind=True, queue='test')
def simple_theano_tasks(self):
  # At this point, no theano import statements have been processed, and so the device is unbound
  import theano, numpy as np
  import theano.sandbox.cuda
  theano.sandbox.cuda.use('gpu') # enable gpu
  my_array = np.zeros((0,), dtype=theano.config.floatX)
  shared = theano.shared(my_array, name='my_variable', borrow=True)
  print 'Done. Shared value is {}'.format(shared.get_value())
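
Because `theano.sandbox.cuda.use('gpu')` is called inside the task body, it runs in the worker process that executes the task (Worker-1 in the log above), not in the main process that started Celery. If you want to make the "once per process" part explicit, one option is a small guard keyed on the process ID. This is only a sketch: `ensure_gpu` and `_initialized_pid` are hypothetical names, and the `init` callable stands in for the `theano.sandbox.cuda.use('gpu')` call.

```python
import os

_initialized_pid = None  # PID of the process that last ran the initializer

def ensure_gpu(init):
    """Run `init` at most once per process; return True if it ran just now.

    `init` is a zero-argument callable, for example
    lambda: theano.sandbox.cuda.use('gpu')
    """
    global _initialized_pid
    pid = os.getpid()
    if _initialized_pid == pid:
        return False  # this process has already bound its device
    init()
    # Record the PID only after init succeeded. A forked child inherits this
    # global but has a different PID, so it will run init again for itself.
    _initialized_pid = pid
    return True
```

Inside the task you would call `ensure_gpu(lambda: theano.sandbox.cuda.use('gpu'))` before creating any shared variables.
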

Note: I found the solution by reading this article about using multiple GPUs.
