I am trying to parallelize my NN across two GPUs following https://github.com/uoguelph-mlrg/theano_multi_gpu. I have all the dependencies in place, but the cuda runtime initialization fails with the following message:
ERROR (theano.sandbox.cuda): ERROR: Not using GPU. Initialisation of device 0 failed:
cublasCreate() returned this error 'the CUDA Runtime initialization failed'
Error when trying to find the memory information on the GPU: invalid device ordinal
Error allocating 24 bytes of device memory (invalid device ordinal). Driver report 0 bytes free and 0 bytes total
ERROR (theano.sandbox.cuda): ERROR: Not using GPU. Initialisation of device gpu failed:
CudaNdarray_ZEROS: allocation failed.
Process Process-1:
Traceback (most recent call last):
  File "/opt/share/Python-2.7.9/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/opt/share/Python-2.7.9/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/u/bsankara/nt/Git-nt/nt/train_attention.py", line 171, in launch_train
    clip_c=1.)
  File "/u/bsankara/nt/Git-nt/nt/nt.py", line 1616, in train
    import theano.sandbox.cuda
  File "/opt/share/Python-2.7.9/lib/python2.7/site-packages/theano/__init__.py", line 98, in <module>
    theano.sandbox.cuda.tests.test_driver.test_nvidia_driver1()
  File "/opt/share/Python-2.7.9/lib/python2.7/site-packages/theano/sandbox/cuda/tests/test_driver.py", line 30, in test_nvidia_driver1
    A = cuda.shared_constructor(a)
  File "/opt/share/Python-2.7.9/lib/python2.7/site-packages/theano/sandbox/cuda/var.py", line 181, in float32_shared_constructor
    enable_cuda=False)
  File "/opt/share/Python-2.7.9/lib/python2.7/site-packages/theano/sandbox/cuda/__init__.py", line 389, in use
    cuda_ndarray.cuda_ndarray.CudaNdarray.zeros((2, 3))
RuntimeError: ('CudaNdarray_ZEROS: allocation failed.', 'You asked to force this device and it failed. No fallback to the cpu or other gpu device.')
The relevant part of the code is:
from multiprocessing import Queue
import zmq
import pycuda.driver as drv
import pycuda.gpuarray as gpuarray

def train(private_args, process_env, <some other args>):
    if process_env is not None:
        os.environ = process_env

    ####
    # pycuda and zmq environment
    drv.init()
    dev = drv.Device(private_args['ind_gpu'])
    ctx = dev.make_context()
    sock = zmq.Context().socket(zmq.PAIR)

    if private_args['flag_client']:
        sock.connect('tcp://localhost:5000')
    else:
        sock.bind('tcp://*:5000')

    ####
    # import theano stuffs
    import theano.sandbox.cuda
    theano.sandbox.cuda.use(private_args['gpu'])
    import theano
    import theano.tensor as tensor
    from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams
    import theano.misc.pycuda_init
    import theano.misc.pycuda_utils
    ...
The error is triggered by the import theano.sandbox.cuda statement. Here is how I launch the train function as two processes:
def launch_train(curr_args, process_env, curr_queue, oth_queue):
    trainerr, validerr, testerr = train(private_args=curr_args,
                                        process_env=process_env,
                                        ...)

process1_env = os.environ.copy()
process1_env['THEANO_FLAGS'] = "cuda.root=/opt/share/cuda-7.0,device=gpu0,floatX=float32,on_unused_input=ignore,optimizer=fast_run,exception_verbosity=high,compiledir=/u/bsankara/.theano/NT_multi_GPU1"

process2_env = os.environ.copy()
process2_env['THEANO_FLAGS'] = "cuda.root=/opt/share/cuda-7.0,device=gpu1,floatX=float32,on_unused_input=ignore,optimizer=fast_run,exception_verbosity=high,compiledir=/u/bsankara/.theano/NT_multi_GPU2"

p = Process(target=launch_train,
            args=(p_args, process1_env, queue_p, queue_q))
q = Process(target=launch_train,
            args=(q_args, process2_env, queue_q, queue_p))

p.start()
q.start()
p.join()
q.join()
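For completeness, the pieces not shown in the snippet (the Process/Queue imports and the per-process argument dictionaries) would look roughly like the sketch below; the concrete values in p_args and q_args are assumptions here, only the key names come from what train() reads above.

from multiprocessing import Process, Queue

# Queues used by the two processes to exchange messages with each other
queue_p = Queue()
queue_q = Queue()

# Hypothetical per-process arguments; only the key names are taken from train() above
p_args = {'ind_gpu': 0, 'gpu': 'gpu0', 'flag_client': True}
q_args = {'ind_gpu': 1, 'gpu': 'gpu1', 'flag_client': False}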
If I try initializing the GPU interactively in Python, the import statements seem to work. I ran the first 20 lines of train() by hand and it worked fine there, correctly assigning me to gpu0 as I asked.
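(The interactive check was essentially the top of train() typed into a Python shell; a minimal sketch of it, with gpu0 hard-coded as an assumption:)

import pycuda.driver as drv
drv.init()
dev = drv.Device(0)               # gpu0
ctx = dev.make_context()
import theano.sandbox.cuda
theano.sandbox.cuda.use('gpu0')   # succeeds here, unlike inside the spawned process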
Answer (score: 0)
After some digging and running pdb, the original poster found the problem.
Basically, theano and pycuda were both competing to initialize the GPU, which caused the problem. The solution is to import theano first; that grabs a GPU, and pycuda is then attached to that specific context. The import section of the train function then looks like this:
def train(private_args, process_env, <some other args>):
    if process_env is not None:
        os.environ = process_env

    ####
    # import theano related
    # We need global imports and so we make them as such
    theano = __import__('theano')
    _t_tensor = __import__('theano', globals(), locals(), ['tensor'], -1)
    tensor = _t_tensor.tensor
    import theano.sandbox.cuda
    import theano.misc.pycuda_utils

    ####
    # pycuda and zmq environment
    import zmq
    import pycuda.driver as drv
    import pycuda.gpuarray as gpuarray
    drv.init()
    # Attach to the existing context (already initialized by the theano import)
    ctx = drv.Context.attach()
    sock = zmq.Context().socket(zmq.PAIR)

    if private_args['flag_client']:
        sock.connect('tcp://localhost:5000')
    else:
        sock.bind('tcp://*:5000')
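The two changes that matter, compared to the original train(), are the ordering and the context handling. The theano import now comes first and picks its device from the THEANO_FLAGS string (device=gpu0 or device=gpu1) that launch_train passes in via process_env, so the explicit theano.sandbox.cuda.use(private_args['gpu']) call is no longer needed. And pycuda re-uses the context that theano already created, via drv.Context.attach(), instead of creating a second, competing one with dev.make_context().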
[This answer was added as a community wiki entry from an edit by the OP, in an attempt to get this question off the unanswered list.]