I am trying to parallelize my NN across two GPUs following https://github.com/uoguelph-mlrg/theano_multi_gpu. I have all the dependencies in place, but the cuda runtime initialization fails with the following message:
ERROR (theano.sandbox.cuda): ERROR: Not using GPU. Initialisation of device 0 failed:
cublasCreate() returned this error 'the CUDA Runtime initialization failed'
Error when trying to find the memory information on the GPU: invalid device ordinal
Error allocating 24 bytes of device memory (invalid device ordinal). Driver report 0 bytes free and 0 bytes total
ERROR (theano.sandbox.cuda): ERROR: Not using GPU. Initialisation of device gpu failed:
CudaNdarray_ZEROS: allocation failed.
Process Process-1:
Traceback (most recent call last):
  File "/opt/share/Python-2.7.9/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/opt/share/Python-2.7.9/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/u/bsankara/nt/Git-nt/nt/train_attention.py", line 171, in launch_train
    clip_c=1.)
  File "/u/bsankara/nt/Git-nt/nt/nt.py", line 1616, in train
    import theano.sandbox.cuda
  File "/opt/share/Python-2.7.9/lib/python2.7/site-packages/theano/__init__.py", line 98, in <module>
    theano.sandbox.cuda.tests.test_driver.test_nvidia_driver1()
  File "/opt/share/Python-2.7.9/lib/python2.7/site-packages/theano/sandbox/cuda/tests/test_driver.py", line 30, in test_nvidia_driver1
    A = cuda.shared_constructor(a)
  File "/opt/share/Python-2.7.9/lib/python2.7/site-packages/theano/sandbox/cuda/var.py", line 181, in float32_shared_constructor
    enable_cuda=False)
  File "/opt/share/Python-2.7.9/lib/python2.7/site-packages/theano/sandbox/cuda/__init__.py", line 389, in use
    cuda_ndarray.cuda_ndarray.CudaNdarray.zeros((2, 3))
RuntimeError: ('CudaNdarray_ZEROS: allocation failed.', 'You asked to force this device and it failed. No fallback to the cpu or other gpu device.')
The relevant part of the code is:
from multiprocessing import Queue
import zmq
import pycuda.driver as drv
import pycuda.gpuarray as gpuarray

def train(private_args, process_env, <some other args>):
    if process_env is not None:
        os.environ = process_env

    ####
    # pycuda and zmq environment
    drv.init()
    dev = drv.Device(private_args['ind_gpu'])
    ctx = dev.make_context()
    sock = zmq.Context().socket(zmq.PAIR)

    if private_args['flag_client']:
        sock.connect('tcp://localhost:5000')
    else:
        sock.bind('tcp://*:5000')

    ####
    # import theano stuffs
    import theano.sandbox.cuda
    theano.sandbox.cuda.use(private_args['gpu'])
    import theano
    import theano.tensor as tensor
    from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams
    import theano.misc.pycuda_init
    import theano.misc.pycuda_utils
    ...
The error is triggered by the import theano.sandbox.cuda statement. Here is how I launch the train function as two processes:
def launch_train(curr_args, process_env, curr_queue, oth_queue):
    trainerr, validerr, testerr = train(private_args=curr_args,
                                        process_env=process_env,
                                        ...)

process1_env = os.environ.copy()
process1_env['THEANO_FLAGS'] = "cuda.root=/opt/share/cuda-7.0,device=gpu0,floatX=float32,on_unused_input=ignore,optimizer=fast_run,exception_verbosity=high,compiledir=/u/bsankara/.theano/NT_multi_GPU1"

process2_env = os.environ.copy()
process2_env['THEANO_FLAGS'] = "cuda.root=/opt/share/cuda-7.0,device=gpu1,floatX=float32,on_unused_input=ignore,optimizer=fast_run,exception_verbosity=high,compiledir=/u/bsankara/.theano/NT_multi_GPU2"

p = Process(target=launch_train,
            args=(p_args, process1_env, queue_p, queue_q))
q = Process(target=launch_train,
            args=(q_args, process2_env, queue_q, queue_p))

p.start()
q.start()
p.join()
q.join()
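For completeness, the pieces not shown in the snippet (the Process/Queue imports and the per-process argument dictionaries) would look roughly like the sketch below; the concrete values in p_args and q_args are assumptions here, only the key names come from what train() reads above.

from multiprocessing import Process, Queue

# Queues used by the two processes to exchange messages with each other
queue_p = Queue()
queue_q = Queue()

# Hypothetical per-process arguments; only the key names are taken from train() above
p_args = {'ind_gpu': 0, 'gpu': 'gpu0', 'flag_client': True}
q_args = {'ind_gpu': 1, 'gpu': 'gpu1', 'flag_client': False}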
If I try initializing the GPU interactively in Python, the import statements seem to work. I ran the first 20 lines of train() by hand and it worked fine there, correctly assigning me to gpu0 as I asked.
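(The interactive check was essentially the top of train() typed into a Python shell; a minimal sketch of it, with gpu0 hard-coded as an assumption:)

import pycuda.driver as drv
drv.init()
dev = drv.Device(0)               # gpu0
ctx = dev.make_context()
import theano.sandbox.cuda
theano.sandbox.cuda.use('gpu0')   # succeeds here, unlike inside the spawned process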
Answer (score: 0)
After some digging and running pdb, the original poster found the problem.
Basically, theano and pycuda were both competing to initialize the GPU, which caused the problem. The solution is to import theano first; that grabs a GPU, and pycuda is then attached to that specific context. The import section of the train function then looks like this:
def train(private_args, process_env, <some other args>):
    if process_env is not None:
        os.environ = process_env

    ####
    # import theano related
    # We need global imports and so we make them as such
    theano = __import__('theano')
    _t_tensor = __import__('theano', globals(), locals(), ['tensor'], -1)
    tensor = _t_tensor.tensor
    import theano.sandbox.cuda
    import theano.misc.pycuda_utils

    ####
    # pycuda and zmq environment
    import zmq
    import pycuda.driver as drv
    import pycuda.gpuarray as gpuarray
    drv.init()
    # Attach to the existing context (already initialized by the theano import)
    ctx = drv.Context.attach()
    sock = zmq.Context().socket(zmq.PAIR)

    if private_args['flag_client']:
        sock.connect('tcp://localhost:5000')
    else:
        sock.bind('tcp://*:5000')
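The two changes that matter, compared to the original train(), are the ordering and the context handling. The theano import now comes first and picks its device from the THEANO_FLAGS string (device=gpu0 or device=gpu1) that launch_train passes in via process_env, so the explicit theano.sandbox.cuda.use(private_args['gpu']) call is no longer needed. And pycuda re-uses the context that theano already created, via drv.Context.attach(), instead of creating a second, competing one with dev.make_context().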
[This answer was added as a community wiki entry from an edit by the OP, in an attempt to get this question off the unanswered list.]