I am a non-root user on a cluster computer running Scientific Linux release 6.6 (Carbon).
I am experiencing some Theano crashes when running code on a GPU with CUDA 7.5 and cuDNN 5. I am using Python 2.7, Theano 0.9, Keras 1.0.7 and Lasagne 0.1.
The following crash occurs ONLY when I run the program on a GPU node with cuDNN enabled. The code completes without issue on a CPU and a GPU with cuDNN disabled.
Traceback (most recent call last):
File "runner.py", line 306, in <module>
main()
File "runner.py", line 241, in main
queries_exp = __import__(args.exp_model).queries_exp
File "/mnt/nfs2/inf/tjb32/workspace/CNN_EL/nlp-entity-convnet/exp_multi_conv_cosim.py", line 923, in <module>
queries_exp = EntityVectorLinkExp()
File "/mnt/nfs2/inf/tjb32/workspace/CNN_EL/nlp-entity-convnet/exp_multi_conv_cosim.py", line 51, in __init__
self._setup()
File "/mnt/nfs2/inf/tjb32/workspace/CNN_EL/nlp-entity-convnet/exp_multi_conv_cosim.py", line 543, in _setup
on_unused_input='ignore',
File "/home/t/tj/tjb32/.local/lib/python2.7/site-packages/theano/compile/function.py", line 326, in function
output_keys=output_keys)
File "/home/t/tj/tjb32/.local/lib/python2.7/site-packages/theano/compile/pfunc.py", line 484, in pfunc
output_keys=output_keys)
File "/home/t/tj/tjb32/.local/lib/python2.7/site-packages/theano/compile/function_module.py", line 1788, in orig_function
output_keys=output_keys).create(
File "/home/t/tj/tjb32/.local/lib/python2.7/site-packages/theano/compile/function_module.py", line 1467, in __init__
optimizer_profile = optimizer(fgraph)
File "/home/t/tj/tjb32/.local/lib/python2.7/site-packages/theano/gof/opt.py", line 102, in __call__
return self.optimize(fgraph)
File "/home/t/tj/tjb32/.local/lib/python2.7/site-packages/theano/gof/opt.py", line 90, in optimize
ret = self.apply(fgraph, *args, **kwargs)
File "/home/t/tj/tjb32/.local/lib/python2.7/site-packages/theano/gof/opt.py", line 235, in apply
sub_prof = optimizer.optimize(fgraph)
File "/home/t/tj/tjb32/.local/lib/python2.7/site-packages/theano/gof/opt.py", line 90, in optimize
ret = self.apply(fgraph, *args, **kwargs)
File "/home/t/tj/tjb32/.local/lib/python2.7/site-packages/theano/gof/opt.py", line 235, in apply
sub_prof = optimizer.optimize(fgraph)
File "/home/t/tj/tjb32/.local/lib/python2.7/site-packages/theano/gof/opt.py", line 90, in optimize
ret = self.apply(fgraph, *args, **kwargs)
File "/home/t/tj/tjb32/.local/lib/python2.7/site-packages/theano/gof/opt.py", line 2262, in apply
lopt_change = self.process_node(fgraph, node, lopt)
File "/home/t/tj/tjb32/.local/lib/python2.7/site-packages/theano/gof/opt.py", line 1825, in process_node
lopt, node)
File "/home/t/tj/tjb32/.local/lib/python2.7/site-packages/theano/gof/opt.py", line 1719, in warn_inplace
return NavigatorOptimizer.warn(exc, nav, repl_pairs, local_opt, node)
File "/home/t/tj/tjb32/.local/lib/python2.7/site-packages/theano/gof/opt.py", line 1705, in warn
raise exc
AssertionError
My .theanorc looks like this:
[global]
floatX = float32
device = gpu
[lib]
cnmem = 1
[nvcc]
fastmath = True
And my profile has the following:
export LD_LIBRARY_PATH=/home/t/tj/tjb32/cuda/lib64:$LD_LIBRARY_PATH
export CPATH=/home/t/tj/tjb32/cuda/include:$CPATH
export LIBRARY_PATH=/home/t/tj/tjb32/cuda/lib64:$LD_LIBRARY_PATH
export PATH=/home/t/tj/tjb32/cuda/bin:$PATH
When I query Theano, it prints the following, which suggests to me that Theano is interacting with CUDA and cuDNN.
Using gpu device 0: Tesla K20m (CNMeM is enabled with initial size: 95.0% of memory, cuDNN 5005)
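For readers unsure how to read the "cuDNN 5005" in that banner, here is a minimal sketch that unpacks it. It assumes the packing used in cudnn.h for cuDNN releases of this era (major * 1000 + minor * 100 + patch level); the function name `decode_cudnn_version` is made up for illustration and is not part of Theano.

```python
def decode_cudnn_version(code):
    """Split a packed cuDNN version integer into (major, minor, patch).

    Assumption: the value follows the cudnn.h scheme
    CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL.
    """
    major, rest = divmod(int(code), 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


if __name__ == "__main__":
    # 5005 is the number reported in the Theano banner above.
    print("%d.%d.%d" % decode_cudnn_version(5005))  # prints 5.0.5
```

Under that assumption the banner reports cuDNN 5.0.5, which matches the cuDNN 5 install described in the question.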
I'm fairly sure that I have installed CUDA and cuDNN correctly. If anyone could suggest any additional configuration steps that I may have missed and that could be causing cuDNN to crash the program, that would be greatly appreciated.
Answer 0 (score: 0)
Not sure whether this could be the problem, but should
export LIBRARY_PATH=/home/t/tj/tjb32/cuda/lib64:$LD_LIBRARY_PATH
not be
export LIBRARY_PATH=/home/t/tj/tjb32/cuda/lib64:$LIBRARY_PATH
instead?
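This kind of mix-up, where one variable is extended with the old value of a different variable, can be spotted mechanically. The following is only a rough sketch: the helper name `find_cross_references` and both regular expressions are my own, it only understands simple one-line `export NAME=value` statements, and a cross-reference is not always a mistake, so treat the output as a list of lines worth a second look.

```python
import re

# Matches simple lines such as: export NAME=value
EXPORT_RE = re.compile(r'^\s*export\s+([A-Za-z_][A-Za-z0-9_]*)=(.*)$')
# Matches $NAME and ${NAME} references inside the value
VAR_REF_RE = re.compile(r'\$\{?([A-Za-z_][A-Za-z0-9_]*)\}?')


def find_cross_references(profile_text):
    """Return (assigned_name, referenced_name) pairs where an export
    builds a variable from the value of a *different* variable."""
    suspects = []
    for line in profile_text.splitlines():
        match = EXPORT_RE.match(line)
        if not match:
            continue  # comments, blank lines, anything that is not an export
        name, value = match.groups()
        for ref in VAR_REF_RE.findall(value):
            if ref != name:
                suspects.append((name, ref))
    return suspects


if __name__ == "__main__":
    # The four export lines from the question's profile.
    profile = """\
export LD_LIBRARY_PATH=/home/t/tj/tjb32/cuda/lib64:$LD_LIBRARY_PATH
export CPATH=/home/t/tj/tjb32/cuda/include:$CPATH
export LIBRARY_PATH=/home/t/tj/tjb32/cuda/lib64:$LD_LIBRARY_PATH
export PATH=/home/t/tj/tjb32/cuda/bin:$PATH
"""
    for name, ref in find_cross_references(profile):
        print("%s is built from $%s" % (name, ref))
```

Run on the profile from the question, it flags only the LIBRARY_PATH line.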
Answer 1 (score: 0)
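I am also running DNNs in Keras with CUDA 7.5 and cuDNN 5. I created a separate directory in my home directory (cuDNN/copy) and put all the cuDNN files (the .so and .h files, obtained from the NVIDIA website) in it. I then made the appropriate changes to the PATH and LD_LIBRARY_PATH variables in my bashrc, and also changed my .theanorc file. With that, DNN works for me.
This is what my bashrc looks like: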
##########################
# CUSTOMIZATIONS GO HERE #
##########################
export PATH="/users/start2015/r0605639/miniconda2/envs/kerPy3.4/bin:$PATH"
export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"
#http://www.chioka.in/why-is-keras-running-so-slow/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-7.5/lib64:/users/start2015/r0605639/cuDNN/copy:
export LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-7.5/lib64:/users/start2015/r0605639/cuDNN/copy:
export CPATH=$CPATH:/users/start2015/r0605639/cuDNN/copy:
export PATH=$PATH:/usr/local/cuda-7.5/bin
This is what my .theanorc looks like:
[global]
device = gpu
floatX = float32
optimizer = fast_run
[blas]
ldflags = -L/users/start2015/r0605639/kerasLibs/lib -lopenblas
[lib]
cnmem = 0.8
[cuda]
root = /usr/local/cuda-7.5
[nvcc]
fastmath = True
optimizer_including=cudnn
flags=-D_FORCE_INLINES -I/usr/local/cuda-7.5/include -I/usr/local/cuda-7.5/bin
[dnn]
enabled = True
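To confirm that a setup like this actually exposes the cuDNN files, one option is to walk the directories named in the relevant variables: CPATH is searched by the compiler for headers, LIBRARY_PATH at link time, and LD_LIBRARY_PATH by the dynamic loader at run time. The sketch below is an illustration only; the helper name `find_in_search_path` is hypothetical, and it assumes the header is named cudnn.h and the libraries start with libcudnn, as in the standard NVIDIA archive.

```python
import glob
import os


def find_in_search_path(search_path, pattern):
    """Return files matching `pattern` in each directory of a
    colon-separated search path. Empty entries (for example from a
    trailing colon) are skipped here."""
    hits = []
    for directory in search_path.split(os.pathsep):
        if directory:
            hits.extend(sorted(glob.glob(os.path.join(directory, pattern))))
    return hits


if __name__ == "__main__":
    # Assumed file names: cudnn.h for the header, libcudnn* for the libraries.
    checks = [("CPATH", "cudnn.h"),
              ("LIBRARY_PATH", "libcudnn*"),
              ("LD_LIBRARY_PATH", "libcudnn*")]
    for var, pattern in checks:
        hits = find_in_search_path(os.environ.get(var, ""), pattern)
        print("%s: %s" % (var, ", ".join(hits) or "no match"))
```

If one of the three variables reports no match while the others do, that variable is a reasonable first place to look.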