Question

Sometimes, after running fine for a while, I get an error like this with Theano / CUDA:

RuntimeError: cublasSgemm failed (14) an internal operation failed
 unit=0 N=0, c.dims=[512 2048], a.dim=[512 493], alpha=%f, beta=%f, a=%p, b=%p, c=%p sa_0=%d, sa_1=%d, sb_0=%d, sb_1=%d, sc_0=%d, sc_1=%d
Apply node that caused the error: GpuDot22(GpuReshape{2}.0, GpuReshape{2}.0)
Inputs types: [CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, matrix)]
Inputs shapes: [(512, 493), (493, 2048)]
Inputs strides: [(493, 1), (2048, 1)]
Inputs values: ['not shown', 'not shown']

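Since my code runs fine for a while (I am training a neural network, and most of the time it runs through; even when this error occurred, it had already run fine for > 2000 mini-batches), I wonder about the cause. Maybe some hardware fault?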

This is with CUDA 6.0 and a very recent Theano (yesterday from Git), Ubuntu 12.04, GTX 580.

I also got the error with CUDA 6.5 on a K20:

RuntimeError: cublasSgemm failed (14) an internal operation failed
 unit=0 N=0, c.dims=[2899 2000], a.dim=[2899 493], alpha=%f, beta=%f, a=%p, b=%p, c=%p sa_0=%d, sa_1=%d, sb_0=%d, sb_1=%d, sc_0=%d, sc_1=%d
Apply node that caused the error: GpuDot22(GpuReshape{2}.0, GpuReshape{2}.0)
Inputs types: [CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, matrix)]
Inputs shapes: [(2899, 493), (493, 2000)]
Inputs strides: [(493, 1), (2000, 1)]
Inputs values: ['not shown', 'not shown']

(In the past I sometimes got a different error instead. Not sure whether it is related.)

Added by Markus, who got the same error:

RuntimeError: cublasSgemm failed (14) an internal operation failed
 unit=0 N=0, c.dims=[2 100], a.dim=[2 9919], alpha=%f, beta=%f, a=%p, b=%p, c=%p sa_0=%d, sa_1=%d, sb_0=%d, sb_1=%d, sc_0=%d, sc_1=%d
Apply node that caused the error: GpuDot22(GpuFlatten{2}.0, weight_hidden_)
Inputs types: [CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, matrix)]
Inputs shapes: [(2, 9919), (9919, 100)]
Inputs strides: [(9919, 1), (100, 1)]
Inputs values: ['not shown', 'not shown']

With CUDA 6.5, Windows 8.1, Python 2.7, GTX 970M.

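The error only occurs in my own network; if I run the LeNet example from Theano, it runs fine. My network does compile and run fine on the CPU, though (and also on the GPU for some colleagues using Linux). Does anyone have an idea what the problem could be?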

Answer 1

Just for reference, in case anyone stumbles upon this:

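This doesn't occur anymore for me. I'm not exactly sure what fixed it, but I think the main difference is that I now avoid any multithreading and any forks (without exec). Those caused many similar problems, e.g. Theano CUDA error: an illegal memory access was encountered (StackOverflow) and Theano CUDA error: an illegal memory access was encountered (Google Groups discussion). The Google Groups discussion especially is very helpful.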

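Theano functions are not multithreading safe. That in itself should not be a problem for me, because I only use Theano in one thread. However, I still think that other threads might cause these problems. Maybe it is related to Python's GC, which frees some Cuda_Ndarray in some other thread while the theano.function is running.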

I looked a bit at the relevant Theano code and I am not sure whether it covers all such cases.

Note that you might not even be aware that you have some background threads. Some Python stdlib code can spawn such background threads. E.g. multiprocessing.Queue will do that.
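The hidden-thread effect can be observed with a minimal sketch like the one below. It assumes CPython, where multiprocessing.Queue starts a feeder thread the first time something is put on the queue. The helper name thread_count_around_queue_put is made up for this illustration.

```python
import multiprocessing
import threading


def thread_count_around_queue_put():
    """Return the live-thread count before and after the first Queue.put()."""
    before = threading.active_count()

    queue = multiprocessing.Queue()
    queue.put("hello")  # the first put() starts the queue's feeder thread
    after = threading.active_count()

    # Drain and shut down the queue so the feeder thread can exit cleanly.
    queue.get()
    queue.close()
    queue.join_thread()
    return before, after


if __name__ == "__main__":
    before, after = thread_count_around_queue_put()
    print("threads before put():", before)
    print("threads after put(): ", after)
```

Printing threading.enumerate() at the point of interest in your own program is a quick way to check for such threads before suspecting the hardware.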

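I cannot avoid having multiple threads, so until this is fixed in Theano, I create a new subprocess with just a single thread which does all the Theano work. This also has several advantages: a cleaner separation of the code, being faster in some cases because it really runs in parallel, and being able to use multiple GPUs.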

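Note that just using the multiprocessing module did not work that well for me, because there are a few libraries (Numpy and others, and maybe Theano itself) which might behave badly in a forked process (depending on the versions, the OS and race conditions). Thus, I needed a real subprocess (fork + exec, not just fork).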
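As a rough illustration of the fork + exec idea (this is not the ExecingProcess code from this answer), the sketch below starts a fresh Python interpreter with the standard subprocess module and exchanges pickled data over stdin/stdout. The names WORKER_SOURCE and run_in_fresh_interpreter are hypothetical, and the sum of squares is only a stand-in for the actual Theano computation.

```python
import pickle
import subprocess
import sys

# Source executed by the fresh interpreter. A real worker would import
# Theano here, so that the GPU context only ever lives in this separate,
# single-threaded process.
WORKER_SOURCE = r"""
import pickle, sys
request = pickle.load(sys.stdin.buffer)
result = sum(x * x for x in request)  # stand-in for the Theano work
pickle.dump(result, sys.stdout.buffer)
sys.stdout.buffer.flush()
"""


def run_in_fresh_interpreter(data):
    """Send `data` to a newly exec'ed Python process and return its reply."""
    proc = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=pickle.dumps(data),  # request goes to the child's stdin
        stdout=subprocess.PIPE,    # reply comes back on the child's stdout
        check=True,                # raise if the child exits with an error
    )
    return pickle.loads(proc.stdout)


if __name__ == "__main__":
    print(run_in_fresh_interpreter([1, 2, 3]))  # 1 + 4 + 9 = 14
```

Because the child is exec'ed rather than only forked, it inherits no threads, locks, or CUDA state from the parent. On Python 3.4 and later, the "spawn" start method of the multiprocessing module also launches a fresh interpreter, which may be enough in some setups.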

My code is here, in case anyone is interested.

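ExecingProcess is modeled after multiprocessing.Process but does a fork + exec. (By the way, on Windows the multiprocessing module does this anyway, because there is no fork on Windows.) There is also AsyncTask, which adds a duplex pipe on top of this and works with both ExecingProcess and the standard multiprocessing.Process.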

See also: Theano Wiki: Using multiple GPUs

Answer 2

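Ran into a similar issue, and fwiw, in my case it was solved by eliminating the import of another library that used pycuda. It appears Theano really does not like to share.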

Theano: cublasSgemm failed (14) an internal operation failed