TensorFlow GPU - cuDNN error (Aborted - core dumped)

Date: 2016-08-06 22:04:46

Tags: tensorflow cudnn

I'm not sure whether I should ask this here or on the TensorFlow GitHub page. Either way:

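I'm trying to use TensorFlow on a GPU machine. The installation succeeded, or so I thought. Running some smaller neural networks works fine. But when I try to use a pretrained VGGNet (16 layers), it leaves me with the following trace: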

I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: GeForce GTX 660
major: 3 minor: 0 memoryClockRate (GHz) 1.0715
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.93GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:839] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 660, pci bus id: 0000:01:00.0)

It builds the graph properly, but when it comes to running the session, it suddenly stops:

I tensorflow/core/common_runtime/gpu/gpu_device.cc:839] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 660, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1110] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_TIMEOUT
E tensorflow/stream_executor/cuda/cuda_timer.cc:54] Internal: error destroying CUDA event in context 0x30fd660: CUDA_ERROR_LAUNCH_TIMEOUT
E tensorflow/stream_executor/cuda/cuda_timer.cc:59] Internal: error destroying CUDA event in context 0x30fd660: CUDA_ERROR_LAUNCH_TIMEOUT
F tensorflow/stream_executor/cuda/cuda_dnn.cc:1251] failed to enqueue convolution on stream: CUDNN_STATUS_MAPPING_ERROR
Aborted (core dumped)
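Each line in the trace starts with a one-letter severity (I, W, E, F), then the source file and line number, then the message. Assuming that layout holds for every line (inferred from the output above, not from TensorFlow documentation), a minimal Python sketch can pull out the severity so the last message before the abort stands out. Under the glog-style convention this format appears to follow, F marks a fatal message, which would be consistent with the process aborting immediately afterwards.

```python
import re

# Assumed line shape, inferred from the trace in the question:
# "<severity letter> <source file>:<line number>] <message>"
LOG_LINE = re.compile(r"^([IWEF]) (\S+):(\d+)\] (.*)$")

SEVERITY = {"I": "INFO", "W": "WARNING", "E": "ERROR", "F": "FATAL"}


def parse_log_line(line):
    """Return (severity, source_file, line_number, message), or None for other lines."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        # e.g. "Aborted (core dumped)" or continuation lines like "name: GeForce GTX 660"
        return None
    letter, source, lineno, message = match.groups()
    return SEVERITY[letter], source, int(lineno), message


trace = [
    "E tensorflow/stream_executor/cuda/cuda_driver.cc:1110] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_TIMEOUT",
    "F tensorflow/stream_executor/cuda/cuda_dnn.cc:1251] failed to enqueue convolution on stream: CUDNN_STATUS_MAPPING_ERROR",
    "Aborted (core dumped)",
]

for raw in trace:
    parsed = parse_log_line(raw)
    if parsed is not None:
        severity, source, lineno, message = parsed
        print(f"{severity:7} {source}:{lineno}  {message}")
```

Read this way, the CUDA_ERROR_LAUNCH_TIMEOUT errors come first and the fatal cuDNN message is the last logged line before the abort.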

Why does this happen? I thought it might be the cuDNN version, but I installed v4 as instructed, since I haven't built TensorFlow from source.
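For a sense of scale, assuming the pretrained VGGNet is the standard 16-layer configuration (13 3×3 convolution layers, 3 fully connected layers, 224×224 RGB input), its float32 weights alone take roughly half a GiB of this card's 2 GiB, before counting activations, any cuDNN workspace, or anything else TensorFlow allocates. A small sketch of that arithmetic; the layer list below is the assumed standard configuration, not read from the model the question uses:

```python
# Assumed standard VGG16 layout: integers are 3x3 conv output channels, "M" is 2x2 max pooling.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]
FC_SIZES = [4096, 4096, 1000]


def vgg16_param_count(input_channels=3, input_size=224):
    params = 0
    channels = input_channels
    size = input_size
    for layer in VGG16_CONV:
        if layer == "M":
            size //= 2  # pooling halves the spatial size and has no parameters
        else:
            params += 3 * 3 * channels * layer + layer  # conv weights + biases
            channels = layer
    features = channels * size * size  # flattened input to the first FC layer
    for units in FC_SIZES:
        params += features * units + units  # dense weights + biases
        features = units
    return params


n_params = vgg16_param_count()
weight_gib = n_params * 4 / 1024 ** 3  # 4 bytes per float32 weight
print(n_params, f"{weight_gib:.2f} GiB")
```

This counts stored weights only, so it is a lower bound on what running the network needs, not an estimate of total GPU memory use.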

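Update: In case it helps, I can run smaller networks, though with out-of-memory warnings. I'm fairly sure they run on the GPU, mainly because of the following: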

I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:839] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 660, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:965] failed to allocate 2.00G (2146762752 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
W tensorflow/core/common_runtime/bfc_allocator.cc:213] Ran out of memory trying to allocate 1.61GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
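The figures in the out-of-memory line can be checked against the device report: 2146762752 bytes is just under 2 GiB (the log rounds it to 2.00G), which is more than the 1.93 GiB free reported in the first trace, so that particular request could not be satisfied. The sketch below only redoes that arithmetic with numbers taken from the logs; it assumes free memory in this run was about the same 1.93 GiB, since the update's log does not repeat that line, and it says nothing about how TensorFlow's allocator chooses its request size.

```python
GIB = 1024 ** 3

requested_bytes = 2146762752  # from the CUDA_ERROR_OUT_OF_MEMORY line
free_gib = 1.93               # "Free memory: 1.93GiB" from the first trace (assumed unchanged)

requested_gib = requested_bytes / GIB
print(f"requested {requested_gib:.4f} GiB, free about {free_gib} GiB")
print("request exceeds reported free memory:", requested_gib > free_gib)
```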

0 answers