Question

Not sure whether I should ask this here or on the TensorFlow GitHub page. Either way:

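I'm trying to use TensorFlow on a GPU machine. The installation succeeded, or so I thought. Running some smaller neural networks works fine. But when I try to use a pretrained VGGNet (16 layers), it leaves me with the following trace: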

I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: GeForce GTX 660
major: 3 minor: 0 memoryClockRate (GHz) 1.0715
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.93GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:839] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 660, pci bus id: 0000:01:00.0)

It builds the graph correctly, but when it comes to running the session, it suddenly stops:

I tensorflow/core/common_runtime/gpu/gpu_device.cc:839] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 660, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1110] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_TIMEOUT
E tensorflow/stream_executor/cuda/cuda_timer.cc:54] Internal: error destroying CUDA event in context 0x30fd660: CUDA_ERROR_LAUNCH_TIMEOUT
E tensorflow/stream_executor/cuda/cuda_timer.cc:59] Internal: error destroying CUDA event in context 0x30fd660: CUDA_ERROR_LAUNCH_TIMEOUT
F tensorflow/stream_executor/cuda/cuda_dnn.cc:1251] failed to enqueue convolution on stream: CUDNN_STATUS_MAPPING_ERROR
Aborted (core dumped)

Why is this happening? I thought it might be because of the cuDNN version, but I installed v4 as instructed, since I haven't built TensorFlow from source.

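Update: in case it helps, I can run smaller networks, although with out-of-memory warnings. I'm fairly sure they run on the GPU, mainly because of the following: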

I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:839] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 660, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:965] failed to allocate 2.00G (2146762752 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
W tensorflow/core/common_runtime/bfc_allocator.cc:213] Ran out of memory trying to allocate 1.61GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
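To make the out-of-memory warning concrete, the byte count in the failed allocation can be compared with the free memory reported at startup. This is only a small arithmetic sketch using the numbers printed in the log above; the 1.93 GiB free figure is itself rounded in the log, so the comparison is approximate.

```python
# Arithmetic check using figures copied from the log; GiB = 2**30 bytes.
GIB = 2 ** 30

requested_bytes = 2146762752   # "failed to allocate 2.00G (2146762752 bytes)"
total_gib = 2.00               # "Total memory: 2.00GiB"
free_gib = 1.93                # "Free memory: 1.93GiB" (rounded in the log)

requested_gib = requested_bytes / GIB
free_bytes = int(free_gib * GIB)  # approximate, because free_gib is rounded

print(f"total:     {total_gib:.2f} GiB")
print(f"free:      {free_gib:.2f} GiB (~{free_bytes} bytes)")
print(f"requested: {requested_gib:.4f} GiB ({requested_bytes} bytes)")
print("request exceeds free memory:", requested_bytes > free_bytes)
```

In other words, the first allocation attempt asks for roughly the card's total memory rather than what is actually free, which is consistent with the allocator then falling back to a smaller 1.61 GiB request.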

TensorFlow GPU - cuDNN error (Aborted - core dumped)
