A GPU error occurs even though the device is set to CPU

Date: 2016-11-29 14:59:07

Tags: cuda tensorflow

I am trying to run a network (convolutions, highway layers, fully connected layers, RNN) that is too large for my GPU, so I globally pinned the device to "cpu". Still, when the script executes, a GPU error is thrown after the model has been built, while the variables are being initialized.

  with tf.Session() as sess:
    with tf.device("cpu:0"):
      model = CNN_FC_LANGUAGE(sess, checkpoint_dir=FLAGS.checkpoint_dir,
                                      char_embed_dim=FLAGS.char_embed_dim,
                                      summaries_dir=FLAGS.summaries_dir,
                                      feature_maps=eval(FLAGS.feature_maps),
                                      kernels=eval(FLAGS.kernels),
                                      batch_size=FLAGS.batch_size,
                                      dropout_prob=FLAGS.dropout_prob,
                                      forward_only=FLAGS.forward_only,
                                      seq_length=FLAGS.seq_length,
                                      prediction_starts=FLAGS.prediction_starts,
                                      prediction_length=FLAGS.prediction_length,
                                      use_char=FLAGS.use_char,
                                      highway_layers=FLAGS.highway_layers,
                                      rnn_size=FLAGS.rnn_size,
                                      rnn_layer_depth=FLAGS.rnn_layer_depth,
                                      use_batch_norm=FLAGS.use_batch_norm,
                                      run_name=run_name,
                                      data_dir=FLAGS.data_dir)

      model.run(FLAGS.epoch, FLAGS.learning_rate, FLAGS.learning_rate_decay, FLAGS.net2net)

Searching for "gpu" across all the scripts involved returns zero results. In addition, when the model is built I print every tensor name together with its assigned device; searching that output for "gpu" also returns zero results.

Yet when the script runs, it raises a CUDA error. If the device is explicitly set to the CPU, why is any memory being allocated on the GPU at all?

E tensorflow/stream_executor/cuda/cuda_driver.cc:1034] failed to alloc 2147483648 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 2147483648
E tensorflow/stream_executor/cuda/cuda_driver.cc:1034] failed to alloc 1932735232 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 1932735232
E tensorflow/stream_executor/cuda/cuda_driver.cc:1034] failed to alloc 1739461632 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 1739461632
E tensorflow/stream_executor/cuda/cuda_driver.cc:1034] failed to alloc 1565515520 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 1565515520
E tensorflow/stream_executor/cuda/cuda_driver.cc:1034] failed to alloc 1408964096 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 1408964096
E tensorflow/stream_executor/cuda/cuda_driver.cc:1034] failed to alloc 4294967296 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 4294967296
E tensorflow/stream_executor/cuda/cuda_driver.cc:1034] failed to alloc 4294967296 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 4294967296
E tensorflow/stream_executor/cuda/cuda_driver.cc:1034] failed to alloc 4294967296 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 4294967296
E tensorflow/stream_executor/cuda/cuda_driver.cc:1034] failed to alloc 4294967296 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 4294967296
Killed

Any ideas? Thanks!

Edit: While building the graph, TensorFlow also echoes:

I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GTX 1050 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.468
pciBusID 0000:04:00.0
Total memory: 3.94GiB
Free memory: 3.64GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:04:00.0)

But why? I told it to use only the CPU, right?

2 Answers:

Answer 0 (score: 3):

Pinned memory is allocated by calling cudaMallocHost. This call does not allocate global GPU memory; the memory is allocated on the host side, but with properties that allow faster copies across PCI-Express.

In addition, cudaMallocHost requires contiguous memory; perhaps your host memory is fragmented into small, sparse allocations, so the cudaMallocHost call fails.

Answer 1 (score: 3):

The GPU build of TensorFlow will always try to initialize the GPU runtime (including devices and allocators) if one is available. As X3liF observes, the errors you are seeing come from allocating host (i.e. CPU) memory in a form that can be accessed more efficiently should you later use the GPU.
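In other words, `tf.device("cpu:0")` only controls op placement; it does not stop the runtime from creating GPU devices and their allocators. A sketch of one way to suppress GPU device creation from within the TF 1.x-style Python API is the session configuration (`device_count` here is the real `ConfigProto` field; whether it also skips the pinned-memory pool on your build is an assumption worth verifying):

```python
import tensorflow as tf

# Ask the session to create zero GPU devices, so no /gpu:* device
# (and, on most builds, no GPU-side allocator) is set up.
config = tf.ConfigProto(device_count={'GPU': 0})

with tf.Session(config=config) as sess:
    # Build and run the model as before; all ops fall back to the CPU.
    pass
```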

To avoid using any GPU resources at all, you can set the CUDA_VISIBLE_DEVICES environment variable when launching Python. Assuming your code lives in a file called my_script.py:

# An empty CUDA_VISIBLE_DEVICES will hide all GPUs from TensorFlow.
$ CUDA_VISIBLE_DEVICES="" python my_script.py
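The same effect can be had from inside the script itself, as long as the variable is set before TensorFlow is imported (once the CUDA runtime has initialized, changing it has no effect) — a minimal sketch:

```python
import os

# Must run before the first `import tensorflow`: TensorFlow reads
# CUDA_VISIBLE_DEVICES when it initializes the CUDA runtime, and an
# empty value hides every GPU from it.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# import tensorflow as tf  # import only after the variable is set
```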