Question

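I'm using TensorFlow-Slim. My goal is to run the provided standard scripts (located in /models/slim/scripts) in multi-GPU mode. I tested the finetune_resnet_v1_50_on_flowers.sh script (cloned on 12.04.2017). I only added --num_clones=2 at the end of the training section (inspired by /slim/deployment/model_deploy_test.py and a previous StackOverflow answer):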

python train_image_classifier.py \
  --train_dir=${TRAIN_DIR} \
  --dataset_name=flowers \
  --dataset_split_name=train \
  --dataset_dir=${DATASET_DIR} \
  --model_name=resnet_v1_50 \
  --checkpoint_path=${PRETRAINED_CHECKPOINT_DIR}/resnet_v1_50.ckpt \
  --checkpoint_exclude_scopes=resnet_v1_50/logits \
  --trainable_scopes=resnet_v1_50/logits \
  --max_number_of_steps=3000 \
  --batch_size=32 \
  --learning_rate=0.01 \
  --save_interval_secs=60 \
  --save_summaries_secs=60 \
  --log_every_n_steps=100 \
  --optimizer=rmsprop \
  --weight_decay=0.00004 \
  --num_clones=2

deployment/model_deploy_test.py:

def testMultiGPU(self):
    deploy_config = model_deploy.DeploymentConfig(num_clones=2)

I get a warning ('Ignoring device specification'):

I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P100-SXM2-16GB, pci bus id: 0000:85:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla P100-SXM2-16GB, pci bus id: 0000:86:00.0)
I tensorflow/core/common_runtime/simple_placer.cc:669] Ignoring device specification /GPU:1 for node 'clone_1/fifo_queue_Dequeue' because the input edge from 'prefetch_queue/fifo_queue' is a reference connection and already has a device field set to /CPU:0
I tensorflow/core/common_runtime/simple_placer.cc:669] Ignoring device specification /GPU:0 for node 'clone_0/fifo_queue_Dequeue' because the input edge from 'prefetch_queue/fifo_queue' is a reference connection and already has a device field set to /CPU:0

Both GPUs are in use (memory usage and GPU utilization look normal), but training is no faster than single-GPU training.

This problem may be related to: https://github.com/tensorflow/tensorflow/issues/8061

I'd be glad to receive answers, opinions, or concrete suggestions on this problem.

CUDA version: 8.0, V8.0.53

TensorFlow installed from binary; tested versions: 1.0.1 and 1.1.0rc

GPU: NVIDIA Tesla P100 (SXM2)

Answer 1

Follow this issue: https://github.com/tensorflow/tensorflow/issues/12689. To make sure the variables are stored on the CPU, build the model inside this context manager:

    with slim.arg_scope([slim.model_variable, slim.variable], device='/cpu:0'):

It solved my problem.

Answer 2

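Even if this answer comes late: training is not supposed to get faster per step (seconds per step). Instead, another model replica (clone) is created, so with your parameters the effective batch size is 64, and you can therefore halve the maximum number of steps.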
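The arithmetic behind this answer can be sketched as follows, assuming (as the answer implies) that --batch_size is a per-clone value, so each clone processes its own batch every step. The flag values are taken from the question; the helper names are hypothetical.

```python
def effective_batch_size(batch_size, num_clones):
    # Assumption: each clone consumes its own batch of `batch_size` per step.
    return batch_size * num_clones


def steps_for_same_examples(max_steps, batch_size, num_clones):
    """Steps needed to see as many examples as the single-clone run."""
    total_examples = max_steps * batch_size
    per_step = effective_batch_size(batch_size, num_clones)
    # Ceiling division so no examples are dropped.
    return -(-total_examples // per_step)


if __name__ == "__main__":
    print(effective_batch_size(32, 2))           # 64
    print(steps_for_same_examples(3000, 32, 2))  # 1500
```

So with --batch_size=32 and --num_clones=2, roughly --max_number_of_steps=1500 covers the same 96,000 examples as 3000 single-GPU steps; the speedup shows up in total wall-clock time, not in seconds per step.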

TensorFlow-Slim multi-GPU training
