I am running the code from this tutorial: https://www.tensorflow.org/tutorials/deep_cnn/
I downloaded the code from here: https://github.com/tensorflow/models/tree/master/tutorials/image/cifar10/
I am running the code on a g2.4xlarge machine in AWS, on Ubuntu 14.04. The single-GPU example runs fine without any errors.
Can anyone help with this problem? I am running version 0.12.
ubuntu@ip-xxx-xx-xx-xx:~/pythonworkspace/tensorflowdev/models-master/tutorials/image/cifar10$ python -c 'import tensorflow as tf; print(tf.__version__)'
ubuntu@ip-xxx-xx-xx-xx:~/pythonworkspace/tensorflowdev/models-master/tutorials/image/cifar10$ python cifar10_multi_gpu_train.py --num_gpus=2
>> Downloading cifar-10-binary.tar.gz 100.0%
Successfully downloaded cifar-10-binary.tar.gz 170052171 bytes.
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
WARNING:tensorflow:From /home/ubuntu/pythonworkspace/tensorflowdev/models-master/tutorials/image/cifar10/cifar10_input.py:135: image_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.image. Note that tf.summary.image uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, the max_images argument was renamed to max_outputs.
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
WARNING:tensorflow:From /home/ubuntu/pythonworkspace/tensorflowdev/models-master/tutorials/image/cifar10/cifar10_input.py:135: image_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.image. Note that tf.summary.image uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, the max_images argument was renamed to max_outputs.
Traceback (most recent call last):
  File "cifar10_multi_gpu_train.py", line 273, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "cifar10_multi_gpu_train.py", line 269, in main
    train()
  File "cifar10_multi_gpu_train.py", line 210, in train
    variables_averages_op = variable_averages.apply(tf.trainable_variables())
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/moving_averages.py", line 373, in apply
    colocate_with_primary=True)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 110, in create_slot
    return _create_slot_var(primary, val, "")
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 64, in _create_slot_var
    use_resource=_is_resource(primary))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1034, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 933, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 356, in get_variable
    validate_shape=validate_shape, use_resource=use_resource)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 341, in _true_getter
    use_resource=use_resource)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 671, in _get_single_variable
    "VarScope?" % name)
ValueError: Variable conv1/weights/ExponentialMovingAverage/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?
Answer 0 (score: 2)
You can find the answer to your problem here: Issue 6220
You need to put:
with tf.variable_scope(tf.get_variable_scope()):
in front of the loop that runs over your devices...
So, do it like this:
with tf.variable_scope(tf.get_variable_scope()):
  for i in xrange(FLAGS.num_gpus):
    with tf.device('/gpu:%d' % i):
The explanation is given in the linked issue...
Quoting it here:
When you do tf.get_variable_scope().reuse_variables(), you set the current scope to reuse variables. If you call the optimizer in that mode, it tries to reuse slot variables, which it cannot find, so it throws an error. If you put a scope around it, the tf.get_variable_scope().reuse_variables() only affects that scope, so when you exit it, you are back in non-reusing mode, which is what you want.
Hope that helps; let me know if I should clarify more.
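The reuse behavior described in the quote can be illustrated without TensorFlow. The sketch below uses a hypothetical stdlib-only VariableStore class (not part of TensorFlow) to mimic get_variable semantics: while a scope is in reuse mode, requesting a variable that was never created raises an error, which is the same failure mode as the ExponentialMovingAverage slot variables in the traceback above.

```python
# Stdlib-only analogy of variable_scope/reuse semantics.
# VariableStore is a hypothetical illustration, not a TensorFlow API.

class VariableStore:
    def __init__(self):
        self.variables = {}   # name -> value
        self.reuse = False    # current scope's reuse flag

    def get_variable(self, name, initial=0.0):
        if self.reuse:
            # Reuse mode: only existing variables may be fetched.
            if name not in self.variables:
                raise ValueError(
                    "Variable %s does not exist, but scope is in reuse mode" % name)
            return self.variables[name]
        # Non-reuse mode: only new variables may be created.
        if name in self.variables:
            raise ValueError("Variable %s already exists" % name)
        self.variables[name] = initial
        return initial

store = VariableStore()
store.get_variable("conv1/weights")   # created in non-reuse mode
store.reuse = True                    # like reuse_variables() inside the GPU loop
store.get_variable("conv1/weights")   # OK: reuses the existing variable

# Exiting the wrapping scope resets the reuse flag, so new (slot)
# variables can be created again -- this is what putting
# tf.variable_scope(tf.get_variable_scope()) around the loop achieves.
store.reuse = False
store.get_variable("conv1/weights/ExponentialMovingAverage")
```

Without the reset at the end, the last call would fail exactly like the traceback, because the moving-average slot variables are created after the loop while the scope is still stuck in reuse mode.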