我知道如何在单个GPU上训练网络 - >保存检查点 - >稍后加载此检查点 - >运行基准。
当我使用多个GPU进行训练并使用新的Data API时,我无法确定如何操作。
这是正常的'培训代码:
import tensorflow as tf
images_placeholder = tf.placeholder(tf.float32, shape=(None, image_size,
image_size, 1), name='input')
labels_placeholder = tf.placeholder(tf.int32, shape=(None))
embeddings = build_graph(images_placeholder)
loss = add_loss(embeddings, labels_placeholder)
embeddings = tf.identity(embeddings, 'embeddings')
稍后,当我想要进行基准测试时:
with tf.Graph().as_default():
with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
load_graph_def(model_path) # for example: d:\model.ckpt-0
images_placeholder = tf.get_default_graph().get_tensor_by_name("input:0")
embeddings = tf.get_default_graph().get_tensor_by_name("embeddings:0")
images = benchmark_utils.load_data(paths_batch, image_size)
feed_dict = {images_placeholder: images}
predictions = sess.run(embeddings, feed_dict=feed_dict)
所以现在我想用多个GPU进行训练:
with tf.Graph().as_default(), tf.device('/cpu:0'):
dataset = tf.data.Dataset.from_tensor_slices((images_list, labels_list))
dataset = dataset.map(load_images)
dataset = dataset.shuffle(buffer_size=100)
dataset = dataset.batch(128)
dataset = dataset.repeat()
opt = tf.train.MomentumOptimizer(0.01, momentum=0.9, use_nesterov=True)
tower_grads = []
with tf.variable_scope(tf.get_variable_scope()):
for i in range(num_gpus):
with tf.device('/gpu:%d' % i):
with tf.name_scope('%s_%d' % (TOWER_NAME, i)) as scope:
image_batch, label_batch = dataset.iterator.get_next()
loss = tower_loss(scope, image_batch, label_batch)
我能弄清楚的是如何才能获得输入'和嵌入式'当我想对检查点进行基准测试时,会出现张量。
我如何定义例如称为“'输入”的张量?应该收到应该评估的图像吗?
我猜测多gpu代码中的某个地方,我应该像我在单一gpu培训中定义的那样定义images_placeholder
。
感谢您的任何建议!