Evaluating a model from a checkpoint trained on multiple GPUs

Time: 2017-12-12 11:28:25

Tags: tensorflow multi-gpu

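I know how to train a network on a single GPU -> save a checkpoint -> load that checkpoint later -> run a benchmark.

When I train on multiple GPUs and use the new Data API, I cannot figure out how to do the same.

This is the 'normal' training code: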

import tensorflow as tf


images_placeholder = tf.placeholder(
    tf.float32, shape=(None, image_size, image_size, 1), name='input')
labels_placeholder = tf.placeholder(tf.int32, shape=(None))

embeddings = build_graph(images_placeholder)
loss = add_loss(embeddings, labels_placeholder)

embeddings = tf.identity(embeddings, 'embeddings')

Later, when I want to run the benchmark:

with tf.Graph().as_default():
    with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
        load_graph_def(model_path)  # for example: d:\model.ckpt-0

        # Look up the input and output tensors by the names given at training time.
        images_placeholder = tf.get_default_graph().get_tensor_by_name("input:0")
        embeddings = tf.get_default_graph().get_tensor_by_name("embeddings:0")

        images = benchmark_utils.load_data(paths_batch, image_size)
        feed_dict = {images_placeholder: images}
        predictions = sess.run(embeddings, feed_dict=feed_dict)

So now I want to train on multiple GPUs:

with tf.Graph().as_default(), tf.device('/cpu:0'):
    dataset = tf.data.Dataset.from_tensor_slices((images_list, labels_list))
    dataset = dataset.map(load_images)
    dataset = dataset.shuffle(buffer_size=100)
    dataset = dataset.batch(128)
    dataset = dataset.repeat()
    # Create the iterator once; a Dataset has no `.iterator` attribute.
    iterator = dataset.make_one_shot_iterator()

    opt = tf.train.MomentumOptimizer(0.01, momentum=0.9, use_nesterov=True)

    tower_grads = []
    with tf.variable_scope(tf.get_variable_scope()):
        for i in range(num_gpus):
            with tf.device('/gpu:%d' % i):
                with tf.name_scope('%s_%d' % (TOWER_NAME, i)) as scope:
                    # Each tower pulls its own batch from the shared iterator.
                    image_batch, label_batch = iterator.get_next()
                    loss = tower_loss(scope, image_batch, label_batch)

What I cannot figure out is how to get the 'input' and 'embeddings' tensors when I want to benchmark the checkpoint.

How do I define, for example, the tensor called 'input' that should receive the images to be evaluated?
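One possible way to give a Dataset-fed graph a feedable tensor named 'input' is `tf.placeholder_with_default`: it passes the pipeline batch through during training, and a fed value replaces it at evaluation time. The following is only a minimal single-tower sketch, not the asker's code. `IMAGE_SIZE`, `run_demo`, the random data and the mean-pixel "network" are hypothetical stand-ins. In the multi-GPU loop the tensor would be created inside a tower's name scope, so its full name would carry that prefix.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # TF1-style graph API, also available in TF 2.x

IMAGE_SIZE = 8  # hypothetical toy value


def run_demo():
    # Hypothetical random data standing in for the real image list.
    data = np.random.rand(16, IMAGE_SIZE, IMAGE_SIZE, 1).astype(np.float32)
    with tf.Graph().as_default():
        dataset = tf.data.Dataset.from_tensor_slices(data).batch(4).repeat()
        batch = tf.data.make_one_shot_iterator(dataset).get_next()

        # Reads from the input pipeline unless a value is fed for this tensor.
        images = tf.placeholder_with_default(
            batch, shape=(None, IMAGE_SIZE, IMAGE_SIZE, 1), name="input")

        # Toy stand-in for the real network: mean pixel value per image.
        embeddings = tf.identity(
            tf.reduce_mean(images, axis=[1, 2, 3]), "embeddings")

        with tf.Session() as sess:
            # Training-style call: the batch comes from the Dataset.
            from_pipeline = sess.run(embeddings)
            # Evaluation-style call: the batch comes from feed_dict.
            eval_images = np.ones((3, IMAGE_SIZE, IMAGE_SIZE, 1), np.float32)
            fed = sess.run(embeddings, feed_dict={images: eval_images})
    return from_pipeline, fed
```

Calling `run_demo()` returns one result computed from a pipeline batch and one computed from the fed images, which shows that the same 'embeddings' tensor serves both cases.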

My guess is that somewhere in the multi-GPU code I should define images_placeholder the same way I did in the single-GPU training.
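An alternative to adding a placeholder to the training graph is to leave it untouched and build a fresh single-tower inference graph at benchmark time, then restore the weights with `tf.train.Saver`. This relies on the towers sharing variables through the variable scope, so variable names in the checkpoint carry no tower prefix. That is an assumption about how `tower_loss` creates its variables (via `tf.get_variable`). The sketch below uses a hypothetical toy `build_graph`, toy sizes and helper names (`train_and_save`, `embed`), and it saves freshly initialized weights instead of running real training.

```python
import os
import tempfile

import numpy as np
import tensorflow.compat.v1 as tf  # TF1-style graph API, also available in TF 2.x

IMAGE_SIZE = 8  # hypothetical toy value
EMBED_DIM = 4   # hypothetical toy value


def build_graph(images):
    """Hypothetical stand-in for the question's build_graph()."""
    flat = tf.reshape(images, [-1, IMAGE_SIZE * IMAGE_SIZE])
    w = tf.get_variable("w", [IMAGE_SIZE * IMAGE_SIZE, EMBED_DIM],
                        initializer=tf.random_normal_initializer(seed=0))
    return tf.matmul(flat, w)


def train_and_save(ckpt_prefix, num_towers=2):
    """Builds Dataset-fed towers with shared variables and saves a checkpoint."""
    data = np.random.rand(16, IMAGE_SIZE, IMAGE_SIZE, 1).astype(np.float32)
    with tf.Graph().as_default():
        dataset = tf.data.Dataset.from_tensor_slices(data).batch(4).repeat()
        iterator = tf.data.make_one_shot_iterator(dataset)
        with tf.variable_scope(tf.get_variable_scope()):
            for i in range(num_towers):
                with tf.name_scope("tower_%d" % i):
                    build_graph(iterator.get_next())
                # Later towers reuse the variables created by the first one.
                tf.get_variable_scope().reuse_variables()
        saver = tf.train.Saver()
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            weights = sess.run(tf.global_variables()[0])
            saver.save(sess, ckpt_prefix)
    return weights


def embed(ckpt_prefix, images):
    """Rebuilds one tower on a placeholder and restores the saved weights."""
    with tf.Graph().as_default():
        images_placeholder = tf.placeholder(
            tf.float32, shape=(None, IMAGE_SIZE, IMAGE_SIZE, 1), name="input")
        embeddings = tf.identity(build_graph(images_placeholder), "embeddings")
        saver = tf.train.Saver()
        config = tf.ConfigProto(allow_soft_placement=True)
        with tf.Session(config=config) as sess:
            saver.restore(sess, ckpt_prefix)
            return sess.run(embeddings,
                            feed_dict={images_placeholder: images})
```

A usage example under the same assumptions: `ckpt = os.path.join(tempfile.mkdtemp(), "model.ckpt")`, then `train_and_save(ckpt)` followed by `embed(ckpt, images)` with `images` shaped `(N, 8, 8, 1)`. The inference graph never touches the Dataset iterator, so the benchmark code can keep feeding images by placeholder as in the single-GPU case.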

Thanks for any advice!

0 answers:

No answers