I can't load my model to resume training. I'm practicing with a simple two-layer NN (fully connected) on the CIFAR dataset.
# full_connected_layers
import tensorflow as tf
import numpy as np

# input -> hidden -> logits
def inference(data_samples, image_pixels, hidden_units, classes, reg_constant):
    with tf.variable_scope('Layer1'):
        # Define the variables
        weights = tf.get_variable(
            name='weights',
            shape=[image_pixels, hidden_units],
            initializer=tf.truncated_normal_initializer(
                stddev=1.0 / np.sqrt(float(image_pixels))),
            regularizer=tf.contrib.layers.l2_regularizer(reg_constant)
        )
        biases = tf.Variable(tf.zeros([hidden_units]), name='biases')
        # Define the layer's output
        hidden = tf.nn.relu(tf.matmul(data_samples, weights) + biases)

    with tf.variable_scope('Layer2'):
        # Define variables
        weights = tf.get_variable('weights', [hidden_units, classes],
            initializer=tf.truncated_normal_initializer(
                stddev=1.0 / np.sqrt(float(hidden_units))),
            regularizer=tf.contrib.layers.l2_regularizer(reg_constant))
        biases = tf.Variable(tf.zeros([classes]), name='biases')
        # Define the layer's output
        logits = tf.matmul(hidden, weights) + biases

    # Define summary operation for the 'logits' variable
    tf.summary.histogram('logits', logits)
    return logits

def loss(logits, labels):
    '''Calculates the loss from logits and labels.
    Args:
        logits: Logits tensor, float - [batch size, number of classes].
        labels: Labels tensor, int64 - [batch size].
    Returns:
        loss: Loss tensor of type float.
    '''
    with tf.name_scope('Loss'):
        # Operation to determine the cross entropy between logits and labels
        cross_entropy = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                logits=logits, labels=labels, name='cross_entropy'))
        # Operation for the loss function
        loss = cross_entropy + tf.add_n(tf.get_collection(
            tf.GraphKeys.REGULARIZATION_LOSSES))
        # Add a scalar summary for the loss
        tf.summary.scalar('loss', loss)
    return loss

def training(loss, learning_rate):
    # Create a variable to track the global step
    global_step = tf.Variable(0, name='global_step', trainable=False)
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(
        loss, global_step=global_step)
    #train_step = tf.train.AdamOptimizer(learning_rate, beta1, beta2, epsilon).minimize(
    #    loss, global_step=global_step)
    return train_step

def evaluation(logits, labels):
    with tf.name_scope('Accuracy'):
        # Operation comparing prediction with true label
        correct_prediction = tf.equal(tf.argmax(logits, 1), labels)
        # Operation calculating the accuracy of the predictions
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        # Summary operation for the accuracy
        tf.summary.scalar('train_accuracy', accuracy)
    return accuracy

# In the training loop, checkpoints are saved every 500 steps:
if (i + 1) % 500 == 0:
    saver.save(sess, MODEL_DIR, global_step=i)
    print('Saved checkpoint')
In this directory:

C:\Users\Moondra\Desktop\CIFAR - PROJECT\parameters_no_changes

I have the following files, along with model.ckpt-499.index and so on:

model.ckpt-999.meta
model.ckpt-999.index
model.ckpt-999.data-00000-of-00001
import numpy as np
import tensorflow as tf
import time
from datetime import datetime
import os
import data_helpers
import full_connected_layers
import itertools

learning_rate = .0001
max_steps = 3000
batch_size = 400
checkpoint = r'C:\Users\Moondra\Desktop\CIFAR - PROJECT\parameters_no_changes\model.ckpt-999'

with tf.Session() as sess:
    saver = tf.train.import_meta_graph(r'C:\Users\Moondra\Desktop\CIFAR - PROJECT' +
                                       '\\parameters_no_changes\\model.ckpt-999.meta')
    saver.restore(sess, checkpoint)
    data_sets = data_helpers.load_data()
    images = tf.get_default_graph().get_tensor_by_name('images:0')        # image placeholder
    labels = tf.get_default_graph().get_tensor_by_name('image-labels:0')  # label placeholder
    loss = tf.get_default_graph().get_tensor_by_name('Loss/add:0')
    #global_step = tf.get_default_graph().get_tensor_by_name('global_step/initial_value_1:0')
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
    accuracy = tf.get_default_graph().get_tensor_by_name('Accuracy/Mean:0')

with tf.Session() as sess:
    #sess.run(tf.global_variables_initializer())
    zipped_data = zip(data_sets['images_train'], data_sets['labels_train'])
    batches = data_helpers.gen_batch(list(zipped_data), batch_size, max_steps)
    for i in range(max_steps):
        # Get next input data batch
        batch = next(batches)
        images_batch, labels_batch = zip(*batch)
        feed_dict = {
            images: images_batch,
            labels: labels_batch
        }
        if i % 100 == 0:
            train_accuracy = sess.run(accuracy, feed_dict=feed_dict)
            print('Step {:d}, training accuracy {:g}'.format(i, train_accuracy))
        ts, loss_ = sess.run([train_step, loss], feed_dict=feed_dict)
1) Should I be using latest_checkpoint to restore, i.e.:

saver.restore(sess, tf.train.latest_checkpoint('./'))

I've seen some tutorials that just point it at the folder holding the .data and .index files.

2) Which brings me to my second question: what should I pass as the second argument of saver.restore? Currently I'm just pointing it at the folder/directory containing those files. (For concreteness, see the sketch below.)
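The pattern I mean is something like this (a sketch of my understanding; the directory is the one above, and whether this is the right usage is exactly what I'm asking):

checkpoint_dir = r'C:\Users\Moondra\Desktop\CIFAR - PROJECT\parameters_no_changes'

with tf.Session() as sess:
    saver = tf.train.import_meta_graph(checkpoint_dir + r'\model.ckpt-999.meta')
    # latest_checkpoint reads the 'checkpoint' file in the directory and
    # returns the newest prefix, e.g. '...\model.ckpt-999' -- a prefix,
    # not a directory and not an actual file on disk
    saver.restore(sess, tf.train.latest_checkpoint(checkpoint_dir))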
3) I deliberately didn't initialize any variables, as I was told that would overwrite the stored weight and bias values. That seems to be causing this error:

FailedPreconditionError (see above for traceback): Attempting to use uninitialized value Layer1/weights
[[Node: Layer1/weights/read = Identity[T=DT_FLOAT, _class=["loc:@Layer1/weights"], _device="/job:localhost/replica:0/task:0/cpu:0"](Layer1/weights)]]
4) However, if I initialize all variables with:

sess.run(tf.global_variables_initializer())

my model seems to train from scratch rather than resuming. Does that mean I should explicitly load all the weights and biases via get_tensor? If so, how do I handle 20-plus layers? (The per-layer approach I'm imagining is sketched below.)
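This is the per-layer pattern I'm hoping to avoid (hypothetical code; the names follow the Layer1/Layer2 variable scopes defined above):

graph = tf.get_default_graph()
# Fetch each layer's variables one by one by name...
w1 = graph.get_tensor_by_name('Layer1/weights:0')
b1 = graph.get_tensor_by_name('Layer1/biases:0')
w2 = graph.get_tensor_by_name('Layer2/weights:0')
b2 = graph.get_tensor_by_name('Layer2/biases:0')
# ...which would mean at least two lines per layer for a 20+ layer network.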
5) When I run this:

for i in tf.get_default_graph().get_operations():
    print(i.values)

I see a lot of global_step tensors/operations:

<bound method Operation.values of <tf.Operation 'global_step/initial_value' type=Const>>
<bound method Operation.values of <tf.Operation 'global_step' type=VariableV2>>
<bound method Operation.values of <tf.Operation 'global_step/Assign' type=Assign>>
<bound method Operation.values of <tf.Operation 'global_step/read' type=Identity>>

I tried to load this variable into my current graph, but I can't figure out which name I should pass to get_tensor_by_name; most of them produced a "does not exist" error. (My best guess is sketched below.)
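What I was trying, for reference (my assumption being that a variable's value tensor is the op name plus ':0'):

graph = tf.get_default_graph()
# The VariableV2 op is named 'global_step', so its output tensor
# should be 'global_step:0'
global_step = graph.get_tensor_by_name('global_step:0')
# The ops themselves (no ':0' suffix) would be fetched with:
global_step_op = graph.get_operation_by_name('global_step')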
6) Same with loss: I'm not sure which name to use with get_tensor. These are the options:
<bound method Operation.values of <tf.Operation 'Loss/Const' type=Const>>
<bound method Operation.values of <tf.Operation 'Loss/Mean' type=Mean>>
<bound method Operation.values of <tf.Operation 'Loss/AddN' type=AddN>>
<bound method Operation.values of <tf.Operation 'Loss/add' type=Add>>
<bound method Operation.values of <tf.Operation 'Loss/loss/tags' type=Const>>
<bound method Operation.values of <tf.Operation 'Loss/loss' type=ScalarSummary>>
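Judging from the loss() code above, 'Loss/add' should be the final cross_entropy + regularization addition, which is why the restore script fetches it like this (my assumption, not something I've confirmed):

loss = tf.get_default_graph().get_tensor_by_name('Loss/add:0')
# 'Loss/Mean' would be the cross entropy alone, and 'Loss/AddN' the summed
# regularization terms, per the ops listed above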
7) Finally, when I look at all the nodes of the graph I see a lot of gradient operations, but I don't see any node related to train_step (the Python variable I created pointing to the GradientDescentOptimizer). Does that mean I don't need to load it into this graph via get_tensor? (The only alternative I can see is sketched below.)
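If it does need fetching, I'd expect it to go by operation name rather than tensor name, since minimize() returns an Operation with no output tensor. 'GradientDescent' is just an assumption based on the optimizer's default op name:

# Hypothetical: fetch the training op by its (assumed) default name;
# check tf.get_default_graph().get_operations() for the real one
train_op = tf.get_default_graph().get_operation_by_name('GradientDescent')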
Thanks.
Answer 0 (score: 5)
I usually do this sequence of operations:

initialize
restore

which translates into code along these lines:

sess.run(tf.global_variables_initializer())
saver.restore(sess, checkpoint_path)

This avoids the non-initialized error, and the restore then overwrites the freshly initialized values with those from the checkpoint.
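Applied to the code in the question, that would look roughly like this (a sketch; the path is the one from the question, and the training loop has to stay inside the same session that did the restore, since variable values live in a session):

checkpoint = r'C:\Users\Moondra\Desktop\CIFAR - PROJECT\parameters_no_changes\model.ckpt-999'

with tf.Session() as sess:
    saver = tf.train.import_meta_graph(checkpoint + '.meta')
    # 1) initialize: every variable gets a value
    sess.run(tf.global_variables_initializer())
    # 2) restore: the checkpoint values overwrite the fresh ones
    saver.restore(sess, checkpoint)
    # ...run the training loop here, in this same session...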
Answer 1 (score: 1)
1/ In the folder where the checkpoints are saved there should be a file named "checkpoint", which contains the name of the latest checkpoint.
I usually read this file to find the latest checkpoint.
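Reading it through the API looks like this (a sketch; both helpers parse that "checkpoint" file):

ckpt = tf.train.get_checkpoint_state(checkpoint_dir)
print(ckpt.model_checkpoint_path)       # newest checkpoint prefix
print(ckpt.all_model_checkpoint_paths)  # every retained checkpoint
# or, if only the newest prefix is needed:
latest = tf.train.latest_checkpoint(checkpoint_dir)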
2/ I save with checkpoint_directory/global_step. With this, tf will create 4 files in checkpoint_directory:

global_step.data-00000-of-00001
global_step.index
global_step.meta
checkpoint
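A sketch of that saving scheme, where step stands for the current global step value (the exact prefix naming is my reading of the answer, so treat it as an assumption):

import os
save_prefix = os.path.join(checkpoint_directory, str(step))
# Writes <step>.data-00000-of-00001, <step>.index, <step>.meta
# and updates the 'checkpoint' file in the directory
saver.save(sess, save_prefix)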
3/4/ I'm pretty sure you don't need to pre-initialize the graph before loading; at least I don't do so.
There is one difference in my setup: I rebuild the whole graph every time I load instead of using import_meta_graph, but I'm sure loading before initializing is not a problem. So my loading looks like this:

build_net()
saver = tf.train.Saver()
saver.restore(session, checkpoint_dir/global_step)
add_loss_and_optimizer()
initialize_all_uninitialized_tensor

checkpoint_dir/global_step comes from the checkpoint file if you want the latest checkpoint, or you can use a different global_step to get the specific checkpoint you want to load.
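The initialize_all_uninitialized_tensor step above is pseudocode; one way to implement it (a sketch, not necessarily what the answer's author uses) is:

def initialize_uninitialized(sess):
    # Names (returned as bytes) of variables with no value in this session yet
    uninit_names = set(sess.run(tf.report_uninitialized_variables()))
    uninit_vars = [v for v in tf.global_variables()
                   if v.name.split(':')[0].encode() in uninit_names]
    # Initialize only those, leaving the restored weights untouched
    sess.run(tf.variables_initializer(uninit_vars))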