Restoring a checkpoint to retrain with a new class

Date: 2016-09-23 13:57:27

Tags: tensorflow

I have a checkpoint trained with 11 classes. I added a class to my dataset and tried to restore the checkpoint to retrain the CNN, but it gives me a shape-related error: the previous checkpoint was trained with 11 classes, while there are actually 12 now. Did I save the weight and bias variables the right way? What should I do? Here is the code:

batch_size = 10
num_hidden = 64
num_channels = 1
depth = 32
....
graph = tf.Graph()

with graph.as_default():

  # Input data.
  tf_train_dataset = tf.placeholder(
      tf.float32, shape=(batch_size, IMAGE_SIZE_H, IMAGE_SIZE_W, num_channels))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)

  w_b = {
      'weight_0': tf.Variable(tf.random_normal([patch_size_1, patch_size_1, num_channels, depth], stddev=0.1)),
      'weight_1': tf.Variable(tf.random_normal([patch_size_2, patch_size_2, depth, depth], stddev=0.1)),
      'weight_2': tf.Variable(tf.random_normal([patch_size_3, patch_size_3, depth, depth], stddev=0.1)),
      'weight_3': tf.Variable(tf.random_normal([IMAGE_SIZE_H // 32 * IMAGE_SIZE_W // 32 * depth, num_hidden], stddev=0.1)),
      'weight_4': tf.Variable(tf.random_normal([num_hidden, num_labels], stddev=0.1)),

      'bias_0': tf.Variable(tf.zeros([depth])),
      'bias_1': tf.Variable(tf.constant(1.0, shape=[depth])),
      'bias_2': tf.Variable(tf.constant(1.0, shape=[depth])),
      'bias_3': tf.Variable(tf.constant(1.0, shape=[num_hidden])),
      'bias_4': tf.Variable(tf.constant(1.0, shape=[num_labels]))
  }

  # Model.
  def model(data):

      conv_1 = tf.nn.conv2d(data, w_b['weight_0'], [1, 2, 2, 1], padding='SAME')
      hidden_1 = tf.nn.relu(conv_1 + w_b['bias_0'])
      pool_1 = tf.nn.max_pool(hidden_1, ksize=[1, 5, 5, 1], strides=[1, 2, 2, 1], padding='SAME')
      conv_2 = tf.nn.conv2d(pool_1, w_b['weight_1'], [1, 2, 2, 1], padding='SAME')
      hidden_2 = tf.nn.relu(conv_2 + w_b['bias_1'])
      conv_3 = tf.nn.conv2d(hidden_2, w_b['weight_2'], [1, 2, 2, 1], padding='SAME')
      hidden_3 = tf.nn.relu(conv_3 + w_b['bias_2'])
      pool_2 = tf.nn.max_pool(hidden_3, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')
      shape = pool_2.get_shape().as_list()
      reshape = tf.reshape(pool_2, [shape[0], shape[1] * shape[2] * shape[3]])
      hidden_4 = tf.nn.relu(tf.matmul(reshape, w_b['weight_3']) + w_b['bias_3'])

      return tf.matmul(hidden_4, w_b['weight_4']) + w_b['bias_4']


  # Training computation.
  logits = model(tf_train_dataset)

  loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))


  optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)



  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
  test_prediction = tf.nn.softmax(model(tf_test_dataset))

  init = tf.initialize_all_variables()
  w_b_saver = tf.train.Saver(var_list = w_b)

num_steps = 1001


with tf.Session(graph=graph) as sess:
  ckpt = "/home/..../w_b_models.ckpt"
  if os.path.isfile(ckpt):
    w_b_saver.restore(sess, ckpt)
    print("restore complete")
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))
  else:
    print("Error while loading model checkpoint.")
    print('Initialized')
    sess.run(init)

    for step in range(num_steps):
      .....

    accuracy(test_prediction.eval(), test_labels, force=False)
    save_path_w_b = w_b_saver.save(sess, "/home/...../w_b_models.ckpt")
    print("Model saved in file: %s" % save_path_w_b)

Here is the error:

 InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [64,12] rhs shape= [64,11]
 [[Node: save/Assign_9 = Assign[T=DT_FLOAT, _class=["loc:@Variable_4"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_4, save/restore_slice_9/_12)]]
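The error is easier to see in isolation. The checkpoint stores `weight_4` with shape `[64, 11]`, while the rebuilt graph declares it with shape `[64, 12]`, and restoring assigns saved tensors to same-named variables with no reshaping. The following is a toy plain-Python sketch of that name-matching assignment (it is not TensorFlow's actual implementation; the dict names and shapes mirror this question):

```python
# Toy sketch of what Saver.restore effectively does: assign each saved
# tensor to the graph variable of the same name, which requires that
# both sides have identical shapes.
checkpoint = {'weight_4': (64, 11)}   # shape saved when training 11 classes
graph_vars = {'weight_4': (64, 12)}   # shape in the rebuilt 12-class graph

errors = []
for name, saved_shape in checkpoint.items():
    if graph_vars[name] != saved_shape:
        # Mirrors the InvalidArgumentError message format from the question.
        errors.append(
            "Assign requires shapes of both tensors to match. "
            "lhs shape= %s rhs shape= %s"
            % (list(graph_vars[name]), list(saved_shape)))

print(errors[0])
```

This is why only the final layer fails: every other variable's shape is independent of `num_labels`.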

1 answer:

Answer 0: (score: 1)

I believe the problem is that you need to remove this entry from w_b, then save, and then restore the way you are doing now.

Remove this:

'weight_4': tf.Variable(tf.random_normal([num_hidden, num_labels], stddev=0.1)),

Then it should work. The main reason is that you are changing the number of labels and expecting the checkpoint to restore into the same variable anyway. As a side note, it is better to use tf.get_variable rather than tf.Variable.
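Rather than maintaining a second, nearly identical dict, you could also filter the existing one when building the saver. A minimal sketch with plain strings standing in for the tf.Variable objects (the same dict comprehension works unchanged on the real w_b, since tf.train.Saver's var_list accepts a name-to-variable dict):

```python
# Stand-in values; in the real graph these are tf.Variable objects.
w_b = {
    'weight_0': 'conv_kernel_0', 'weight_1': 'conv_kernel_1',
    'weight_2': 'conv_kernel_2', 'weight_3': 'fc_kernel',
    'weight_4': 'fc_out_kernel',
    'bias_0': 'b0', 'bias_1': 'b1', 'bias_2': 'b2',
    'bias_3': 'b3', 'bias_4': 'fc_out_bias',
}

# Exclude the two entries whose shapes depend on num_labels.
final_layer = {'weight_4', 'bias_4'}
restorable = {k: v for k, v in w_b.items() if k not in final_layer}

# In the real code: w_b_saver = tf.train.Saver(var_list=restorable)
print(sorted(restorable))
```

After restoring, the excluded final-layer variables still need `sess.run` on their initializers before training.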

Updated answer:

Create a new variable called w_b_to_save:
w_b_to_save = {
 'weight_0': tf.Variable(tf.random_normal([patch_size_1, patch_size_1, num_channels, depth],stddev=0.1)),
 'weight_1': tf.Variable(tf.random_normal([patch_size_2, patch_size_2, depth, depth], stddev=0.1)),
 'weight_2': tf.Variable(tf.random_normal([patch_size_3, patch_size_3, depth, depth], stddev=0.1)),
 'weight_3': tf.Variable(tf.random_normal([IMAGE_SIZE_H // 32 * IMAGE_SIZE_W // 32 * depth, num_hidden], stddev=0.1)),


 'bias_0' : tf.Variable(tf.zeros([depth])), 
 'bias_1' : tf.Variable(tf.constant(1.0, shape=[depth])),
 'bias_2' : tf.Variable(tf.constant(1.0, shape=[depth])),
 'bias_3' : tf.Variable(tf.constant(1.0, shape=[num_hidden])),
    }

...

w_b_saver = tf.train.Saver(var_list = w_b_to_save)

Now you can save only the variables you want. Creating a new dict that is almost identical to the previous one is a bit of overkill, but it demonstrates that you cannot save the last layer and then restore it once its shape has changed.
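If the goal is to keep the trained 11-class output weights rather than discard them, one further option (not part of the answer above; a numpy sketch assuming the old values were first exported, e.g. with `sess.run(w_b['weight_4'])`) is to grow the final layer by one column and initialize only the new class from scratch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these were exported from the old 11-class checkpoint.
old_w4 = rng.standard_normal((64, 11)) * 0.1
old_b4 = np.ones(11)

# Allocate a 12-class layer, copy the trained columns over, and leave
# only the new class's column randomly initialized.
new_w4 = rng.standard_normal((64, 12)) * 0.1
new_w4[:, :11] = old_w4
new_b4 = np.ones(12)
new_b4[:11] = old_b4

# new_w4 / new_b4 can then seed the initializers of the new
# [num_hidden, num_labels] tf.Variable in the 12-class graph.
```

This way the network starts from its learned representation for the original 11 classes instead of a fresh random output layer.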