Retraining Inception v1 with TF-Slim: error reported to Coordinator

Time: 2018-06-02 11:40:11

Tags: python tensorflow

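Please note that this is my first time using TensorFlow and implementing deep learning. I am trying to retrain Inception v1 on my own images/labels, following the slim_walkthrough notebook tutorial.

When I run the training: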

with tf.Graph().as_default():
    tf.logging.set_verbosity(tf.logging.INFO)

    images, labels = create_batch.main(image_size, image_size, tf_train, num_epochs, batch_size, capacity, n_threats, min_dq)

    # Create the model, use the default arg scope to configure the batch norm parameters.
    with slim.arg_scope(inception.inception_v1_arg_scope()):
        logits, _ = inception.inception_v1(tf.to_float(images), num_classes=5, is_training=True)

    # Specify the loss function:
    onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=5)
    loss = tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels, logits=logits)

    one_hot_labels = tf.one_hot(labels, 5)
    tf.losses.softmax_cross_entropy(logits, one_hot_labels)
    total_loss = tf.losses.get_total_loss()

    # Create some summaries to visualize the training process:
    tf.summary.scalar('losses/Total Loss', total_loss)

    # Specify the optimizer and create the train op:
    optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
    train_op = slim.learning.create_train_op(total_loss, optimizer)

    # Run the training:
    final_loss = slim.learning.train(
        train_op,
        logdir=train_dir,
        init_fn=get_init_fn(train_dir),
        number_of_steps=2)

I get this error:

INFO:tensorflow:Restoring parameters from ../models/inception/inception_v1.ckpt
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path ../models/inception/model.ckpt
INFO:tensorflow:Starting Queues.
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Input to reshape is a tensor with 75000 values, but the requested shape has 225000
     [[Node: Reshape = Reshape[T=DT_UINT8, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](DecodeRaw, stack)]] 
INFO:tensorflow:Caught OutOfRangeError. Stopping Training. RandomShuffleQueue '_6_shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 32, current size 0)
     [[Node: shuffle_batch = QueueDequeueManyV2[component_types=[DT_UINT8, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](shuffle_batch/random_shuffle_queue, shuffle_batch/n)]]

I don't understand why there is a problem with the shape of the tensor. When the batches are created from the tfrecords, image_size comes from inception.inception_v1.default_image_size.
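The two counts in the error message can be checked with simple arithmetic. The following is a minimal sketch, assuming that inception.inception_v1.default_image_size is 224 and that the images are decoded as uint8 (the log shows DecodeRaw with T=DT_UINT8). The helper name element_count is hypothetical.

```python
def element_count(height, width, channels):
    """Number of values in an image tensor of the given shape."""
    return height * width * channels

# Counts taken from the error message.
actual_values = 75000      # values produced by DecodeRaw
requested_values = 225000  # values needed by the Reshape target shape

# How many times larger is the requested shape than the decoded data?
ratio = requested_values // actual_values
remainder = requested_values % actual_values

# Assumption: default_image_size is 224 for Inception v1.
inception_count = element_count(224, 224, 3)

print("requested / actual:", ratio, "remainder:", remainder)
print("224 x 224 x 3 =", inception_count)
print("matches requested shape:", inception_count == requested_values)
```

A factor of exactly 3 would be consistent with single-channel data being reshaped to three channels. Since 225000 is not 224 x 224 x 3, the shape passed to Reshape is probably not built from image_size alone. Both points are guesses from the numbers, because create_batch.main is not shown.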

If it helps, this is what the variables look like:

  • train_op = Tensor("train_op/control_dependency:0", shape=(), dtype=float32)
  • total_loss = Tensor("total_loss:0", shape=(), dtype=float32)
  • optimizer = tensorflow.python.training.adam.AdamOptimizer object at 0x0000028BE9A05978
  • logits = Tensor("InceptionV1/Logits/SpatialSqueeze:0", shape=(32, 5), dtype=float32)
  • one_hot_labels = Tensor("one_hot_1:0", shape=(32, 5), dtype=float32)
  • tensorflow version -> 1.8.0
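The tensors listed above all have consistent shapes, and the failing node in the log is a Reshape fed by DecodeRaw, which points at the input pipeline rather than the model. Below is a minimal numpy sketch of that kind of mismatch. It assumes, without confirmation from the post, that a record holds a single-channel image while the reader reshapes to three channels. The 250 x 300 size is hypothetical and chosen only because it yields 75000 values; np.frombuffer stands in for tf.decode_raw.

```python
import numpy as np

# Hypothetical stored image: 250 x 300, one channel, uint8 (75000 values).
height, width = 250, 300
gray = np.zeros((height, width), dtype=np.uint8)
raw_bytes = gray.tobytes()  # what a writer might put into the record

# Decoding raw bytes yields a flat vector, much like tf.decode_raw.
flat = np.frombuffer(raw_bytes, dtype=np.uint8)
print("decoded values:", flat.size)

# Reshaping to three channels fails because 75000 != 225000.
try:
    flat.reshape((height, width, 3))
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)

# One possible fix: reshape with the true channel count, then repeat the
# single channel so the model receives a 3-channel image.
rgb = np.repeat(flat.reshape((height, width, 1)), 3, axis=2)
print("fixed shape:", rgb.shape)
```

In the TensorFlow pipeline, the analogous check would be to compare the height, width, channel count, and dtype used when writing the TFRecords with those used in the reader's reshape.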

Thanks for your help.

0 answers:

No answers yet