The original code, which contains only loss1 and loss2, trains correctly, so I believe my input data is fine.
However, after I added a third loss named "sw_loss" (loss3), the training loss is always "nan". loss3 is meant to minimize the L2 norm between the rows of "features", where "features" is the output of the last layer of the network.
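Written out, the objective I intend for loss3 is (in my own notation):

$$\mathrm{loss3} = \frac{1}{B^2}\sum_{j=1}^{B}\sum_{k=1}^{B} w_{jk}\,\lVert f_j - f_k \rVert_2$$

where $B$ is the batch size, $f_j$ is the $j$-th row of features, and $w_{jk}$ is the corresponding similarity weight.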
In fact, the training loss becomes "nan" at the second batch of the first epoch, while the loss of the first batch is about 2.2.
The main code is as follows:
features, _ = mnist_net(images)
centers = func.construct_center(features, FLAGS.num_classes)
loss1 = func.dce_loss(features, labels, centers, FLAGS.temp)
loss2 = func.pl_loss(features, labels, centers)
loss3 = func.sw_loss(features, similarity_weight_batch) # loss3 is defined below
loss = loss1 + FLAGS.weight_pl * loss2 + FLAGS.weight_sw * loss3
eval_correct = func.evaluation(features, labels, centers)
train_op = func.training(loss, lr)
init = tf.global_variables_initializer()
# initialize the variables
sess = tf.Session()
sess.run(init)
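# (Not shown above, so stating my setup: images, labels, lr and
# similarity_weight_batch are tf.placeholder tensors created earlier;
# similarity_weight_batch is a float32 placeholder of shape [None, None]
# so it can hold the batch-by-batch weight matrix fed below, and
# similarity_weight is a precomputed train_num x train_num matrix.)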
#compute_centers(sess, add_op, count_op, average_op, images, labels, train_x, train_y)
# run the computation graph (train and test process)
epoch = 1
loss_before = np.inf
score_before = 0.0
stopping = 0
index = list(range(train_num))
np.random.shuffle(index)
batch_size = FLAGS.batch_size
batch_num = train_num // batch_size if train_num % batch_size == 0 else train_num // batch_size + 1
train_start = time.time()
while stopping < FLAGS.stop:
    time1 = time.time()
    loss_now = 0.0
    score_now = 0.0
    for i in range(batch_num):
        batch_x = train_x[index[i*batch_size:(i+1)*batch_size]]
        batch_y = train_y[index[i*batch_size:(i+1)*batch_size]]
        batch_index = np.asarray(index[i*batch_size:(i+1)*batch_size])
        # gather the similarity weight for every pair of samples in this batch
        weight_batch = np.zeros(shape=(batch_index.shape[0], batch_index.shape[0]))
        for j in range(batch_index.shape[0]):
            for k in range(batch_index.shape[0]):
                weight_batch[j, k] = similarity_weight[batch_index[j], batch_index[k]]
        result = sess.run([train_op, loss, eval_correct],
                          feed_dict={images: batch_x, labels: batch_y,
                                     lr: FLAGS.learning_rate,
                                     similarity_weight_batch: weight_batch})
        loss_now += result[1]
        score_now += result[2][1]
    score_now /= train_num
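(As a side note, the nested j/k loop above can be collapsed into a single numpy fancy-indexing call; a minimal self-contained sketch with toy data, assuming similarity_weight is a 2-D numpy array:)

import numpy as np

# Toy stand-in for the full train_num x train_num similarity matrix.
similarity_weight = np.random.uniform(0.1, 1.0, size=(10, 10))
batch_index = np.asarray([3, 1, 7])

# np.ix_ builds an open mesh, so one indexing call gathers the same
# batch-by-batch sub-matrix that the nested j/k loop fills element-wise.
weight_batch = similarity_weight[np.ix_(batch_index, batch_index)]
assert weight_batch.shape == (3, 3)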
sw_loss is defined in the "func" file as follows:
def sw_loss(features, similarity_weight_batch):
    # 'similarity_weight_batch' holds the coefficients, each in (0, 1]
    # pairwise squared differences via broadcasting: (B, 1, D) - (B, D) -> (B, B, D)
    sqdiff = tf.squared_difference(features[:, tf.newaxis], features)
    # L2 norm between every pair of rows of features
    feature_matrix = tf.sqrt(tf.reduce_sum(sqdiff, axis=-1))
    sw_loss_total = tf.multiply(similarity_weight_batch, feature_matrix)
    return tf.reduce_mean(sw_loss_total)
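For reference, the same pairwise computation can be reproduced in plain numpy on a toy array (hypothetical data) to inspect what feature_matrix contains:

import numpy as np

# Toy features: 3 samples, 4-dimensional.
features = np.array([[1., 0., 0., 0.],
                     [0., 1., 0., 0.],
                     [1., 0., 0., 0.]])

# Same broadcasting as the TF code: (3, 1, 4) - (3, 4) -> (3, 3, 4).
sqdiff = (features[:, np.newaxis] - features) ** 2
feature_matrix = np.sqrt(sqdiff.sum(axis=-1))

# Note that the diagonal (each row's distance to itself) is exactly 0.
print(feature_matrix)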
The printed log is as follows; the training loss is 'nan' in every epoch:
epoch 1: training: loss --> nan, acc --> 15.514%
time for this epoch: 0.074 minutes
epoch 2: training: loss --> nan, acc --> 15.514%
time for this epoch: 0.024 minutes
epoch 3: training: loss --> nan, acc --> 15.514%
time for this epoch: 0.073 minutes
epoch 4: training: loss --> nan, acc --> 15.514%
time for this epoch: 0.033 minutes
epoch 5: training: loss --> nan, acc --> 15.514%
time for this epoch: 0.021 minutes
...