Question

I'm trying to implement this paper: http://ronan.collobert.com/pub/matos/2008_deep_icml.pdf Specifically, equation (3) from Section 2.

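In short, I want to compute pairwise distances over the features of each mini-batch and add this loss to the overall network loss. I only have the tensor for the batch (16 samples), the batch's label tensor, and the batch's feature tensor.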

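After searching for quite a while, I still can't figure out the following:

1) How do I split the batch into positive (i.e. same label) and negative pairs? Since a Tensor is not iterable, I can't figure out how to tell which sample has which label and then split my vector, or how to get which indices of the tensor belong to each class.

2) How can I compute pairwise distances for only some of the indices in the batch tensor?

3) I also need to define a new distance function for the negative examples.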

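Overall, I need to find out which indices belong to which class, run the positive pairwise distance calculation over all positive pairs, and run a different calculation over all negative pairs. Then I sum everything up and add it to the network loss.

Any help (with one or more of the 3 issues) would be highly appreciated.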

Answer 1

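1) You should do the pair sampling before feeding the data into a session. Give every pair a boolean label, say y = 1 for a matched pair and 0 otherwise.

2) 3) Just compute both the positive and the negative term for every pair, and let the 0-1 label y choose which one is added to the loss.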
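The pair construction from step 1 can be sketched in plain Python, outside the TensorFlow graph. This is only an illustration under assumptions: the helper name `make_pairs` and the toy batch are hypothetical and not part of the original answer, and the sketch enumerates every pair, whereas in practice you would probably subsample them.

```python
from itertools import combinations


def make_pairs(features, labels):
    """Enumerate every unordered pair in a batch and tag it with y.

    y = 1 when both samples share a label (positive pair), else 0.
    """
    left, right, y = [], [], []
    for i, j in combinations(range(len(labels)), 2):
        left.append(features[i])
        right.append(features[j])
        y.append(1 if labels[i] == labels[j] else 0)
    return left, right, y


# Hypothetical toy batch: 4 samples, 2-dimensional features, 2 classes.
features = [[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.2]]
labels = [0, 0, 1, 1]

x1, x2, y = make_pairs(features, labels)
print(y)  # prints: [1, 0, 0, 0, 0, 1]
```

The three lists line up index by index, so they can be fed as the left inputs, the right inputs, and the pair labels respectively.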

First create the placeholders; y_ is for the boolean labels.

import tensorflow as tf  # TensorFlow 1.x graph-mode API

dim = 64
x1_ = tf.placeholder('float32', shape=(None, dim))
x2_ = tf.placeholder('float32', shape=(None, dim))
y_ = tf.placeholder('uint8', shape=[None])  # uint8 for boolean

Then the loss tensor can be created by a function.

def loss(x1, x2, y):
    # Euclidean distance between x1 and x2, one value per pair
    l2diff = tf.sqrt(tf.reduce_sum(tf.square(tf.subtract(x1, x2)), axis=1))

    # you can try other margin parameters
    margin = tf.constant(1.)

    labels = tf.cast(y, tf.float32)

    match_loss = tf.square(l2diff, 'match_term')
    mismatch_loss = tf.maximum(0., tf.subtract(margin, tf.square(l2diff)), 'mismatch_term')

    # if label is 1, only match_loss will count, otherwise mismatch_loss
    loss = tf.add(tf.multiply(labels, match_loss),
                  tf.multiply((1 - labels), mismatch_loss), 'loss_add')

    loss_mean = tf.reduce_mean(loss)
    return loss_mean

loss_ = loss(x1_, x2_, y_)
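Written out, the value the function above returns for a batch of N pairs is the following. The symbols are notation introduced here for illustration, not taken from the answer: d_i is the Euclidean distance between the two members of pair i, m is the margin, and y_i is the 0/1 pair label.

```latex
\[
L = \frac{1}{N} \sum_{i=1}^{N}
    \Big[ \, y_i \, d_i^{2} + (1 - y_i) \, \max\big(0,\; m - d_i^{2}\big) \Big],
\qquad
d_i = \lVert x_{1,i} - x_{2,i} \rVert_2
\]
```

Matched pairs are penalized by their squared distance, while mismatched pairs are penalized only when their squared distance falls below the margin.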

Then feed your data (randomly generated, for example):

import numpy as np

batchsize = 4
x1 = np.random.rand(batchsize, dim)
x2 = np.random.rand(batchsize, dim)
y = np.array([0, 1, 1, 0])

with tf.Session() as sess:
    l = sess.run(loss_, feed_dict={x1_: x1, x2_: x2, y_: y})

Answer 2

Short answer

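I think the simplest way to do this is to sample the pairs offline (i.e. outside of the TensorFlow graph).
You create a tf.placeholder for a batch of pairs along with their labels (positive or negative, i.e. same class or different class), and then you can compute the corresponding loss in TensorFlow.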

With the code

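You sample the pairs offline. You sample batch_size pairs of inputs and output the left elements of the pairs with shape [batch_size, input_size], and likewise the right elements. You also output the labels of the pairs (positive or negative) with shape [batch_size, 1].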

pairs_left = np.zeros((batch_size, input_size))
pairs_right = np.zeros((batch_size, input_size))
labels = np.zeros((batch_size, 1))  # ex: [[0.], [1.], [1.], [0.]] for batch_size=4

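Then you create TensorFlow placeholders corresponding to these inputs. In your code, you will feed the arrays above to these placeholders through the feed_dict argument of sess.run().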

pairs_left_node = tf.placeholder(tf.float32, [batch_size, input_size])
pairs_right_node = tf.placeholder(tf.float32, [batch_size, input_size])
labels_node = tf.placeholder(tf.float32, [batch_size, 1])

Now we can run a feedforward pass on the inputs (let's say your model is a linear model).

W = ...   # shape [input_size, feature_size]
output_left = tf.matmul(pairs_left_node, W)  # shape [batch_size, feature_size]
output_right = tf.matmul(pairs_right_node, W)  # shape [batch_size, feature_size]

Finally, we can compute the pairwise loss.

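margin = 1.0  # assumed value; the margin is not specified in the original answer
labels = tf.reshape(labels_node, [-1])  # [batch_size, 1] -> [batch_size], to match l2_loss_pairs
l2_loss_pairs = tf.reduce_sum(tf.square(output_left - output_right), 1)  # shape [batch_size]
positive_loss = l2_loss_pairs
negative_loss = tf.nn.relu(margin - l2_loss_pairs)
final_loss = tf.multiply(labels, positive_loss) + tf.multiply(1. - labels, negative_loss)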

And that's it! You can now optimize this loss, given a good offline sampling.
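To see what this loss does to individual pairs, the same per-pair arithmetic can be sketched in plain Python, outside TensorFlow. The function name `pair_loss` and the margin of 1.0 are assumptions made for illustration; the answer does not state a margin value.

```python
def pair_loss(left, right, label, margin=1.0):
    """Loss for one pair of feature vectors.

    Positive pairs (label 1.0) pay their squared L2 distance;
    negative pairs (label 0.0) pay max(0, margin - squared distance).
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(left, right))
    positive_loss = sq_dist
    negative_loss = max(0.0, margin - sq_dist)
    return label * positive_loss + (1.0 - label) * negative_loss


# Same class, squared distance 0.25: penalized by the distance itself.
close_positive = pair_loss([0.0, 0.0], [0.5, 0.0], label=1.0)
# Different class but inside the margin: penalized by margin - 0.25.
near_negative = pair_loss([0.0, 0.0], [0.5, 0.0], label=0.0)
# Different class and beyond the margin: no penalty.
far_negative = pair_loss([0.0, 0.0], [2.0, 0.0], label=0.0)

print(close_positive, near_negative, far_negative)  # prints: 0.25 0.75 0.0
```

Negative pairs that are already farther apart than the margin contribute nothing, so only the pairs that are too close push the features apart.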

Pairwise distance calculation with TensorFlow
