Question

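Fellow Stack Overflow users,

I am currently struggling with the following problem:

I have two 2-D tensors:

a = Tensor(shape=[600,52]) # 600 vectors of length 52
b = Tensor(shape=[16000,52]) # 16000 vectors of length 52

I am trying to compute the cosine similarity between every pair of vectors (one from a, one from b) and store the results in a third tensor: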

similarity = Tensor(shape=[600, 16000])

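My questions are as follows:

a) I am not quite sure how to achieve this in a non-iterative way. I have thought about combining broadcasting semantics with tf.losses.cosine_distance, but I cannot quite work out what that would look like.

b) Depending on the implementation (tf.losses.cosine_distance requires both input tensors to have matching dimensions), the memory footprint could become very large, because two tensors of shape [600, 16000, 52] would have to be created in order to compute the distance for all combinations of vectors. Can you think of a possible workaround for this?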
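For a rough sense of scale, the memory concern in b) can be put into numbers. The sketch below assumes float32 storage (4 bytes per element); the helper name tensor_bytes is made up for illustration and is not part of TensorFlow.

```python
BYTES_PER_FLOAT32 = 4  # assumption: tensors are stored as float32


def tensor_bytes(shape, itemsize=BYTES_PER_FLOAT32):
    """Return the number of bytes a dense tensor of the given shape occupies."""
    count = 1
    for dim in shape:
        count *= dim
    return count * itemsize


# One expanded operand of shape [600, 16000, 52]
expanded = tensor_bytes([600, 16000, 52])
# The final similarity matrix of shape [600, 16000]
result = tensor_bytes([600, 16000])

print(f"{expanded / 2**30:.2f} GiB")  # 1.86 GiB
print(f"{result / 2**20:.1f} MiB")    # 36.6 MiB
```

Under that assumption, each expanded operand is roughly fifty times larger than the result it is used to compute, so two of them would need close to 4 GiB.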

I hope I have managed to express myself in an understandable way. Thank you for your help.

Best regards

Answer 1

You can compute it simply like this:

import tensorflow as tf

# Vectors: 600 and 16000 vectors of length 52
a = tf.placeholder(tf.float32, shape=[600, 52])
b = tf.placeholder(tf.float32, shape=[16000, 52])
# Pairwise dot products: a[:, tf.newaxis] has shape [600, 1, 52] and
# broadcasts against b, so the reduced result has shape [600, 16000]
similarity = tf.reduce_sum(a[:, tf.newaxis] * b, axis=-1)
# Divide by the norms; only necessary if vectors are not normalized
similarity /= tf.norm(a[:, tf.newaxis], axis=-1) * tf.norm(b, axis=-1)
# If you prefer the distance measure
distance = 1 - similarity
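Note that the broadcast product a[:, tf.newaxis] * b has shape [600, 16000, 52] before the reduction, which is the large intermediate the question worries about. One possible alternative, shown here as a NumPy sketch rather than as the answer's own method, is to normalize the rows first and then take a single matrix product. The function name pairwise_cosine_similarity and the eps guard are assumptions made for this illustration. In TensorFlow the same pattern should map onto tf.math.l2_normalize and tf.matmul(..., transpose_b=True).

```python
import numpy as np


def pairwise_cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity between every row of a [n, d] and every row of b [m, d]."""
    # Scale each row to unit length; eps guards against division by zero.
    a_unit = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), eps)
    b_unit = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), eps)
    # One matrix product yields all n * m dot products as an [n, m] array,
    # without materializing an [n, m, d] intermediate.
    return a_unit @ b_unit.T


# Random stand-in data with the shapes from the question
rng = np.random.default_rng(0)
a = rng.standard_normal((600, 52)).astype(np.float32)
b = rng.standard_normal((16000, 52)).astype(np.float32)

similarity = pairwise_cosine_similarity(a, b)
distance = 1 - similarity
print(similarity.shape)  # (600, 16000)
```

With this formulation the largest array held in memory is the [600, 16000] result itself.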

Computing the cosine similarity of two sets of vectors in TensorFlow
