Question

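I have two collections of points. One consists of m₁ points in k dimensions, the other of m₂ points in k dimensions. I need to compute the pairwise distance between every pair of points drawn from the two collections.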

Basically, given two matrices A_{m₁,k} and B_{m₂,k}, I need to get a matrix C_{m₁,m₂}.

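I can easily do this in scipy with distance.cdist, choosing one of its many distance metrics, and I can also do it in TF with a loop, but I cannot figure out how to do it with matrix operations, even for the Euclidean distance.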
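
For reference, the target matrix can be written down directly with NumPy broadcasting. This is only a sketch of what C should contain for the Euclidean case, not a TensorFlow solution; the function name pairwise_euclidean is made up for illustration.

```python
import numpy as np

def pairwise_euclidean(a, b):
    """Return C with C[i, j] = ||a[i] - b[j]|| for a of shape (m1, k), b of shape (m2, k)."""
    # Broadcasting: (m1, 1, k) - (1, m2, k) -> (m1, m2, k)
    diff = a[:, None, :] - b[None, :, :]
    # Sum squared differences over the k coordinates, then take the root.
    return np.sqrt(np.sum(diff ** 2, axis=-1))

rng = np.random.default_rng(0)
a = rng.random((3, 2))   # m1 = 3 points in k = 2 dimensions
b = rng.random((4, 2))   # m2 = 4 points in k = 2 dimensions
C = pairwise_euclidean(a, b)
print(C.shape)  # (3, 4)
```

The intermediate array has m₁ · m₂ · k entries, which is why a formulation based on matrix products is attractive for larger inputs.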

Answer 1

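After a few hours I finally found how to do this in TensorFlow. My solution works only for the Euclidean distance and is pretty verbose. I also do not have a mathematical proof (just a lot of hand-waving, which I hope to make more rigorous):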

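import numpy as np
import tensorflow as tf
from scipy.spatial.distance import cdist

M1, M2, K = 3, 4, 2

# SciPy reference calculation
a = np.random.rand(M1, K).astype(np.float32)
b = np.random.rand(M2, K).astype(np.float32)
print(cdist(a, b, 'euclidean'), '\n')

# TF calculation (written against the TensorFlow 1.x graph API)
A = tf.Variable(a)
B = tf.Variable(b)

# p1[i, j] = ||A[i]||^2: squared row norms of A repeated across M2 columns
p1 = tf.matmul(
    tf.expand_dims(tf.reduce_sum(tf.square(A), 1), 1),
    tf.ones(shape=(1, M2))
)
# p2[i, j] = ||B[j]||^2: squared row norms of B repeated across M1 rows
p2 = tf.transpose(tf.matmul(
    tf.reshape(tf.reduce_sum(tf.square(B), 1), shape=[-1, 1]),
    tf.ones(shape=(M1, 1)),
    transpose_b=True
))

# ||A[i] - B[j]||^2 = ||A[i]||^2 + ||B[j]||^2 - 2 * A[i] . B[j]
sq = tf.add(p1, p2) - 2 * tf.matmul(A, B, transpose_b=True)
# Clamp at zero: float rounding can make sq slightly negative, and sqrt would then give NaN
res = tf.sqrt(tf.maximum(sq, 0.0))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(res))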
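
The missing justification is short. As a sketch, write a_i for row i of A and b_j for row j of B, and let r_A and r_B (names introduced here, not in the code) be the column vectors of squared row norms. Expanding the squared distance gives exactly the three terms that p1, p2 and the matmul compute:

```latex
\begin{aligned}
\lVert a_i - b_j \rVert^2
  &= (a_i - b_j)^\top (a_i - b_j) \\
  &= a_i^\top a_i - 2\, a_i^\top b_j + b_j^\top b_j \\
  &= \lVert a_i \rVert^2 + \lVert b_j \rVert^2 - 2\, (A B^\top)_{ij}.
\end{aligned}
\qquad\Longrightarrow\qquad
C \circ C = \underbrace{r_A \mathbf{1}_{m_2}^\top}_{\texttt{p1}}
          + \underbrace{\mathbf{1}_{m_1} r_B^\top}_{\texttt{p2}}
          - 2\, A B^\top
```

Here C ∘ C is the elementwise square of the distance matrix, so taking the elementwise square root recovers C.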

Answer 2

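This works for tensors of arbitrary rank (i.e. tensors containing (..., N, d) vectors). Note that it does not compute distances between two collections (i.e. it is not like scipy.spatial.distance.cdist) but within a single collection of vectors (i.e. like scipy.spatial.distance.pdist).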

import tensorflow as tf
import string

def pdist(arr):
    """Pairwise Euclidean distances between vectors contained at the back of tensors.

    Uses expansion: (x - y)^T (x - y) = x^Tx - 2x^Ty + y^Ty 

    :param arr: (..., N, d) tensor
    :returns: (..., N, N) tensor of pairwise distances between vectors in the second-to-last dim.
    :rtype: tf.Tensor

    """
    shape = tuple(arr.get_shape().as_list())
    rank_ = len(shape)
    N, d = shape[-2:]

    # Build a prefix from the array without the indices we'll use later.
    pref = string.ascii_lowercase[:rank_ - 2]

    # Gram matrix of pairwise inner products x_n . x_m, shape (..., N, N)
    xxT = tf.einsum('{0}ni,{0}mi->{0}nm'.format(pref), arr, arr)

    # Squared norm of each point, x_n . x_n, shape (..., N)
    xTx = tf.einsum('{0}ni,{0}ni->{0}n'.format(pref), arr, arr)

    # (..., N, N) inner products tiled.
    xTx_tile = tf.tile(xTx[..., None], (1,) * (rank_ - 1) + (N,))

    # Build the permuter. (sigh, no tf.swapaxes yet)
    permute = list(range(rank_))
    permute[-2], permute[-1] = permute[-1], permute[-2]

    # dists = (x^Tx - 2x^Ty + y^Ty)^(1/2). Note the axis swapping is necessary to 'pair' x^Tx and y^Ty
    return tf.sqrt(xTx_tile - 2 * xxT + tf.transpose(xTx_tile, permute))
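
To sanity-check the expansion used in pdist without a TensorFlow session, the same computation can be sketched in NumPy and compared against a brute-force difference. The name pdist_np and the clamp at zero are additions for this illustration, not part of the answer's code.

```python
import numpy as np

def pdist_np(arr):
    """NumPy analogue of pdist above: (..., N, d) -> (..., N, N) Euclidean distances."""
    gram = np.einsum('...ni,...mi->...nm', arr, arr)     # x_n . x_m
    sq_norms = np.einsum('...ni,...ni->...n', arr, arr)  # ||x_n||^2
    # Broadcasting replaces the explicit tile + transpose of the TF version.
    sq_dists = sq_norms[..., :, None] - 2.0 * gram + sq_norms[..., None, :]
    # Assumption: clamp tiny negative values caused by rounding before the root.
    return np.sqrt(np.maximum(sq_dists, 0.0))

rng = np.random.default_rng(0)
x = rng.random((2, 5, 3))  # batch of 2, N = 5 points, d = 3
d = pdist_np(x)

# Brute-force reference: explicit differences between every pair of points.
brute = np.linalg.norm(x[..., :, None, :] - x[..., None, :, :], axis=-1)
print(d.shape)  # (2, 5, 5)
print(np.allclose(d, brute, atol=1e-6))
```

The expansion trades the (..., N, N, d) intermediate of the brute-force version for a matrix product, at the cost of some cancellation error for points that are very close together.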
