I have X (n×d), Y (m×d), and a positive-definite L (d×d). I want to compute D, where D_ij is (X_i - Y_j) * L * (X_i - Y_j).T. n and m are around 250; d is around 10^4.
I can use scipy.spatial.distance.cdist, but it is very slow:
scipy.spatial.distance.cdist(X, Y, metric='mahalanobis', VI=L)
Following Dougal's answer to this question, I tried:
diff = X[np.newaxis, :, :] - Y[:, np.newaxis, :]
D = np.einsum('jik,kl,jil->ij', diff, L, diff)
This is also slow.
Is there a more efficient way to vectorize this computation?
Answer 0: (score: 1)
A combination of np.tensordot and np.einsum helps in cases like this:
np.einsum('jil,jil->ij',np.tensordot(diff, L, axes=(2,0)), diff)
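To check that the one-liner above produces the intended matrix, here is a minimal sketch that wraps it in a function and compares it against an explicit double loop. The function names `pairwise_quadratic_form` and `pairwise_quadratic_form_loop` are made up for this illustration and are not part of NumPy or SciPy.

```python
import numpy as np

def pairwise_quadratic_form(X, Y, L):
    """D[i, j] = (X[i] - Y[j]) @ L @ (X[i] - Y[j]) using tensordot + einsum."""
    # diff[j, i, :] = X[i] - Y[j], shape (m, n, d)
    diff = X[np.newaxis, :, :] - Y[:, np.newaxis, :]
    # Contract the last axis of diff with the first axis of L, shape (m, n, d)
    diff_L = np.tensordot(diff, L, axes=(2, 0))
    # Row-wise dot product over the last axis, output transposed to (n, m)
    return np.einsum('jil,jil->ij', diff_L, diff)

def pairwise_quadratic_form_loop(X, Y, L):
    """Reference implementation with explicit loops (slow, for checking only)."""
    D = np.empty((X.shape[0], Y.shape[0]))
    for i, x in enumerate(X):
        for j, y in enumerate(Y):
            v = x - y
            D[i, j] = v @ L @ v
    return D

# Small hypothetical sizes so the loop version finishes quickly
rng = np.random.default_rng(0)
X = rng.random((5, 7))
Y = rng.random((4, 7))
A = rng.random((7, 7))
L = A @ A.T + 7 * np.eye(7)   # symmetric positive definite by construction

D_fast = pairwise_quadratic_form(X, Y, L)
D_ref = pairwise_quadratic_form_loop(X, Y, L)
print(np.allclose(D_fast, D_ref))
```

Note that cdist with metric='mahalanobis' returns the square root of this quantity, so its output should be squared before comparing it with D.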
Runtime test:
In [26]: n,m,d = 30,40,50
...: X = np.random.rand(n,d)
...: L = np.random.rand(d,d)
...: Y = np.random.rand(m,d)
...:
In [27]: diff = X[np.newaxis, :, :] - Y[:, np.newaxis, :]
In [28]: %timeit np.einsum('jik,kl,jil->ij', diff, L, diff)
100 loops, best of 3: 7.81 ms per loop
In [29]: %timeit np.einsum('jil,jil->ij',np.tensordot(diff, L, axes=(2,0)), diff)
1000 loops, best of 3: 472 µs per loop
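The timings above use small sizes. At the sizes in the question (n = m = 250, d = 10^4), the intermediate diff array holds 250 × 250 × 10^4 float64 values, roughly 5 GB, which may be a bottleneck in itself. A different technique, not taken from the answer above, is to expand the quadratic form as x L x.T + y L y.T - x L y.T - y L x.T so that only 2-D matrix products are needed. The sketch below assumes this expansion is acceptable numerically (it can lose precision when the distances are tiny relative to the norms); `quad_form_expanded` is a hypothetical name.

```python
import numpy as np

def quad_form_expanded(X, Y, L):
    """D[i, j] = (X[i] - Y[j]) @ L @ (X[i] - Y[j]) without building an (m, n, d) array."""
    XL = X @ L                              # (n, d), row i is X[i] @ L
    YL = Y @ L                              # (m, d), row j is Y[j] @ L
    xx = np.einsum('ik,ik->i', XL, X)       # X[i] @ L @ X[i], shape (n,)
    yy = np.einsum('jk,jk->j', YL, Y)       # Y[j] @ L @ Y[j], shape (m,)
    xy = XL @ Y.T                           # X[i] @ L @ Y[j], shape (n, m)
    yx = (YL @ X.T).T                       # Y[j] @ L @ X[i], shape (n, m)
    return xx[:, np.newaxis] + yy[np.newaxis, :] - xy - yx

# Compare against the einsum formulation on the same small sizes as the timing test
rng = np.random.default_rng(1)
n, m, d = 30, 40, 50
X = rng.random((n, d))
Y = rng.random((m, d))
L = rng.random((d, d))

diff = X[np.newaxis, :, :] - Y[:, np.newaxis, :]
D_einsum = np.einsum('jik,kl,jil->ij', diff, L, diff)
D_expanded = quad_form_expanded(X, Y, L)
print(np.allclose(D_einsum, D_expanded))
```

When L is symmetric, as a positive-definite matrix normally is, xy and yx are equal, so the last two terms can be replaced by 2 * xy and one matrix product is saved.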