Efficiently updating distance between points

Time: 2017-03-14 18:37:35

Tags: python arrays numpy scipy euclidean-distance

My dataset has n rows (observations) and p columns (features):

import numpy as np
from scipy.spatial.distance import pdist, squareform

p = 3
n = 5
xOld = np.random.rand(n * p).reshape([n, p])

I am interested in getting the distances between these points in an n x n matrix, which really has only n x (n-1)/2 unique values:

sq_dists = pdist(xOld, 'sqeuclidean')
D_n = squareform(sq_dists)

Now imagine I receive N additional observations and would like to update D_n. One very inefficient way is:

N = 3
xNew = np.random.rand(N * p).reshape([N, p])
# np.vstack does the same job as the older np.row_stack alias
sq_dists = pdist(np.vstack([xOld, xNew]), 'sqeuclidean')
D_n_N = squareform(sq_dists)

However, considering that n ~ 10000 and N ~ 100, this would be redundant. My goal is to obtain D_n_N more efficiently by using D_n. To do that, I partition D_n_N into blocks. I already have D_n and can compute B [N x N]. However, I am wondering whether there is a good way to compute A (or A transposed), the block between the old and the new points, without a bunch of for loops, and then finally construct D_n_N.

Thanks in advance.

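2 Answers:

Answer 0: (score: 2)

Pretty interesting problem! I learnt a few new things on my way to a solution here.

Steps involved:

  • First off, we are introducing new pts here. So, we need to use cdist to get the squared Euclidean distances between the old and the new pts. These go into two blocks of the new output, one right below the old distances and one to the right of them.

  • We also need to compute the pdist among the new pts and put its square-formed block at the tail end of the new diagonal region.

Schematically, the new D_n_N would look like this: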

[   D_n      cdist.T
  cdist      New pdist squareformed]
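To get a feel for why reusing D_n should pay off, one can count how many pairwise distances each route has to evaluate. The sketch below is only back-of-the-envelope arithmetic with the sizes mentioned in the question (n ~ 10000, N ~ 100); `n_pairs` is a helper name made up for this illustration. It counts distance evaluations only, so copying D_n into the larger array still costs O(n²) either way.

```python
def n_pairs(m):
    """Number of unordered pairs among m points: m * (m - 1) / 2."""
    return m * (m - 1) // 2


n, N = 10_000, 100  # sizes quoted in the question

# Recomputing everything: pdist over all n + N points
full = n_pairs(n + N)

# Block update: cdist(new, old) gives n * N distances,
# pdist(new) gives the N * (N - 1) / 2 distances among the new points
update = n * N + n_pairs(N)

print(full)            # 50999950
print(update)          # 1004950
print(full / update)   # roughly 50x fewer distance evaluations
```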

Thus, to sum up, the implementation would look something along these lines -

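import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform

# Squared Euclidean distances between new and old pts, shape (N, n1)
cdists = cdist(xNew, xOld, 'sqeuclidean')

n1 = D_n.shape[0]
out = np.empty((n1+N, n1+N))
out[:n1,:n1] = D_n        # old-vs-old block, reused as is
out[n1:,:n1] = cdists     # new-vs-old block (bottom left)
out[:n1,n1:] = cdists.T   # old-vs-new block (top right)
out[n1:,n1:] = squareform(pdist(xNew, 'sqeuclidean'))  # new-vs-new block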

Runtime test

Approaches -

# Original approach
# NOTE: reads xOld from the global scope; D_n is accepted but not used
def org_app(D_n, xNew):
    sq_dists = pdist(np.vstack([xOld, xNew]), 'sqeuclidean')
    D_n_N = squareform(sq_dists)
    return D_n_N

# Proposed approach
# NOTE: also reads xOld from the global scope
def proposed_app(D_n, xNew, N):
    cdists = cdist(xNew, xOld, 'sqeuclidean')
    n1 = D_n.shape[0]
    out = np.empty((n1+N, n1+N))
    out[:n1,:n1] = D_n
    out[n1:,:n1] = cdists
    out[:n1,n1:] = cdists.T
    out[n1:,n1:] = squareform(pdist(xNew, 'sqeuclidean'))
    return out

Timings -

In [102]: # Setup inputs
     ...: p = 3
     ...: n = 5000
     ...: xOld = np.random.rand(n * p).reshape([n, p])
     ...: 
     ...: sq_dists = pdist(xOld, 'sqeuclidean')
     ...: D_n = squareform(sq_dists)
     ...: 
     ...: N = 3000
     ...: xNew = np.random.rand(N * p).reshape([N, p])
     ...: 

In [103]: np.allclose( proposed_app(D_n, xNew, N), org_app(D_n, xNew))
Out[103]: True

In [104]: %timeit org_app(D_n, xNew)
1 loops, best of 3: 541 ms per loop

In [105]: %timeit proposed_app(D_n, xNew, N)
1 loops, best of 3: 201 ms per loop
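Outside of IPython, a comparable measurement could be set up with the standard `timeit` module. This is only a sketch: the function names `full_recompute` and `block_update` are made up here, they take xOld as an explicit argument instead of reading it from the global scope, and the sizes are scaled down from the session above so the script finishes quickly. The numbers it prints will depend on the machine and on the sizes chosen.

```python
import timeit

import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform


def full_recompute(xOld, xNew):
    """Recompute all pairwise squared distances from scratch."""
    return squareform(pdist(np.vstack([xOld, xNew]), 'sqeuclidean'))


def block_update(D_n, xOld, xNew):
    """Reuse D_n and fill in only the blocks that involve new points."""
    n1, N = D_n.shape[0], xNew.shape[0]
    cross = cdist(xNew, xOld, 'sqeuclidean')  # shape (N, n1)
    out = np.empty((n1 + N, n1 + N))
    out[:n1, :n1] = D_n
    out[n1:, :n1] = cross
    out[:n1, n1:] = cross.T
    out[n1:, n1:] = squareform(pdist(xNew, 'sqeuclidean'))
    return out


# Assumption: sizes scaled down from n = 5000, N = 3000 to keep this quick
rng = np.random.default_rng(0)
n, N, p = 500, 300, 3
xOld = rng.random((n, p))
xNew = rng.random((N, p))
D_n = squareform(pdist(xOld, 'sqeuclidean'))

same = np.allclose(block_update(D_n, xOld, xNew), full_recompute(xOld, xNew))
print("results match:", same)

# Best of 3 repeats, 5 calls each, reported per call
t_full = min(timeit.repeat(lambda: full_recompute(xOld, xNew),
                           number=5, repeat=3)) / 5
t_block = min(timeit.repeat(lambda: block_update(D_n, xOld, xNew),
                            number=5, repeat=3)) / 5
print(f"full recompute: {t_full * 1e3:.2f} ms per call")
print(f"block update:   {t_block * 1e3:.2f} ms per call")
```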

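Answer 1: (score: 1)

Just use cdist:

from numpy import hstack, vstack
from scipy.spatial.distance import cdist

# cdist defaults to the Euclidean metric; pass 'sqeuclidean' as a
# third argument to match the squared distances used in the question
D_OO=cdist(xOld,xOld)

D_NN=cdist(xNew,xNew)
D_NO=cdist(xNew,xOld)
D_ON=cdist(xOld,xNew) # or D_NO.T

Finally:

D_=vstack((hstack((D_OO,D_ON)),(hstack((D_NO,D_NN)))))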
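If the old matrix D_n is already available, the D_OO block does not have to be recomputed. A possible variant of the same idea is sketched below, assuming squared Euclidean distances as in the question. `extend_with_block` is a hypothetical helper name, and `np.block` is used here in place of the nested hstack/vstack calls.

```python
import numpy as np
from scipy.spatial.distance import cdist


def extend_with_block(D_n, xOld, xNew, metric='sqeuclidean'):
    """Hypothetical helper: grow D_n by the rows/columns for xNew."""
    D_NO = cdist(xNew, xOld, metric)  # new-vs-old, shape (N, n)
    D_NN = cdist(xNew, xNew, metric)  # new-vs-new, shape (N, N)
    # D_n is reused as the old-vs-old block instead of being recomputed
    return np.block([[D_n, D_NO.T],
                     [D_NO, D_NN]])


# Small example with the sizes from the question (n = 5, N = 3, p = 3)
rng = np.random.default_rng(0)
xOld = rng.random((5, 3))
xNew = rng.random((3, 3))
D_n = cdist(xOld, xOld, 'sqeuclidean')

D_ = extend_with_block(D_n, xOld, xNew)

# Reference: all pairwise distances computed from scratch
xAll = np.vstack([xOld, xNew])
D_ref = cdist(xAll, xAll, 'sqeuclidean')
print(D_.shape)                 # (8, 8)
print(np.allclose(D_, D_ref))   # True
```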