Update
In case this is useful to anyone: as described below, the Euclidean distance calculation and np.argmin take up nearly all of the runtime. By rewriting the distance calculation with numba, I was able to shave off at least 20% in most cases compared to the already fast np.einsum version.
from numba import jit

@jit(nopython=True)
def calculateDistances_numba(currentLocation, traces):
    # Differences between currentLocation and every point, per coordinate; each is (L, n).
    deltaX = traces[:, 0, :] - currentLocation[0]
    deltaY = traces[:, 1, :] - currentLocation[1]
    deltaZ = traces[:, 2, :] - currentLocation[2]
    # Euclidean distance to every point of every trace, shape (L, n).
    distances = (deltaX**2 + deltaY**2 + deltaZ**2)**0.5
    return distances
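A quick way to sanity-check the numba version against the einsum version on dummy data (the array sizes here are arbitrary, not the real ones):

import numpy as np

# Assumes calculateDistances_numba from above is defined.
traces = np.random.rand(2000, 3, 100)
currentLocation = np.random.rand(3)

# Reference: the einsum-based distance calculation.
deltas = traces - currentLocation[None, :, None]
distances_einsum = np.einsum('ijk,ijk->ik', deltas, deltas)**0.5

# First call triggers JIT compilation; subsequent calls are the ones worth timing.
distances_numba = calculateDistances_numba(currentLocation, traces)
assert np.allclose(distances_einsum, distances_numba)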
----
Problem
I have a large array, vertices.shape = (N, 3); N ~ 5e6, describing the 3D vertices of an unstructured mesh. I have n smaller arrays of coordinates and data that I want to linearly interpolate onto vertices. They are stored along the third axis of another array, traces.shape = (L, 3, n); L ~ 2e4; n ~ 2e3. For each vertex (row of vertices) I want to quickly find the two closest points that come from different small arrays (pages of traces, i.e. points whose indices along axis=2 differ). By closest I mean the Euclidean distance, d = (deltaX**2 + deltaY**2 + deltaZ**2)**0.5. The purpose of this function is to linearly interpolate between the two known values to the point in vertices.

My current function works reasonably well, but becomes prohibitively slow (8+ hours) at the expected array sizes given above. I have gone through my entire code and can say definitively that this calculation is the expensive one.
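To make the layout concrete, here is a toy illustration of the shapes involved (sizes are made up for readability):

import numpy as np

traces = np.random.rand(6, 3, 4)    # L=6 points per trace, xyz on axis 1, n=4 traces
vertices = np.random.rand(5, 3)     # 5 vertices, one xyz row each

# traces[:, :, j] is the j-th small array ("page"); traces[i, :, j] is one 3D point of it.
page = traces[:, :, 0]              # shape (6, 3)
point = traces[2, :, 1]             # shape (3,)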
Current function
import numpy as np

def interpolate(currentLocation, traces, nTraces):
    # Calculate the Euclidean distance between currentLocation and all points
    # in the search bracket. Einsum was found to be faster than np.linalg.norm as well as
    # standard numpy operations.
    # Distances is a 2D array of shape (L, n) and corresponds to the Euclidean distance
    # between currentLocation and every point in traces.
    deltas = traces - currentLocation[None, :, None]
    distances = np.einsum('ijk,ijk->ik', deltas, deltas)**0.5
    # Along axis = 1 is definitely a little bit faster
    # but haven't implemented.
    # rowIndices is a 1D array whose elements are the indices of the
    # smallest distance for each page (small array) of traces.
    rowIndices = np.argmin(distances, axis=0)
    # Get the actual distances
    min_distances = distances[rowIndices, np.arange(nTraces)]
    # Indices of two smallest traces (pages)
    columnIndices = np.argpartition(min_distances, 2)[:2]
    # Row indices of the two closest points
    rowIndices = rowIndices[columnIndices]
    # Distances to two closest points
    closePoints_distances = min_distances[columnIndices]
    # Calculate the interpolant weights based on the distances
    interpolantWeights = closePoints_distances/np.sum(closePoints_distances)
    # Return the indices because I need to retrieve the data for the close points
    # Return the interpolant weights to interpolate the data once retrieved
    return rowIndices, columnIndices, interpolantWeights
vertices = np.random.rand(200, 3)
traces = np.random.rand(100, 3, 10)
nTraces = traces.shape[-1]
# This is a simplified version of what actually happens.
for index, currentLocation in enumerate(vertices):
    interpolate(currentLocation, traces, nTraces)
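The returned indices and weights are then used to look up and combine the known data values, roughly like this (a sketch only; traceData is a hypothetical stand-in for the real data arrays, one value per trace point):

traceData = np.random.rand(100, 10)   # (L, n), matching the toy traces above

rowIndices, columnIndices, interpolantWeights = interpolate(vertices[0], traces, nTraces)

# Values at the closest point of each of the two closest traces.
closeValues = traceData[rowIndices, columnIndices]
# Combine the two known values with the returned weights (sketch of the final step;
# the exact weighting convention lives in the real code).
vertexValue = np.sum(interpolantWeights * closeValues)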
Because of the structure of the data, I can select just a block of traces to search through (L ~ 2e3), which significantly reduces the runtime. The bracket to search is a function of currentLocation.

%timeit output
%timeit interpolater(currentLocation, streamlineBlock, nStreamlines)
10 loops, best of 3: 42.8 ms per loop

%timeit interpolaterNew(...)
100 loops, best of 3: 6.27 ms per loop
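For context, the streamlineBlock and nStreamlines in the timings above come from a bracketing step that, in spirit, looks like the sketch below (purely illustrative; in the real code the bounds are derived from currentLocation and the structure of the data):

# Hypothetical sketch of the bracketing step; iStart and iEnd are placeholders
# (with the real arrays the slab is roughly 2e3 rows out of 2e4).
currentLocation = vertices[0]
iStart, iEnd = 20, 80                            # placeholder bounds for the toy arrays above
streamlineBlock = traces[iStart:iEnd, :, :]      # shape (iEnd - iStart, 3, n)
nStreamlines = streamlineBlock.shape[-1]
rowIndices, columnIndices, weights = interpolate(currentLocation, streamlineBlock, nStreamlines)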
cProfile output
cProfile tells me that np.einsum and np.argmin are the slowest parts; in fact they account for the vast majority of the computation. Note that this was run on a small subset of the data, so it may not exactly reflect the function above.
Question(s)
I am now a bit at a loss as to how to improve the performance. Given that the distance calculation and the argmin are the most expensive steps, is it possible to "vectorize" them, i.e. apply the computation to the whole vertices array at once? I tried this, unsuccessfully, by broadcasting over a 4th axis; the computer froze. Does the cProfile report point to anything else, or is there an obvious mistake in my code? Can anyone point me to a better approach? Finally, with tqdm, the number of iterations per second drops off quickly and drastically (down to 250 it/s within the first few minutes); is this expected?
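For reference, by "vectorizing" I mean something along the lines of the batched sketch below (batched_distances and the chunk size are illustrative only; the intermediate deltas array has shape (B, L, 3, n), so with L ~ 2e4 and n ~ 2e3 even a single vertex is roughly 1 GB of float64, which is presumably why the fully broadcast attempt froze the machine):

import numpy as np

def batched_distances(vertexChunk, traces):
    # vertexChunk: (B, 3); traces: (L, 3, n).
    # Broadcast to (B, L, 3, n), then contract the xyz axis to get distances of shape (B, L, n).
    deltas = traces[None, :, :, :] - vertexChunk[:, None, :, None]
    return np.einsum('bijk,bijk->bik', deltas, deltas)**0.5

vertices = np.random.rand(200, 3)
traces = np.random.rand(100, 3, 10)
for start in range(0, vertices.shape[0], 32):
    distances = batched_distances(vertices[start:start + 32], traces)   # (B, L, n)
    rowIndices = np.argmin(distances, axis=1)                           # (B, n)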