Question

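I'm going beyond my previous question because of speed problems. I have an array of Lat/Lon coordinates of points, and I would like to assign them to an index code derived from a 2D square grid whose cells are all the same size. Here is an example of how it would work. Let's call points my first array, containing the coordinates (as [x y] pairs) of six points: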

points = [[ 1.5  1.5]
 [ 1.1  1.1]
 [ 2.2  2.2]
 [ 1.3  1.3]
 [ 3.4  1.4]
 [ 2.   1.5]]

Then I have another array containing the vertex coordinates of a grid of two cells, in the form [minx, miny, maxx, maxy]; let's call it bounds:

bounds = [[ 0.  0.  2.  2.]
 [ 2.  2.  3.  3.]]

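I would like to find which points fall inside which boundary, and then assign a code derived from the bounds array index (in this case the first cell has code 0, the second has code 1, and so on). Since the cells are squares, the easiest way to check whether a point is in a given cell is to evaluate: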

x > minx & x < maxx & y > miny & y < maxy

so that the resulting array would look like this:

results = [0 0 1 0 NaN NaN]

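where NaN means that the point lies outside every cell. In my real case the sizes are on the order of 10^6 points to be located in 10^4 cells. Is there a way to do this kind of thing quickly using numpy arrays?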

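EDIT: to clarify, the expected results array means that the first point is inside the first cell (index 0 of the bounds array), and so is the second; the third point is inside the second cell of the bounds array (index 1), and so on.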
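For reference, the per-cell test described above can be sketched as a plain loop over the cells, with one numpy boolean mask per cell. This is only a baseline built from the sample data in the question, not a fast solution; it still loops over every cell in Python.

```python
import numpy as np

# Sample data from the question
points = np.array([[1.5, 1.5], [1.1, 1.1], [2.2, 2.2],
                   [1.3, 1.3], [3.4, 1.4], [2.0, 1.5]])
bounds = np.array([[0., 0., 2., 2.],
                   [2., 2., 3., 3.]])

x, y = points[:, 0], points[:, 1]
results = np.full(len(points), np.nan)  # NaN = outside every cell

for code, (minx, miny, maxx, maxy) in enumerate(bounds):
    # Strict inequalities, exactly as in the condition above
    inside = (x > minx) & (x < maxx) & (y > miny) & (y < maxy)
    results[inside] = code

print(results)
```

Because the inequalities are strict, the last point (x = 2.0) sits on a cell edge and is treated as outside, which matches the expected results array.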

Answer 1

You can use a nested loop to check the condition and yield the results from a generator:

points = [[1.5, 1.5],
          [1.1, 1.1],
          [2.2, 2.2],
          [1.3, 1.3],
          [3.4, 1.4],
          [2.0, 1.5]]

bounds = [[ 0.  ,0. , 2.,  2.],
 [ 2.  ,2.  ,3.,  3.]]

import numpy as np

def pos(p,b):
  for x,y in p:
    flag=False
    for index,dis in enumerate(b):
      minx,miny,maxx,maxy=dis
      if x > minx and x < maxx and y > miny and y < maxy :
        flag=True
        yield index
    if not flag:
        yield 'NaN'


print(list(pos(points, bounds)))

Result:

[0, 0, 1, 0, 'NaN', 'NaN']

Answer 2

Here is a vectorized approach to your problem. It should speed things up significantly.

import numpy as np
def findCells(points, bounds):
    # make sure points is n by 2 (pool.map might send us 1D arrays)
    points = points.reshape((-1,2))

    # check for each point if all coordinates are in bounds
    # dimension 0 is bound
    # dimension 1 is point
    allInBounds = (points[:,0] > bounds[:,None,0])
    allInBounds &= (points[:,1] > bounds[:,None,1])
    allInBounds &= (points[:,0] < bounds[:,None,2])
    allInBounds &= (points[:,1] < bounds[:,None,3])


    # now find out the positions of all nonzero (i.e. true) values
    # nz[0] contains the indices along dim 0 (bound)
    # nz[1] contains the indices along dim 1 (point)
    nz = np.nonzero(allInBounds)

    # initialize the result with all nan
    r = np.full(points.shape[0], np.nan)
    # now use nz[1] to index point position and nz[0] to tell which cell the
    # point belongs to
    r[nz[1]] = nz[0]
    return r

def findCellsParallel(points, bounds, chunksize=100):
    import multiprocessing as mp
    from functools import partial

    func = partial(findCells, bounds=bounds)

    # using python3 you could also do 'with mp.Pool() as p:'  
    p = mp.Pool()
    try:
        return np.hstack(p.map(func, points, chunksize))
    finally:
        p.close()

def main():
    # array sizes must be integers, not floats
    nPoints = int(1e6)
    nBounds = int(1e4)

    # points = np.array([[ 1.5, 1.5],
    #                    [ 1.1, 1.1],
    #                    [ 2.2, 2.2],
    #                    [ 1.3, 1.3],
    #                    [ 3.4, 1.4],
    #                    [ 2. , 1.5]])

    points = np.random.random([nPoints, 2])

    # bounds = np.array([[0,0,2,2],
    #                    [2,2,3,3]])

    # bounds = np.array([[0,0,1.4,1.4],
    #                    [1.4,1.4,2,2],
    #                    [2,2,3,3]])

    bounds = np.sort(np.random.random([nBounds, 2, 2]), 1).reshape(nBounds, 4)

    r = findCellsParallel(points, bounds)
    print(points[:10])
    for bIdx in np.unique(r[:10]):
        if np.isnan(bIdx):
            continue
        # r holds floats (because of NaN), so cast before indexing
        print("{}: {}".format(bIdx, bounds[int(bIdx)]))
    print(r[:10])

if __name__ == "__main__":
    main()

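Edit
Trying this with your amount of data gave me a MemoryError. You can avoid that, and even speed things up a bit, if you use multiprocessing.Pool with its map function; see the updated code.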

Result:

>time python test.py
[[ 0.69083585 0.19840985]
 [ 0.31732711 0.80462512]
 [ 0.30542996 0.08569184]
 [ 0.72582609 0.46687164]
 [ 0.50534322 0.35530554]
 [ 0.93581095 0.36375539]
 [ 0.66226118 0.62573407]
 [ 0.08941219 0.05944215]
 [ 0.43015872 0.95306899]
 [ 0.43171644 0.74393729]]
9935.0: [ 0.31584562 0.18404152 0.98215445 0.83625487]
9963.0: [ 0.00526106 0.017255 0.33177741 0.9894455 ]
9989.0: [ 0.17328876 0.08181912 0.33170444 0.23493507]
9992.0: [ 0.34548987 0.15906761 0.92277442 0.9972481 ]
9993.0: [ 0.12448765 0.5404578 0.33981119 0.906822 ]
9996.0: [ 0.41198261 0.50958195 0.62843379 0.82677092]
9999.0: [ 0.437169 0.17833114 0.91096133 0.70713434]
[ 9999. 9993. 9989. 9999. 9999. 9935. 9999. 9963. 9992. 9996.]

real 0m 24.352s
user 3m 4.919s
sys 0m 1.464s
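On the MemoryError: the full boolean mask for 10^4 bounds and 10^6 points has 10^10 entries, roughly 10 GB at one byte per numpy boolean. As an alternative to multiprocessing, here is a minimal single-process sketch that applies the same comparison as findCells to slices of the points array. The name find_cells_chunked and the chunk parameter are hypothetical, and I have not timed it against the Pool version.

```python
import numpy as np

def find_cells_chunked(points, bounds, chunk=1000):
    """Hypothetical helper: same test as findCells, applied to slices of points."""
    points = np.asarray(points, dtype=float).reshape(-1, 2)
    bounds = np.asarray(bounds, dtype=float)
    result = np.full(len(points), np.nan)  # NaN = not inside any cell
    for start in range(0, len(points), chunk):
        p = points[start:start + chunk]
        # inside has shape (n_bounds, n_points_in_chunk)
        inside = ((p[:, 0] > bounds[:, None, 0]) &
                  (p[:, 1] > bounds[:, None, 1]) &
                  (p[:, 0] < bounds[:, None, 2]) &
                  (p[:, 1] < bounds[:, None, 3]))
        cell, pt = np.nonzero(inside)
        # offset the point indices back into the full result array
        result[start + pt] = cell
    return result
```

With chunk=1000 and 10^4 bounds each mask holds about 10^7 booleans (around 10 MB). As with findCells, if cells overlap a point ends up with only one of the matching indices.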

Answer 3

I would do it like this:

import numpy as np

points = np.random.rand(10,2)

xmin = [0.25,0.5]
ymin = [0.25,0.5]

results = np.zeros(len(points))

for i in range(len(xmin)):
     bool_index_array = np.greater(points, [xmin[i],ymin[i]])
     print("boolean index of (x,y) greater (xmin, ymin): ", bool_index_array)
     indicies_of_true_true = np.where(bool_index_array[:,0]*bool_index_array[:,1]==1)[0]
     print("indices of [True,True]: ", indicies_of_true_true)
     results[indicies_of_true_true] += 1

print("results: ", results)

[out]: [ 1.  1.  1.  2.  0.  0.  1.  1.  1.  1.]

This uses the lower boundaries to categorize your points into groups:

1 (if xmin[0] < x <= xmin[1] & ymin[0] < y <= ymin[1])
2 (if x > xmin[1] & y > ymin[1])
0 if none of the above conditions is satisfied
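Since the question describes a regular grid of equally sized square cells, classifying by lower bounds can be pushed further: the cell index can be computed directly with floor division, without comparing against every cell. The sketch below is a different technique from the loop above, and everything about the grid layout is an assumption: the helper name grid_codes, the origin (x0, y0), the cell size, and row-major numbering of the codes. Note that floor() puts a point lying exactly on a lower edge inside the cell, unlike the strict inequalities in the question.

```python
import numpy as np

def grid_codes(points, x0, y0, cell, ncols, nrows):
    """Hypothetical helper: row-major cell code per point, NaN if off the grid."""
    points = np.asarray(points, dtype=float).reshape(-1, 2)
    # column and row of the cell each point falls into
    col = np.floor((points[:, 0] - x0) / cell).astype(int)
    row = np.floor((points[:, 1] - y0) / cell).astype(int)
    inside = (col >= 0) & (col < ncols) & (row >= 0) & (row < nrows)
    codes = np.full(len(points), np.nan)
    codes[inside] = row[inside] * ncols + col[inside]
    return codes

# 2 x 2 grid of unit cells starting at the origin (made-up example data)
pts = np.array([[0.5, 0.5], [1.5, 0.5], [0.5, 1.5], [2.5, 0.5]])
codes = grid_codes(pts, x0=0.0, y0=0.0, cell=1.0, ncols=2, nrows=2)
print(codes)
```

The work here grows with the number of points only, not with points times cells, but it applies only if the cells really do form a gap-free regular grid.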

Assign numpy array of points to a 2D square grid
