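How to index a list of points for faster searches of nearby points?

Time: 2016-12-24 12:44:24

Tags: python search indexing point

For a list of (x, y) points, I am trying to find the nearby points for each point.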

from collections import defaultdict
from math import sqrt
from random import randint

# Generate a list of random (x, y) points
points = [(randint(0, 100), randint(0, 100)) for _ in range(1000)]

def is_nearby(point_a, point_b, max_distance=5):
    """Two points are nearby if their Euclidean distance is less than max_distance"""
    distance = sqrt((point_b[0] - point_a[0])**2 + (point_b[1] - point_a[1])**2)
    return distance < max_distance

# For each point, find nearby points that are within a radius of 5
nearby_points = defaultdict(list)
for point in points:
    for neighbour in points:
        if point != neighbour:
            if is_nearby(point, neighbour):
                nearby_points[point].append(neighbour)

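Is there any way I can index points to make the above search faster? I feel there must be some way that is faster than O(len(points)**2).

Edit: in general the points could be floats, not just ints.

1 Answer:

Answer 0: (score: 1)

Here is a version with a fixed grid, where each grid point holds the number of samples located there.

The search can then be narrowed down to just the space around the point in question.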

from random import randint
import math

N = 100
N_SAMPLES = 1000

# create the grid
grd = [[0 for _ in range(N)] for __ in range(N)]

# set the number of points at a given gridpoint
for _ in range(N_SAMPLES):
    grd[randint(0, 99)][randint(0, 99)] += 1

def find_neighbours(grid, point, distance):

    # this will be: (x, y): number of points there
    points = {}

    # +1 because range() excludes its upper bound
    for x in range(point[0]-distance, point[0]+distance+1):
        if x < 0 or x > N-1:
            continue
        for y in range(point[1]-distance, point[1]+distance+1):
            if y < 0 or y > N-1:
                continue
            dst = math.hypot(point[0]-x, point[1]-y)
            if dst > distance:
                continue
            # use the grid that was passed in, not the global
            if grid[x][y] > 0:
                points[(x, y)] = grid[x][y]
    return points

print(find_neighbours(grid=grd, point=(45, 36), distance=5))
# -> {(44, 37): 1, (45, 33): 1, ...}
# meaning: there is one neighbour at (44, 37) etc...

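Further optimization: the tests for x and y could be precalculated for a given grid size; math.hypot(point[0]-x, point[1]-y) would then not have to be evaluated for every point.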
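The precalculation mentioned above could be sketched roughly as follows. This is only an illustration, not the answerer's code: the names make_offsets and OFFSETS are made up here, and it assumes the same integer grid of counts and a search radius that is fixed in advance.

```python
import math
from random import randint, seed

N = 100        # grid size, as in the answer above
DISTANCE = 5   # search radius, assumed fixed so the offsets can be reused


def make_offsets(distance):
    """Return all integer (dx, dy) offsets that lie within `distance` of (0, 0)."""
    return [
        (dx, dy)
        for dx in range(-distance, distance + 1)
        for dy in range(-distance, distance + 1)
        if math.hypot(dx, dy) <= distance
    ]


# computed once, then reused for every query (hypothetical helper name)
OFFSETS = make_offsets(DISTANCE)


def find_neighbours(grid, point, offsets=OFFSETS):
    """Return {(x, y): count} for the occupied grid points near `point`."""
    found = {}
    for dx, dy in offsets:
        x, y = point[0] + dx, point[1] + dy
        # skip offsets that fall outside the grid
        if 0 <= x < N and 0 <= y < N and grid[x][y] > 0:
            found[(x, y)] = grid[x][y]
    return found


seed(0)  # fixed seed only so that the sketch is repeatable
grd = [[0] * N for _ in range(N)]
for _ in range(1000):
    grd[randint(0, N - 1)][randint(0, N - 1)] += 1

print(find_neighbours(grd, (45, 36)))
```

Each query then only does additions and bounds checks; math.hypot is called once per offset when the table is built, not once per grid point per query.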

It may also be a good idea to replace the grid with a numpy array.

UPDATE

If your points are floats, you can still create an int grid to reduce the search space:

from random import uniform
from collections import defaultdict
import math

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @property
    def x_int(self):
        return int(self.x)

    @property
    def y_int(self):
        return int(self.y)

    def __str__(self):
        fmt = '''{0.__class__.__name__}(x={0.x:5.2f}, y={0.y:5.2f})'''
        return fmt.format(self)

N = 100
MIN = 0
MAX = N-1

N_SAMPLES = 1000


# create the grid
grd = [[[] for _ in range(N)] for __ in range(N)]

# store each sample in the grid cell that contains it
for _ in range(N_SAMPLES):
    p = Point(x=uniform(MIN, MAX), y=uniform(MIN, MAX))
    grd[p.x_int][p.y_int].append(p)


def find_neighbours(grid, point, distance):

    # this will be: (x_int, y_int): list of points
    points = defaultdict(list)

    # int() rounds down, so cell (x, y) covers [x, x+1) x [y, y+1);
    # the +1 makes range() include the upper end
    for x in range(point[0]-distance, point[0]+distance+1):
        if x < 0 or x > N-1:
            continue
        for y in range(point[1]-distance, point[1]+distance+1):
            if y < 0 or y > N-1:
                continue
            dst = math.hypot(point[0]-x, point[1]-y)
            # a point can lie up to sqrt(2) (the cell diagonal) from its cell corner
            if dst > distance + math.sqrt(2):
                continue
            for pt in grid[x][y]:
                # measure from the query point, not from the cell corner
                if math.hypot(pt.x-point[0], pt.y-point[1]) <= distance:
                    points[(x, y)].append(pt)
    return points

res = find_neighbours(grid=grd, point=(45, 36), distance=5)

for int_point, points in res.items():
    print(int_point)
    for point in points:
        print('  ', point)

The output looks something like this:

(44, 36)
   Point(x=44.03, y=36.93)
(41, 36)
   Point(x=41.91, y=36.55)
   Point(x=41.73, y=36.53)
   Point(x=41.56, y=36.88)
...

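For convenience, Points are now a class. That may not be necessary, though...

Depending on how dense or sparse your points are, you could also represent the grid as a dictionary that maps each cell to a list of Points...

Also, in this version the find_neighbours function only accepts a starting point made up of ints. That could be refined as well.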
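The two refinements just mentioned (a dictionary grid and a float starting point) might look something like the sketch below. It is an assumption-laden illustration, not the answerer's code: build_grid and the cell_size parameter are names introduced here, and points are plain (x, y) tuples instead of the Point class.

```python
import math
from collections import defaultdict


def build_grid(points, cell_size=1.0):
    """Map each occupied cell (ix, iy) to the list of points inside it."""
    grid = defaultdict(list)
    for x, y in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        grid[cell].append((x, y))
    return grid


def find_neighbours(grid, point, distance, cell_size=1.0):
    """Return all stored points within `distance` of `point` (floats allowed)."""
    px, py = point
    # range of cells that can overlap the search circle
    x_lo = math.floor((px - distance) / cell_size)
    x_hi = math.floor((px + distance) / cell_size)
    y_lo = math.floor((py - distance) / cell_size)
    y_hi = math.floor((py + distance) / cell_size)
    found = []
    for ix in range(x_lo, x_hi + 1):
        for iy in range(y_lo, y_hi + 1):
            # .get() avoids creating empty entries in the defaultdict
            for qx, qy in grid.get((ix, iy), ()):
                if math.hypot(qx - px, qy - py) <= distance:
                    found.append((qx, qy))
    return found


sample = [(0.5, 0.5), (3.2, 4.1), (10.0, 10.0), (-1.5, 2.0)]
grid = build_grid(sample)
print(find_neighbours(grid, (0.0, 0.0), 5.0))
```

Only occupied cells take up memory, and negative coordinates work because math.floor is used instead of int(). Note that a stored point equal to the query point is returned too, and the comparison is <= as in the answer rather than < as in the question. Choosing cell_size close to the search radius keeps the scan to roughly a 3 x 3 block of cells.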

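And there is still a lot of room for improvement: the range of the y axis could be restricted using trigonometry. And for points well inside the circle there is no need for an individual check; detailed checking is only needed close to the outer rim of the circle.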
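One possible reading of these last two ideas is sketched below. It uses the Pythagorean relation rather than explicit trigonometric functions to get the height of the circle in each column of cells, and it accepts a whole cell without per-point checks when the cell's farthest corner is already inside the circle. The dictionary grid with unit cells and the name build_grid are assumptions made for this sketch, and floating-point rounding exactly at the rim is not treated specially.

```python
import math
from collections import defaultdict


def build_grid(points):
    """Map each unit cell (ix, iy) to the list of points inside it."""
    grid = defaultdict(list)
    for x, y in points:
        grid[(math.floor(x), math.floor(y))].append((x, y))
    return grid


def find_neighbours(grid, point, distance):
    """Return stored points within `distance` of `point`, skipping needless checks."""
    px, py = point
    found = []
    for ix in range(math.floor(px - distance), math.floor(px + distance) + 1):
        # smallest horizontal gap between the query point and this column of cells
        dx = max(ix - px, px - (ix + 1), 0.0)
        # half-height of the circle at that gap; clamp tiny negative rounding errors
        half = math.sqrt(max(0.0, distance * distance - dx * dx))
        # largest horizontal gap to this column, used for the "fully inside" test
        far_x = max(abs(ix - px), abs(ix + 1 - px))
        for iy in range(math.floor(py - half), math.floor(py + half) + 1):
            cell = grid.get((ix, iy))
            if not cell:
                continue
            far_y = max(abs(iy - py), abs(iy + 1 - py))
            if math.hypot(far_x, far_y) <= distance:
                # the farthest corner is inside the circle, so the whole cell is
                found.extend(cell)
            else:
                # rim cell: check each point individually
                found.extend(
                    (qx, qy) for qx, qy in cell
                    if math.hypot(qx - px, qy - py) <= distance
                )
    return found


print(find_neighbours(build_grid([(0.5, 0.5), (9.0, 9.0)]), (0.0, 0.0), 2))
```

With a radius of 5 and unit cells, most cells near the centre pass the corner test, so the individual distance checks are limited to the ring of cells that the circle actually crosses.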