Pandas: finding points within a maximum distance

Time: 2014-11-09 06:36:39

Tags: python pandas distance

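I am trying to find pairs of (x, y) points that lie within a maximum distance of each other. I thought the simplest approach would be to generate a DataFrame and go through the points one by one, checking whether any points with coordinates (x, y) lie within distance r of the given point (x_0, y_0). Then I would divide the total number of matches found by 2.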

%pylab inline
import pandas as pd

def find_nbrs(low, high, num, max_d):
    x = random.uniform(low, high, num)
    y = random.uniform(low, high, num)
    points = pd.DataFrame({'x':x, 'y':y})

    tot_nbrs = 0

    for i in arange(len(points)):
        x_0 = points.x[i]
        y_0 = points.y[i]

        pt_nbrz = points[((x_0 - points.x)**2 + (y_0 - points.y)**2) < max_d**2]
        tot_nbrs += len(pt_nbrz)
        plot (pt_nbrz.x, pt_nbrz.y, 'r-')

    plot (points.x, points.y, 'b.')
    return tot_nbrs

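print(find_nbrs(0, 1, 50, 0.1))

  1. First of all, it doesn't always find the right pairs (I see points within the stated distance that are not marked).

  2. If I write plot(..., 'or'), it highlights all the points. That means pt_nbrz = points[((x_0 - points.x)**2 + (y_0 - points.y)**2) < max_d**2] returns at least one (x, y). Why? Shouldn't it return an empty array if the comparison is False?

  3. How can I do all of the above more elegantly in Pandas? For example, without having to loop over every element.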

1 answer:

Answer 0 (score: 7)

The functionality you are looking for is included in scipy's spatial distance module (scipy.spatial.distance).

Here is an example of how you could use it. The real magic is in squareform(pdist(points)).

from scipy.spatial.distance import pdist, squareform
import numpy as np
import matplotlib.pyplot as plt

points = np.random.uniform(-.5, .5, (1000,2))

# Compute the distance between each distinct pair of rows in points with pdist.
# Then, just for ease of working, convert to a typical symmetric distance matrix
# with squareform.
dists = squareform(pdist(points))

poi = points[4] # point of interest
dist_min = .1
close_points = dists[4] < dist_min

print("There are {} other points within a distance of {} from the point "
    "({:.3f}, {:.3f})".format(close_points.sum() - 1, dist_min, *poi))

There are 27 other points within a distance of 0.1 from the point (0.194, 0.160)
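
The count above is for a single point of interest. For the original goal of counting all pairs, a minimal sketch (not part of the original answer) can work directly on the condensed vector that pdist returns, which holds one distance per unordered pair. The names pts, max_d, and n_pairs and the fixed seed are illustrative assumptions. The cross-check against the full matrix also shows why the question's mask always matched at least one row: every point is at distance 0 from itself.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)      # fixed seed, assumed here only for reproducibility
pts = rng.uniform(0, 1, (50, 2))    # same setup as the question: 50 points in the unit square
max_d = 0.1

# pdist yields one distance per unordered pair (i < j): no self-distances,
# and no pair appears twice, so a plain sum is already the pair count.
condensed = pdist(pts)
n_pairs = int((condensed < max_d).sum())

# Cross-check with the full symmetric matrix: the diagonal (distance 0)
# contributes len(pts) matches and every pair is counted twice.
full = squareform(condensed)
n_pairs_from_full = (int((full < max_d).sum()) - len(pts)) // 2

print(n_pairs, n_pairs_from_full)
```

In the looped version from the question, the same correction applies: subtract the number of points from the running total before dividing by 2.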

For visualization purposes, the point of interest and its neighbours can be plotted:

f,ax = plt.subplots(subplot_kw=
    dict(aspect='equal', xlim=(-.5, .5), ylim=(-.5, .5)))
ax.plot(points[:,0], points[:,1], 'b+ ')
ax.plot(poi[0], poi[1], ms=15, marker='s', mfc='none', mec='g')
ax.plot(points[close_points,0], points[close_points,1],
    marker='o', mfc='none', mec='r', ls='')  # draw all points within distance

t = np.linspace(0, 2*np.pi, 512)
circle = dist_min*np.vstack([np.cos(t), np.sin(t)]).T
ax.plot((circle+poi)[:,0], (circle+poi)[:,1], 'k:') # Add a visual check for that distance
plt.show()
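
For larger point sets, the full N×N matrix built by squareform can become expensive in memory. One possible alternative, which the original answer does not mention, is a k-d tree: scipy.spatial.cKDTree provides query_pairs, which returns the index pairs whose distance is at most r. The sketch below assumes the same kind of random data as above; pts, max_d, and the seed are illustrative choices.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)            # fixed seed, assumed only for reproducibility
pts = rng.uniform(-0.5, 0.5, (1000, 2))   # 1000 random points, as in the example above
max_d = 0.1

# Build the tree once, then ask for all pairs within max_d of each other.
tree = cKDTree(pts)
pairs = tree.query_pairs(r=max_d)         # set of (i, j) tuples with i < j

# Sanity check against the condensed pdist vector (query_pairs uses <= r).
n_pairs_pdist = int((pdist(pts) <= max_d).sum())

print(len(pairs), n_pairs_pdist)
```

Each pair appears once, so len(pairs) is already the number of pairs and no division by 2 is needed.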
