Finding the closest point

Asked: 2017-07-17 00:36:18

Tags: python-3.x pandas

I have a dataframe, all_points, holding points and their coordinates:

all_points =
   point_id   latitude  longitude  
0          1  41.894577 -87.645307  
1          2  41.894647 -87.640426 
2          3  41.894713 -87.635513 
3          4  41.894768 -87.630629  
4          5  41.894830 -87.625793 

and a dataframe of parent points, parent_pts:

parent_pts = 
       parent_id
0       1             
1       2     

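I want to add a column to the all_points dataframe that holds, for each point, its closest parent point.

This is my attempt, though I may be making it more complicated than it needs to be: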

from scipy.spatial.distance import cdist

def closest_point(point, points):
    """ Find closest point from a list of points. """
    return points[cdist([point], points).argmin()]

def match_value(df, col1, x, col2):
    """ Match value x from col1 row to value in col2. """
    return df[df[col1] == x][col2].values[0]

all_points['point'] = [(x, y) for x,y in zip(all_points['latitude'], all_points['longitude'])]
parent_pts['point'] = all_points['point'][all_points['point_id   '].isin(parent_pts['parent_id'])]

all_points['parent'] = [match_value(parent_pts, 'point', x, 'parent_id') for x in all_points['closest']]

parent_pts is a subset of all_points.

When I try to use the closest_point function, I get this error:

ValueError: XB must be a 2-dimensional array.

2 answers:

Answer 0: (score: 1)

If you want to use Euclidean distance and use the index as your point ID, you can do this:

import numpy as np

def findClose(inX, inY, cIndex, X, Y):
    # Euclidean distance from (inX, inY) to every point in (X, Y)
    dist = np.sqrt((X - inX)**2 + (Y - inY)**2)
    dist[cIndex] = np.max(dist)*100 # ensure not the current index
    return np.argmin(dist)

# .as_matrix() was removed in pandas 1.0; .to_numpy() replaces it
X,Y = all_points['latitude'].to_numpy(),all_points['longitude'].to_numpy()
all_points['point_id'] = all_points.index
# int(...) because apply(axis=1) upcasts the row, including point_id, to float
all_points['Parents'] = all_points.apply(lambda row: 
                    findClose(row['latitude'],row['longitude'],
                    int(row['point_id']),X,Y),axis=1)

which produces

print(all_points)

   point_id   latitude  longitude  Parents
0         0  41.894577 -87.645307        1
1         1  41.894647 -87.640426        0
2         2  41.894713 -87.635513        3
3         3  41.894768 -87.630629        4
4         4  41.894830 -87.625793        3
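Note that the code above matches each point to its nearest neighbour among all other points, not specifically to a parent point. If the candidates should be restricted to the rows listed in parent_pts, a minimal sketch could look like the following. It assumes the column names from the question and that every parent_id appears in all_points. cdist needs two 2-D numeric arrays; a pandas column of tuples converts to a 1-D object array, which may be what the ValueError in the question is complaining about.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

# Sample data copied from the question.
all_points = pd.DataFrame({
    'point_id': [1, 2, 3, 4, 5],
    'latitude': [41.894577, 41.894647, 41.894713, 41.894768, 41.894830],
    'longitude': [-87.645307, -87.640426, -87.635513, -87.630629, -87.625793],
})
parent_pts = pd.DataFrame({'parent_id': [1, 2]})

# Rows of all_points that are parents (assumes parent_pts is a subset).
parents = all_points[all_points['point_id'].isin(parent_pts['parent_id'])]

# cdist expects 2-D arrays of shape (n, 2) and (m, 2).
xy_all = all_points[['latitude', 'longitude']].to_numpy()
xy_parents = parents[['latitude', 'longitude']].to_numpy()

# Position of the nearest parent for every row, mapped back to its id.
nearest = cdist(xy_all, xy_parents).argmin(axis=1)
all_points['parent'] = parents['point_id'].to_numpy()[nearest]
```

This still uses plain Euclidean distance on the latitude/longitude values, so it is only a reasonable approximation for points that are close together.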

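Answer 1: (score: 1)

First, let me say that your longitudes and latitudes look to me like locations on Earth. Assuming the Earth is a sphere, the distance between two points should be computed as the great-circle distance, not the Euclidean distance that cdist returns.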
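As a rough illustration of the great-circle idea without any extra package, here is a minimal sketch using the haversine formula with NumPy. The helper name haversine_km and the mean Earth radius of 6371 km are assumptions made for this example, not part of the question; the coordinates are the sample data from the question.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between points given in degrees.

    radius_km is an assumed mean Earth radius (spherical model).
    """
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(a))

# Sample data copied from the question.
point_ids = np.array([1, 2, 3, 4, 5])
lat = np.array([41.894577, 41.894647, 41.894713, 41.894768, 41.894830])
lon = np.array([-87.645307, -87.640426, -87.635513, -87.630629, -87.625793])
parent_ids = np.array([1, 2])

# Boolean mask selecting the points that are parents.
is_parent = np.isin(point_ids, parent_ids)

# Broadcast to an (n_points, n_parents) matrix of distances in km.
dist = haversine_km(lat[:, None], lon[:, None],
                    lat[is_parent][None, :], lon[is_parent][None, :])

# Id of the nearest parent for every point.
nearest_parent = point_ids[is_parent][dist.argmin(axis=1)]
```

For points this close together the result agrees with the Euclidean approach; the difference matters more over long distances or at high latitudes.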

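From a programming point of view, the simplest approach (apart from the learning curve for you) is to use the astropy package. Its documentation sometimes comes with useful examples; see match_coordinates_sky() and "catalog matching with astropy".

Then you might do something like this: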

>>> from astropy.units import Quantity
>>> from astropy.coordinates import match_coordinates_sky, SkyCoord, EarthLocation
>>> from pandas import DataFrame
>>> import numpy as np
>>>
>>> # Create your data as I understood it:
>>> all_points = DataFrame({'point_id': np.arange(1,6), 'latitude': [41.894577, 41.894647, 41.894713, 41.894768, 41.894830], 'longitude': [-87.645307, -87.640426, -87.635513, -87.630629, -87.625793 ]})
>>> parent_pts = DataFrame({'parent_id': [1, 2]})
>>>
>>> # Create a frame with the coordinates of the "parent" points:
>>> parent_coord = all_points.loc[all_points['point_id'].isin(parent_pts['parent_id'])]
>>> print(parent_coord)
    latitude  longitude  point_id
0  41.894577 -87.645307         1
1  41.894647 -87.640426         2
>>>
>>> # Create coordinate array for "points" (in principle the below statements
>>> # could be combined into a single one):
>>> all_lon = Quantity(all_points['longitude'], unit='deg')
>>> all_lat = Quantity(all_points['latitude'], unit='deg')
>>> all_pts = SkyCoord(EarthLocation.from_geodetic(all_lon, all_lat).itrs, frame='itrs')
>>>
>>> # Create coordinate array for "parent points":
>>> parent_lon = Quantity(parent_coord['longitude'], unit='deg')
>>> parent_lat = Quantity(parent_coord['latitude'], unit='deg')
>>> parent_catalog = SkyCoord(EarthLocation.from_geodetic(parent_lon, parent_lat).itrs, frame='itrs')
>>> 
>>> # Get the indices (in parent_catalog) of parent coordinates
>>> # closest to each point:
>>> matched_indices = match_coordinates_sky(all_pts, parent_catalog)[0]
Downloading http://maia.usno.navy.mil/ser7/finals2000A.all
|========================================================================| 3.1M/3.1M (100.00%)         0s
>>> all_points['parent_id'] = [parent_pts['parent_id'][idx] for idx in matched_indices]
>>> print(all_points)
    latitude  longitude  point_id  parent_id
0  41.894577 -87.645307         1          1
1  41.894647 -87.640426         2          2
2  41.894713 -87.635513         3          2
3  41.894768 -87.630629         4          2
4  41.894830 -87.625793         5          2

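I would add that match_coordinates_sky() returns not only the matching indices but also the angular separations between each data point and its matched "parent" point, as well as the distances between each data point and its matched "parent" point. These may be useful for your problem.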