I have a dataframe all_points with point coordinates:
all_points =
point_id latitude longitude
0 1 41.894577 -87.645307
1 2 41.894647 -87.640426
2 3 41.894713 -87.635513
3 4 41.894768 -87.630629
4 5 41.894830 -87.625793
and a dataframe parent_pts:
parent_pts =
parent_id
0 1
1 2
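I want to add a column to the all_points dataframe that holds, for each point, its nearest parent point.
This is my attempt so far, though I may be making it more complicated than it needs to be: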
from scipy.spatial.distance import cdist
def closest_point(point, points):
    """ Find closest point from a list of points. """
    return points[cdist([point], points).argmin()]
def match_value(df, col1, x, col2):
    """ Match value x from col1 row to value in col2. """
    return df[df[col1] == x][col2].values[0]
all_points['point'] = [(x, y) for x,y in zip(all_points['latitude'], all_points['longitude'])]
parent_pts['point'] = all_points['point'][all_points['point_id '].isin(parent_pts['parent_id'])]
all_points['parent'] = [match_value(parent_pts, 'point', x, 'parent_id') for x in all_points['closest']]
The parent points are a subset of all_points.
When I try to use the closest_point function I get this error:
ValueError: XB must be a 2-dimensional array.
Answer 0: (score: 1)
If you are happy with Euclidean distance and with using the index as your point ID, you can do this:
import numpy as np

def findClose(inX, inY, cIndex, X, Y):
    # Euclidean distance from (inX, inY) to every point in (X, Y)
    dist = np.sqrt((X - inX)**2 + (Y - inY)**2)
    dist[cIndex] = np.max(dist)*100  # ensure not the current index
    return np.argmin(dist)

# .values replaces as_matrix(), which was removed in pandas 1.0
X, Y = all_points['latitude'].values, all_points['longitude'].values
all_points['point_id'] = all_points.index
# apply() upcasts the mixed-type row to float, so cast the ID back to int
all_points['Parents'] = all_points.apply(
    lambda row: findClose(row['latitude'], row['longitude'],
                          int(row['point_id']), X, Y), axis=1)
which produces
print(all_points)
point_id latitude longitude Parents
0 0 41.894577 -87.645307 1
1 1 41.894647 -87.640426 0
2 2 41.894713 -87.635513 3
3 3 41.894768 -87.630629 4
4 4 41.894830 -87.625793 3
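Note that the code above picks the nearest other point, not the nearest parent. If the search should be limited to the parent points, a rough sketch with cdist could look like the following. It assumes the sample frames from the question, and the names parents, xa, xb and nearest are only illustrative. cdist expects both arguments to be 2-D arrays of shape (n, 2), so the coordinates are passed as two-column arrays rather than as a column of tuples.

```python
import pandas as pd
from scipy.spatial.distance import cdist

# Sample data copied from the question
all_points = pd.DataFrame({
    'point_id': [1, 2, 3, 4, 5],
    'latitude': [41.894577, 41.894647, 41.894713, 41.894768, 41.894830],
    'longitude': [-87.645307, -87.640426, -87.635513, -87.630629, -87.625793],
})
parent_pts = pd.DataFrame({'parent_id': [1, 2]})

# Rows of all_points that are listed as parents
parents = all_points[all_points['point_id'].isin(parent_pts['parent_id'])]

# Two 2-D arrays: shape (n_points, 2) and (n_parents, 2)
xa = all_points[['latitude', 'longitude']].to_numpy()
xb = parents[['latitude', 'longitude']].to_numpy()

# For each point, the position (within parents) of the closest parent
nearest = cdist(xa, xb).argmin(axis=1)
all_points['parent'] = parents['point_id'].to_numpy()[nearest]
print(all_points)
```

This is still plain Euclidean distance on raw degrees, so it is only a reasonable approximation over small areas.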
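Answer 1: (score: 1)
Let me start by saying that your longitudes and latitudes look to me like locations on Earth. Assuming the Earth is a sphere, the distance between two points should be computed as the great-circle distance, not as the Euclidean distance that cdist returns.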
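As an illustration of what a great-circle distance looks like in code, here is a minimal sketch of the haversine formula using only NumPy. The function name haversine_km and the mean Earth radius of 6371 km are assumptions made for this example; they do not come from the question.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean radius, spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2)**2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Distance between the first two points of the question (about 0.4 km)
d = haversine_km(41.894577, -87.645307, 41.894647, -87.640426)
print(d)
```

Because the function is built from NumPy ufuncs, it also accepts arrays, so one point can be compared against a whole column of parent coordinates at once.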
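From a programming point of view, the easiest approach (apart from the learning curve for you) is to use the astropy package. Its documentation is sometimes quite good and has useful examples; see, e.g., match_coordinates_sky() or catalog matching with astropy.
Then you might do something like this: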
>>> from astropy.units import Quantity
>>> from astropy.coordinates import match_coordinates_sky, SkyCoord, EarthLocation
>>> from pandas import DataFrame
>>> import numpy as np
>>>
>>> # Create your data as I understood it:
>>> all_points = DataFrame({'point_id': np.arange(1,6), 'latitude': [41.894577, 41.894647, 41.894713, 41.894768, 41.894830], 'longitude': [-87.645307, -87.640426, -87.635513, -87.630629, -87.625793 ]})
>>> parent_pts = DataFrame({'parent_id': [1, 2]})
>>>
>>> # Create a frame with the coordinates of the "parent" points:
>>> parent_coord = all_points.loc[all_points['point_id'].isin(parent_pts['parent_id'])]
>>> print(parent_coord)
latitude longitude point_id
0 41.894577 -87.645307 1
1 41.894647 -87.640426 2
>>>
>>> # Create coordinate array for "points" (in principle the below statements
>>> # could be combined into a single one):
>>> all_lon = Quantity(all_points['longitude'], unit='deg')
>>> all_lat = Quantity(all_points['latitude'], unit='deg')
>>> all_pts = SkyCoord(EarthLocation.from_geodetic(all_lon, all_lat).itrs, frame='itrs')
>>>
>>> # Create coordinate array for "parent points":
>>> parent_lon = Quantity(parent_coord['longitude'], unit='deg')
>>> parent_lat = Quantity(parent_coord['latitude'], unit='deg')
>>> parent_catalog = SkyCoord(EarthLocation.from_geodetic(parent_lon, parent_lat).itrs, frame='itrs')
>>>
>>> # Get the indices (in parent_catalog) of parent coordinates
>>> # closest to each point:
>>> matched_indices = match_coordinates_sky(all_pts, parent_catalog)[0]
Downloading http://maia.usno.navy.mil/ser7/finals2000A.all
|========================================================================| 3.1M/3.1M (100.00%) 0s
>>> all_points['parent_id'] = [parent_pts['parent_id'][idx] for idx in matched_indices]
>>> print(all_points)
latitude longitude point_id parent_id
0 41.894577 -87.645307 1 1
1 41.894647 -87.640426 2 2
2 41.894713 -87.635513 3 2
3 41.894768 -87.630629 4 2
4 41.894830 -87.625793 5 2
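I would like to add that match_coordinates_sky() returns not only the matching indices but also the angular separations between each data point and its matched "parent" point, as well as the distances between the data points and the matched "parent" points. This may be useful for your problem.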