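I have two DataFrames in Pandas, each with longitude and latitude columns. I'm trying to iterate over every row of the first one and find the closest matching longitude and latitude in the second DataFrame.
So far I have this function in Python, which I found in another SO post...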
from math import cos, asin, sqrt

def distance(lat1, lon1, lat2, lon2):
    # Haversine great-circle distance in kilometres
    p = 0.017453292519943295  # pi / 180, converts degrees to radians
    a = 0.5 - cos((lat2-lat1)*p)/2 + cos(lat1*p)*cos(lat2*p) * (1-cos((lon2-lon1)*p)) / 2
    return 12742 * asin(sqrt(a))  # 12742 km = 2 * Earth radius (6371 km)

def closest(data, v):
    # Return the entry of data that is nearest to v
    return min(data, key=lambda p: distance(v['lat'],v['lon'],p['lat'],p['lon']))

tempDataList = [{'lat': 39.7612992, 'lon': -86.1519681},
                {'lat': 39.762241, 'lon': -86.158436 },
                {'lat': 39.7622292, 'lon': -86.1578917}]

v = {'lat': 39.7622290, 'lon': -86.1519750}
print(closest(tempDataList, v))
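I'm going to try to adapt this to my pandas DataFrames, but is there a more efficient way to do it, for example with PyProj?
Does anyone have an example or similar code?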
Answer 0 (score: 0)
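I think this is easier to do with a GIS library. If you use GeoPandas and Shapely, it becomes much more convenient. (GeoPandas also uses pyproj under the hood.) Start from the code below.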
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
from shapely.ops import nearest_points
tempDataList = [{'lat': 39.7612992, 'lon': -86.1519681},
{'lat': 39.762241, 'lon': -86.158436 },
{'lat': 39.7622292, 'lon': -86.1578917}]
df = pd.DataFrame(tempDataList)
#make point geometry for geopandas
geometry = [Point(xy) for xy in zip(df['lon'], df['lat'])]
#use a coordinate system that matches your coordinates. EPSG 4326 is WGS84
gdf = gpd.GeoDataFrame(df, crs = "EPSG:4326", geometry = geometry)
#build the query point as a Shapely Point (x = lon, y = lat)
v = {'lat': 39.7622290, 'lon': -86.1519750}
tp = Point(v['lon'], v['lat'])
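#now you can calculate the distance between v and the others.
#note: with EPSG:4326 the result is in degrees, not metres; reproject with to_crs() to a projected CRS if you need metres
print(gdf.distance(tp))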
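#If you want the nearest point itself, merge all points into one MultiPoint first
#(GeoPandas 1.0+ provides union_all() as the replacement for unary_union)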
multipoints = gdf['geometry'].unary_union
queried_point, nearest_point = nearest_points(tp, multipoints)
print(nearest_point)
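
If both sides are DataFrames and you want the nearest match for every row of the first one, a vectorized haversine with plain NumPy may already be enough, without any GIS library. The following is only a minimal sketch under some assumptions: both frames have columns named 'lat' and 'lon' in degrees, and the helper names haversine_matrix and match_nearest are made up for this example. It builds the full n x m distance matrix, so it fits small to medium inputs; for large frames a spatial index (for instance scikit-learn's BallTree with the haversine metric) is probably a better fit.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; 12742 km in the question is the diameter


def haversine_matrix(lat1, lon1, lat2, lon2):
    """Return an (n, m) array of great-circle distances in km (inputs in degrees)."""
    lat1 = np.radians(np.asarray(lat1, dtype=float))[:, None]
    lon1 = np.radians(np.asarray(lon1, dtype=float))[:, None]
    lat2 = np.radians(np.asarray(lat2, dtype=float))[None, :]
    lon2 = np.radians(np.asarray(lon2, dtype=float))[None, :]
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    # clip guards against tiny floating-point overshoot before sqrt/arcsin
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def match_nearest(df_a, df_b):
    """For each row of df_a, attach the coordinates of the nearest row of df_b."""
    dist = haversine_matrix(df_a['lat'], df_a['lon'], df_b['lat'], df_b['lon'])
    idx = dist.argmin(axis=1)  # position in df_b of the nearest point per row
    nearest = df_b.iloc[idx].reset_index(drop=True)
    out = df_a.reset_index(drop=True).copy()
    out['nearest_lat'] = nearest['lat']
    out['nearest_lon'] = nearest['lon']
    out['distance_km'] = dist[np.arange(len(df_a)), idx]
    return out


# Same sample coordinates as in the question
df_a = pd.DataFrame([{'lat': 39.7622290, 'lon': -86.1519750}])
df_b = pd.DataFrame([{'lat': 39.7612992, 'lon': -86.1519681},
                     {'lat': 39.762241, 'lon': -86.158436},
                     {'lat': 39.7622292, 'lon': -86.1578917}])
result = match_nearest(df_a, df_b)
print(result)
```

The result keeps one row per row of df_a, so it can be joined back onto the first DataFrame directly. Distances are in kilometres on a spherical Earth, which is the same approximation the function in the question makes.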