Pandas - Compare 2 dataframes and find the closest value

Time: 2020-05-29 09:50:50

Tags: python pandas gis pyproj

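I have 2 dataframes in Pandas, each containing longitude and latitude columns. I am trying to loop through each row of the first dataframe and find the closest matching longitude and latitude in the second dataframe.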

So far I have this function in Python, which I found in another SO post...

from math import cos, asin, sqrt

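def distance(lat1, lon1, lat2, lon2):
    # Haversine great-circle distance in kilometres
    p = 0.017453292519943295  # pi / 180, converts degrees to radians
    a = 0.5 - cos((lat2-lat1)*p)/2 + cos(lat1*p)*cos(lat2*p) * (1-cos((lon2-lon1)*p)) / 2
    return 12742 * asin(sqrt(a))  # 12742 km = 2 * Earth's mean radius (6371 km)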

def closest(data, v):
    return min(data, key=lambda p: distance(v['lat'],v['lon'],p['lat'],p['lon']))

tempDataList = [{'lat': 39.7612992, 'lon': -86.1519681}, 
                {'lat': 39.762241,  'lon': -86.158436 }, 
                {'lat': 39.7622292, 'lon': -86.1578917}]

v = {'lat': 39.7622290, 'lon': -86.1519750}
print(closest(tempDataList, v))

I am going to try to modify this to work with my pandas dataframes, but is there a more efficient way to do this, for example using PyProj?

Does anyone have an example or similar code?
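For reference, the row-by-row adaptation described above could look roughly like the sketch below. It reuses the haversine function from the question; the dataframe names `df1` and `df2`, the helper `closest_index`, and the `nearest_idx` column are hypothetical names chosen for illustration, and both frames are assumed to have `lat` and `lon` columns.

```python
from math import cos, asin, sqrt

import pandas as pd


def distance(lat1, lon1, lat2, lon2):
    # Haversine distance in kilometres, same formula as in the question.
    p = 0.017453292519943295  # pi / 180
    a = (0.5 - cos((lat2 - lat1) * p) / 2
         + cos(lat1 * p) * cos(lat2 * p) * (1 - cos((lon2 - lon1) * p)) / 2)
    return 12742 * asin(sqrt(a))


def closest_index(row, candidates):
    # Distance from `row` to every candidate row, then the label of the smallest.
    dists = candidates.apply(
        lambda c: distance(row['lat'], row['lon'], c['lat'], c['lon']), axis=1
    )
    return dists.idxmin()


# Hypothetical inputs: df1 holds the query points, df2 the candidates.
df1 = pd.DataFrame([{'lat': 39.7622290, 'lon': -86.1519750}])
df2 = pd.DataFrame([{'lat': 39.7612992, 'lon': -86.1519681},
                    {'lat': 39.762241,  'lon': -86.158436},
                    {'lat': 39.7622292, 'lon': -86.1578917}])

# For each row of df1, store the index label of the nearest row in df2.
df1['nearest_idx'] = df1.apply(closest_index, axis=1, candidates=df2)
print(df1)
```

This runs a Python-level loop over every pair of rows, so it is only practical for small dataframes.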

1 Answer:

Answer 0: (score: 0)

I think you can do this more easily with a GIS library. If you use Geopandas and Shapely, it will be more convenient. (pyproj is used as well.) Start from the code below.

import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
from shapely.ops import nearest_points

tempDataList = [{'lat': 39.7612992, 'lon': -86.1519681}, 
                {'lat': 39.762241,  'lon': -86.158436 }, 
                {'lat': 39.7622292, 'lon': -86.1578917}]

df = pd.DataFrame(tempDataList)

#make point geometry for geopandas
geometry = [Point(xy) for xy in zip(df['lon'], df['lat'])]

#use a coordinate system that matches your coordinates. EPSG 4326 is WGS84
gdf = gpd.GeoDataFrame(df, crs = "EPSG:4326", geometry = geometry) 

#convert the query point v into a Shapely Point
v = {'lat': 39.7622290, 'lon': -86.1519750}
tp = Point(v['lon'], v['lat'])

#now you can calculate the distance between v and the other points.
#note: with EPSG:4326 the result is in degrees, not metres
print(gdf.distance(tp))

#If you want to get the nearest point itself
#(newer GeoPandas releases deprecate unary_union in favor of union_all())
multipoints = gdf['geometry'].unary_union
queried_point, nearest_point = nearest_points(tp, multipoints)
print(nearest_point)
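
The code above handles a single query point. To match every row of one dataframe against another, one option is to skip the GIS libraries and vectorize the haversine formula from the question with NumPy broadcasting. This is a swapped-in technique rather than the Geopandas approach above, and only a minimal sketch: `haversine_matrix`, `df1`, `df2`, and the output column names are hypothetical, and both frames are assumed to have `lat` and `lon` columns in degrees.

```python
import numpy as np
import pandas as pd


def haversine_matrix(lat1, lon1, lat2, lon2):
    """Pairwise haversine distances in km, shape (len(lat1), len(lat2))."""
    # Column vectors for the query points, row vectors for the candidates,
    # so that NumPy broadcasting produces every pairwise combination.
    lat1 = np.radians(np.asarray(lat1, dtype=float))[:, None]
    lon1 = np.radians(np.asarray(lon1, dtype=float))[:, None]
    lat2 = np.radians(np.asarray(lat2, dtype=float))[None, :]
    lon2 = np.radians(np.asarray(lon2, dtype=float))[None, :]
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 12742 * np.arcsin(np.sqrt(a))  # 12742 km = 2 * 6371 km


# Hypothetical inputs: df1 holds the query points, df2 the candidates.
df1 = pd.DataFrame({'lat': [39.7622290, 39.7622],
                    'lon': [-86.1519750, -86.1580]})
df2 = pd.DataFrame([{'lat': 39.7612992, 'lon': -86.1519681},
                    {'lat': 39.762241,  'lon': -86.158436},
                    {'lat': 39.7622292, 'lon': -86.1578917}])

dist = haversine_matrix(df1['lat'], df1['lon'], df2['lat'], df2['lon'])
nearest_pos = dist.argmin(axis=1)  # position in df2 of the closest row

result = df1.assign(
    nearest_lat=df2['lat'].to_numpy()[nearest_pos],
    nearest_lon=df2['lon'].to_numpy()[nearest_pos],
    dist_km=dist[np.arange(len(df1)), nearest_pos],
)
print(result)
```

The full distance matrix needs memory proportional to `len(df1) * len(df2)`, so for very large inputs it may be necessary to process `df1` in chunks or to use a spatial index instead.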