I have the following two data frames (shortened):
df1
day Transmitter_ID Species Lat Lng Date
4 A69-1601-27466 Golden perch -35.495479100000004 144.45295380000002 13/08/2015
5 A69-1601-27466 Golden perch -35.495479100000004 144.45295380000002 14/08/2015
6 A69-1601-27466 Golden perch -35.495479100000004 144.45295380000002 15/08/2015
7 A69-1601-27466 Golden perch -35.495479100000004 144.45295380000002 16/08/2015
8 A69-1601-27466 Golden perch -35.5065473 144.4488804 17/08/2015
8 A69-1601-27466 Golden perch -35.495479100000004 144.45295380000002 17/08/2015
9 A69-1601-27466 Golden perch -35.5065473 144.4488804 18/08/2015
10 A69-1601-27466 Golden perch -35.5065473 144.4488804 19/08/2015
11 A69-1601-27466 Golden perch -35.5065473 144.4488804 20/08/2015
12 A69-1601-27466 Golden perch -35.5065473 144.4488804 21/08/2015
13 A69-1601-27466 Golden perch -35.5065473 144.4488804 22/08/2015
14 A69-1601-27466 Golden perch -35.5065473 144.4488804 23/08/2015
15 A69-1601-27466 Golden perch -35.5065473 144.4488804 24/08/2015
rivergps_df
Lng Lat River
151.7753278 -32.90526725 HUNTER RIVER
151.77526830000002 -32.90610052 HUNTER RIVER
151.77526830000002 -32.90752299 HUNTER RIVER
151.77526830000002 -32.90758849 HUNTER RIVER
151.775397 -32.90977754 HUNTER RIVER
151.7754468 -32.91062396 HUNTER RIVER
151.775578 -32.91202941 HUNTER RIVER
151.77578799999998 -32.9142797 HUNTER RIVER
151.7758178 -32.91459931 HUNTER RIVER
151.77586340000002 -32.91508789 HUNTER RIVER
151.7764116 -32.91645856 HUNTER RIVER
151.7765776 -32.91687345 HUNTER RIVER
151.77719040000002 -32.91861786 HUNTER RIVER
I also have a haversine function that takes two lng, lat pairs and returns the distance between them:
from math import radians, sin, cos, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    r = 6371  # Radius of earth in kilometers. Use 3956 for miles
    return c * r
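I want to do the following with the two data frames:
Take each lng / lat from df1 and, for each point, apply the haversine function against every lng / lat in rivergps_df
Return the index of rivergps_df at which the haversine function gives its minimum value
Append this rivergps_df index to df1
So what I mean is: for the first point in df1, -35.495479100000004, 144.45295380000002, I want to apply the haversine function with that point as lon1, lat1 and with lon2, lat2 being every point in rivergps_df. I then want to find the minimum value returned by the haversine function, append it to df1, and move on to the next point in df1.
How can I do this?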
Answer 0 (score: 0)
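One idea:
Define a function haversin_argmin(lat, lon, df) that iterates over df (e.g. for idx, row in df[['Lat', 'Lng']].iterrows():), computes haversine(lon, lat, row['Lng'], row['Lat']) for each row, and returns the argmin, i.e. the index of the row with the smallest distance. Note that haversine takes longitude before latitude.
Then define another function f that accepts a row, gets lat and lon from it, calls haversin_argmin with rivergps_df, and returns row with the argmin appended as a new field.
Use pandas.DataFrame.apply to apply f to df1, passing axis=1 so that f receives one row at a time.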
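The steps above could be sketched roughly as follows. This is only an illustration under a few assumptions: the two small data frames are stand-ins built from a handful of the coordinates in the question, the column name river_idx is made up, and f is a small variation on the description: it returns just the index, and the assignment attaches it to df1 as a new column.

```python
from math import radians, sin, cos, asin, sqrt

import pandas as pd


def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * asin(sqrt(a)) * 6371


def haversin_argmin(lat, lon, df):
    """Return the index label of the row in df that is closest to (lat, lon)."""
    best_idx, best_dist = None, float("inf")
    for idx, row in df[["Lat", "Lng"]].iterrows():
        # haversine expects longitude first, then latitude
        dist = haversine(lon, lat, row["Lng"], row["Lat"])
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


# Small stand-ins for the data frames in the question (assumed shape)
df1 = pd.DataFrame({
    "day": [4, 8],
    "Lat": [-35.4954791, -35.5065473],
    "Lng": [144.4529538, 144.4488804],
})
rivergps_df = pd.DataFrame({
    "Lng": [151.7753278, 151.7752683, 151.775397],
    "Lat": [-32.90526725, -32.90610052, -32.90977754],
    "River": ["HUNTER RIVER"] * 3,
})


def f(row):
    """Look up the nearest river point for a single row of df1."""
    return haversin_argmin(row["Lat"], row["Lng"], rivergps_df)


# axis=1 makes apply call f once per row; "river_idx" is a hypothetical column name
df1["river_idx"] = df1.apply(f, axis=1)
print(df1)
```

Because this calls haversine once for every pair of points, the running time grows with len(df1) * len(rivergps_df), which may be slow for large data frames.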
Read the documentation for apply to better understand how to define f and which options to pass to apply.
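If the row-by-row approach turns out to be too slow, one possible alternative (not part of the suggestion above) is to compute all distances at once with NumPy broadcasting. The sketch below assumes both data frames have numeric Lat and Lng columns as in the question; the function name nearest_river_index and the column names river_idx and river_dist_km are hypothetical.

```python
import numpy as np
import pandas as pd


def nearest_river_index(df, river_df, r=6371.0):
    """For each row of df, return the index label of, and distance in km to,
    the nearest row of river_df."""
    # Shape (n, 1) for df and (m,) for river_df, so the arithmetic broadcasts to (n, m)
    lat1 = np.radians(df["Lat"].to_numpy())[:, None]
    lon1 = np.radians(df["Lng"].to_numpy())[:, None]
    lat2 = np.radians(river_df["Lat"].to_numpy())
    lon2 = np.radians(river_df["Lng"].to_numpy())

    # Same haversine formula as in the question, applied element-wise
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    # Clip guards against rounding pushing a marginally outside [0, 1]
    dist = 2 * r * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

    pos = dist.argmin(axis=1)  # position of the closest river point per row
    return river_df.index.to_numpy()[pos], dist[np.arange(len(df)), pos]


# Small stand-ins for the data frames in the question (assumed shape)
df1 = pd.DataFrame({
    "Lat": [-35.4954791, -35.5065473],
    "Lng": [144.4529538, 144.4488804],
})
rivergps_df = pd.DataFrame({
    "Lng": [151.7753278, 151.7752683, 151.775397],
    "Lat": [-32.90526725, -32.90610052, -32.90977754],
})

idx, dist_km = nearest_river_index(df1, rivergps_df)
df1["river_idx"] = idx
df1["river_dist_km"] = dist_km
print(df1)
```

This builds a len(df1) by len(rivergps_df) array of distances in memory, so for very large inputs it may need to be run over df1 in chunks.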