I have a df that looks like this, which I group by ID:
id lat lon
1 NaN NaN
1 40.121 23.749
1 -56.154 -39.572
1 21.908 17.537
1 31.221 -36.186
1 -56.655 0.016
2 NaN NaN
2 -36.438 14.874
2 -21.422 81.271
2 43.961 -95.551
3 NaN NaN
3 79.821 -56.781
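Using the Haversine function, I want to calculate the distance from the current row to the previous row. So the first entry of the new column would be calculated using:
lat 1 = 40.121
lon 1 = 23.749
lat 2 = -56.154
lon 2 = -39.572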
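Answer 0 (score: 0)
Adapted from this answer. The linked answer shows how to calculate the distance between each row and one fixed longitude/latitude value; my modification makes it work for your case, where each row is compared with its neighbouring row.
First, use shift to get all the values you need onto the same row: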
df['lon2'] = df['lon'].shift(-1)
df['lat2'] = df['lat'].shift(-1)
This gives:
id lat lon lat2 lon2
0 1 NaN NaN 40.121 23.749
1 1 40.121 23.749 -56.154 -39.572
2 1 -56.154 -39.572 21.908 17.537
3 1 21.908 17.537 31.221 -36.186
4 1 31.221 -36.186 -56.655 0.016
5 1 -56.655 0.016 NaN NaN
6 2 NaN NaN -36.438 14.874
7 2 -36.438 14.874 -21.422 81.271
8 2 -21.422 81.271 43.961 -95.551
9 2 43.961 -95.551 NaN NaN
10 3 NaN NaN 79.821 -56.781
11 3 79.821 -56.781 NaN NaN
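Note that the plain shift above does not look at id. It works for the sample data only because every group starts with a NaN row, so the last row of each group is paired with NaN. If your real data lacks those leading NaN rows, a group-aware shift may be safer. Below is a minimal sketch of that idea; the small frame is made up for illustration (same columns as the question, leading NaN rows removed), and plain_lat2 is a hypothetical name used only for comparison.

```python
import pandas as pd

# Illustrative data: same columns as the question, but without the
# leading NaN row in each group (an assumption for this sketch).
df = pd.DataFrame({
    "id":  [1, 1, 1, 2, 2],
    "lat": [40.121, -56.154, 21.908, -36.438, -21.422],
    "lon": [23.749, -39.572, 17.537, 14.874, 81.271],
})

# Plain shift ignores the grouping: the last row of id 1 picks up
# the first row of id 2.
plain_lat2 = df["lat"].shift(-1)

# Shifting within each group keeps every pair inside a single id;
# the last row of each group gets NaN instead.
df["lat2"] = df.groupby("id")["lat"].shift(-1)
df["lon2"] = df.groupby("id")["lon"].shift(-1)

print(df)
```

With the grouped version, the last row of each id has NaN in lat2 and lon2, so no distance is computed across two different ids.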
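Then define the distance calculation function:
from numpy import cos, sin, arcsin, sqrt
from math import radians

def haversine(row):
    # Current point and the shifted (neighbouring) point, in degrees
    lon1 = row['lon']
    lat1 = row['lat']
    lon2 = row['lon2']
    lat2 = row['lat2']
    # Convert degrees to radians before applying the formula
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # Haversine formula
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * arcsin(sqrt(a))
    # 6367 km is the Earth radius used here; NaN inputs give NaN
    km = 6367 * c
    return km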
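To sanity-check the formula on its own, here is a small sketch that uses only the standard library and takes plain numbers instead of a DataFrame row. The name haversine_km and the radius_km parameter are assumptions for illustration; the arithmetic is the same as in the function above, including the 6367 km radius.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6367):
    """Great-circle distance in km between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, [lat1, lon1, lat2, lon2])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return radius_km * 2 * asin(sqrt(a))

# The first pair of coordinates from the question
d = haversine_km(40.121, 23.749, -56.154, -39.572)
print(d)
```

This should come out at roughly 12237 km, in line with the value reported for that row in the output further down.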
And use apply to run it over your data, row by row:
df['distance'] = df.apply(haversine, axis=1)
This gives:
id lat lon lat2 lon2 distance
0 1 NaN NaN 40.121 23.749 NaN
1 1 40.121 23.749 -56.154 -39.572 12237.017692
2 1 -56.154 -39.572 21.908 17.537 10187.684397
3 1 21.908 17.537 31.221 -36.186 5387.540299
4 1 31.221 -36.186 -56.655 0.016 10343.267833
5 1 -56.655 0.016 NaN NaN NaN
6 2 NaN NaN -36.438 14.874 NaN
7 2 -36.438 14.874 -21.422 81.271 6543.302199
8 2 -21.422 81.271 43.961 -95.551 17480.809345
9 2 43.961 -95.551 NaN NaN NaN
10 3 NaN NaN 79.821 -56.781 NaN
11 3 79.821 -56.781 NaN NaN NaN
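I believe this shows the result you are looking for (I checked the first value and it seems correct).
If you like, once the calculation is done you can drop the two helper lat/lon columns: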
df.drop(['lat2', 'lon2'], axis=1, inplace=True)
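I should note that this solution will not give you the fastest computation, since apply calls the function once per row. If performance is a priority here, have a look at the second half of the answer I linked for ways to improve it, although it would need adapting to this case.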
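For reference, one common way to speed this up is to run the formula on whole columns with NumPy instead of calling a function per row. The following is only a sketch under that assumption, not the code from the linked answer; haversine_np and nxt are hypothetical names, and the small frame is a cut-down version of the question's data.

```python
import numpy as np
import pandas as pd

def haversine_np(lat1, lon1, lat2, lon2, radius_km=6367):
    """Vectorised haversine distance in km; inputs are in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * radius_km * np.arcsin(np.sqrt(a))

# Cut-down sample in the shape of the question's data
df = pd.DataFrame({
    "id":  [1, 1, 1, 2, 2],
    "lat": [np.nan, 40.121, -56.154, np.nan, -36.438],
    "lon": [np.nan, 23.749, -39.572, np.nan, 14.874],
})

# Next row's coordinates within each id, then one call on whole columns
nxt = df.groupby("id")[["lat", "lon"]].shift(-1)
df["distance"] = haversine_np(df["lat"], df["lon"], nxt["lat"], nxt["lon"])

print(df)
```

Rows with a NaN coordinate on either side simply get NaN as the distance, the same as in the apply-based version.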