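I want to calculate the distance covered since the variable event was last equal to 1. Importantly, the distance must be calculated separately for each ID.
My dataset consists of date, car_id, latitude, longitude, and a dummy variable indicating whether an event occurred. The function I use to calculate the distance is: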
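from pyproj import Geod
wgs84_geod = Geod(ellps="WGS84")  # assumed definition; not shown in the original post

def Distance(Latitude, Longitude, LatitudeDecimal, LongitudeDecimal):
    az12, az21, dist = wgs84_geod.inv(Longitude, Latitude, LongitudeDecimal, LatitudeDecimal)
    return dist  # geodesic distance in metres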
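What I want is, for each car_id, the distance between the position at the most recent event==1 and the current position, which gives the column distance_since_event (-1 where there is no earlier event):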
date car_id latitude longitude event distance_since_event
01/01/2019 1 43.5863 7.12993 0 -1
01/01/2019 2 44.3929 8.93832 0 -1
02/01/2019 1 43.5393 7.03134 1 -1
02/01/2019 2 39.459462 -0.312280 0 -1
03/01/2019 1 44.3173 84.942 0 calculation=(distance from 02/01/2019-03/01/2019 for ID=1)
03/01/2019 2 -12.3284 -9.04522 1 -1
04/01/2019 1 -36.8414 17.4762 0 calculation=(distance from 02/01/2019-04/01/2019 for ID=1)
04/01/2019 2 43.542 10.2958 0 calculation=(distance from 03/01/2019-04/01/2019 for ID=2)
05/01/2019 1 43.5242 69.473 0 calculation=(distance from 02/01/2019-05/01/2019 for ID=1)
05/01/2019 2 37.9382 23.668 1 calculation=(distance from 03/01/2019-05/01/2019 for ID=2)
06/01/2019 1 4.4409 89.218 1 calculation=(distance from 02/01/2019-06/01/2019 for ID=1)
06/02/2019 2 25.078037 -77.328900 0 calculation=(distance from 05/01/2019-06/01/2019 for ID=2)
Answer 0 (score: 0)
The key function that helps here is pandas.merge_asof, called with allow_exact_matches=False.
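To see what allow_exact_matches=False changes, here is a minimal sketch on hypothetical toy data (one car, three days, one event on the second day). The names points, events, event_date, strict, and loose are made up for illustration and are not part of the question.

```python
import pandas as pd

# Hypothetical toy data: one car, three consecutive days, an event on day 2.
points = pd.DataFrame({
    "date": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"]),
    "car_id": [1, 1, 1],
    "event": [0, 1, 0],
})

# Event rows only; keep a copy of the event date so it survives the merge.
events = points.loc[points["event"] == 1, ["date", "car_id"]].assign(
    event_date=lambda d: d["date"]
)

# Strictly earlier events only: the event row itself gets no match.
strict = pd.merge_asof(points, events, on="date", by="car_id",
                       allow_exact_matches=False)

# Default behaviour: an event row matches itself.
loose = pd.merge_asof(points, events, on="date", by="car_id")

print(strict[["date", "event", "event_date"]])
print(loose[["date", "event", "event_date"]])
```

With the strict variant, a row where event == 1 looks back to the previous event instead of matching itself, which is what the expected table in the question asks for. Both frames must be sorted by the on key.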
import pandas as pd

# Sample data from the question
data = pd.DataFrame([
    ["01/01/2019", 1, 43.5863, 7.12993, 0],
    ["01/01/2019", 2, 44.3929, 8.93832, 0],
    ["02/01/2019", 1, 43.5393, 7.03134, 1],
    ["02/01/2019", 2, 39.459462, -0.31228, 0],
    ["03/01/2019", 1, 44.3173, 84.942, 0],
    ["03/01/2019", 2, -12.3284, -9.04522, 1],
    ["04/01/2019", 1, -36.8414, 17.4762, 0],
    ["04/01/2019", 2, 43.542, 10.2958, 0],
    ["05/01/2019", 1, 43.5242, 69.473, 0],
    ["05/01/2019", 2, 37.9382, 23.668, 1],
    ["06/01/2019", 1, 4.4409, 89.218, 1],
    ["06/02/2019", 2, 25.078037, -77.3289, 0]],
    columns=["date", "car_id", "latitude", "longitude", "event"])
# Note: without dayfirst=True these strings are parsed month-first (02/01/2019 -> 2019-02-01)
data['date'] = pd.to_datetime(data['date'])
# For each row, attach the most recent strictly earlier event row of the same car
df = pd.merge_asof(data.set_index('date'), data.loc[data['event'] == 1].set_index('date'),
                   on='date', suffixes=('_l', '_r'), by='car_id', allow_exact_matches=False)
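At this point, every row of df already contains the elements needed for the remaining calculation. Since I am not sure whether your Distance() function accepts DataFrames, we can use .apply() to add the distance_since_event column.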
def getDistance(lat1, lat2, long1, long2):
    # No earlier event for this row
    if pd.isna(lat2) or pd.isna(long2):
        return -1
    # substitute this with the actual wgs84_geod library that you eventually use
    return ((lat2-lat1)**2 + (long2-long1)**2) ** 0.5

df['distance_since_event'] = df.apply(
    lambda row: getDistance(row['latitude_l'], row['latitude_r'],
                            row['longitude_l'], row['longitude_r']),
    axis=1)
print(df)
Output:
car_id date latitude_l longitude_l event_l latitude_r longitude_r event_r distance_since_event
0 1 2019-01-01 43.586300 7.12993 0 NaN NaN NaN -1.000000
1 2 2019-01-01 44.392900 8.93832 0 NaN NaN NaN -1.000000
2 1 2019-02-01 43.539300 7.03134 1 NaN NaN NaN -1.000000
3 2 2019-02-01 39.459462 -0.31228 0 NaN NaN NaN -1.000000
4 1 2019-03-01 44.317300 84.94200 0 43.5393 7.03134 1.0 77.914544
5 2 2019-03-01 -12.328400 -9.04522 1 NaN NaN NaN -1.000000
6 1 2019-04-01 -36.841400 17.47620 0 43.5393 7.03134 1.0 81.056474
7 2 2019-04-01 43.542000 10.29580 0 -12.3284 -9.04522 1.0 59.123402
8 1 2019-05-01 43.524200 69.47300 0 43.5393 7.03134 1.0 62.441662
9 2 2019-05-01 37.938200 23.66800 1 -12.3284 -9.04522 1.0 59.974043
10 1 2019-06-01 4.440900 89.21800 1 43.5393 7.03134 1.0 91.012812
11 2 2019-06-02 25.078037 -77.32890 0 37.9382 23.66800 1.0 101.812365
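The distances in this output come from the placeholder formula, so they are Euclidean differences in degrees, not distances on the ground. If the wgs84_geod object is not at hand, one dependency-free stand-in is the haversine formula. The sketch below is an assumption on my part, not the asker's method: haversine_km and EARTH_RADIUS_KM are hypothetical names, and a spherical Earth is assumed, so results differ slightly from a WGS84 geodesic.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)

def haversine_km(lat1, lat2, long1, long2):
    """Great-circle distance in km; argument order mirrors getDistance."""
    # No earlier event for this row: keep the -1 sentinel used in the answer.
    if any(v is None or math.isnan(v) for v in (lat1, lat2, long1, long2)):
        return -1
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(long2 - long1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp against floating-point overshoot before taking asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))

# One degree of longitude along the equator
print(haversine_km(0.0, 0.0, 0.0, 1.0))
```

Because the argument order matches getDistance, it can be passed to the same .apply() call in its place.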
From here, you can rename or drop columns as needed.
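One possible cleanup, sketched on a hypothetical two-row frame that reuses the merged column names and two rows of values from the output above: drop the _r helper columns and strip the _l suffix. The name tidy is made up for illustration.

```python
import pandas as pd

# Hypothetical two-row frame with the column names produced by the merge.
df = pd.DataFrame({
    "car_id": [1, 1],
    "date": pd.to_datetime(["2019-02-01", "2019-03-01"]),
    "latitude_l": [43.5393, 44.3173],
    "longitude_l": [7.03134, 84.942],
    "event_l": [1, 0],
    "latitude_r": [float("nan"), 43.5393],
    "longitude_r": [float("nan"), 7.03134],
    "event_r": [float("nan"), 1.0],
    "distance_since_event": [-1.0, 77.914544],
})

tidy = (
    df.drop(columns=["latitude_r", "longitude_r", "event_r"])       # helper columns
      .rename(columns=lambda c: c[:-2] if c.endswith("_l") else c)  # strip "_l"
)
print(tidy.columns.tolist())
```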