I've got a dataframe with this kind of structure:
import pandas as pd
from geopy.distance import vincenty
data = {'id': [1, 2, 3, 1, 2, 3],
        'coord': [[10.1, 30.3], [10.5, 32.3], [11.1, 31.3],
                  [10.1, 30.3], [10.5, 32.3], [61, 29.1]],
        }
df = pd.DataFrame(data)
This is how it looks:
          coord  id
0  [10.1, 30.3]   1
1  [10.5, 32.3]   2
2  [11.1, 31.3]   3
3  [10.1, 30.3]   1
4  [10.5, 32.3]   2
5    [61, 29.1]   3
Now, I want to group by id. Then, I want to use the current and previous row of coords within each group. These should be passed to a function that computes the distance between the two coordinates.
This is what I've tried:
df.groupby('id')['coord'].apply(lambda x: vincenty(x, x.shift(1)))
vincenty(x, y) expects x like (10, 20), and the same for y, and returns a float.
Obviously, this does not work: the function receives two Series objects instead of two lists. So using x.values.tolist() is probably the next step. However, my understanding ends here, so I'd appreciate any ideas on how to tackle this!
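To make the failure concrete, here is a minimal sketch of what each group looks like when it reaches the function. It iterates over the groups directly, which yields the same per-group Series that apply would pass to the lambda. The names seen_types, group and pairs are only for illustration and are not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 1, 2, 3],
    'coord': [[10.1, 30.3], [10.5, 32.3], [11.1, 31.3],
              [10.1, 30.3], [10.5, 32.3], [61, 29.1]],
})

# Each group of the 'coord' column is a whole Series, not a single list,
# so x and x.shift(1) inside the lambda are both Series.
seen_types = []
for key, x in df.groupby('id')['coord']:
    seen_types.append(type(x).__name__)
print(seen_types)

# Pairing each row with its predecessor inside one group (id == 3 here)
# gives the plain (current, previous) lists a distance function expects.
group = df.loc[df['id'] == 3, 'coord'].tolist()
pairs = list(zip(group[1:], group[:-1]))
print(pairs)
```

So a function that wants two coordinate pairs has to be called once per row pair, not once per group.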
Answer 0 (score: 2)
I think you need to shift the column per group and then apply the function after filtering out the rows with NaNs:
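def vincenty(x, y):
    # Stand-in for geopy's vincenty: shows the arguments and concatenates them
    print(x, y)
    return x + y

# Previous coord within each id group (NaN for the first row of a group)
df['new'] = df.groupby('id')['coord'].shift()
# Only rows that have a previous coord
m = df['new'].notnull()
df.loc[m, 'out'] = df.loc[m, :].apply(lambda x: vincenty(x['coord'], x['new']), axis=1)
print(df)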
          coord  id           new                       out
0  [10.1, 30.3]   1           NaN                       NaN
1  [10.5, 32.3]   2           NaN                       NaN
2  [11.1, 31.3]   3           NaN                       NaN
3  [10.1, 30.3]   1  [10.1, 30.3]  [10.1, 30.3, 10.1, 30.3]
4  [10.5, 32.3]   2  [10.5, 32.3]  [10.5, 32.3, 10.5, 32.3]
5    [61, 29.1]   3  [11.1, 31.3]    [61, 29.1, 11.1, 31.3]
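The same pattern with a function that actually returns a distance might look like the sketch below. To keep it runnable without geopy, it uses a hand-written haversine_km function, which is a hypothetical stand-in and not geopy's vincenty: it assumes (lat, lon) pairs in degrees and a spherical Earth of radius 6371 km, so its numbers will differ slightly from an ellipsoidal calculation. The column names prev and dist_km are also just illustrative.

```python
import math
import pandas as pd

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees.

    Hypothetical stand-in for geopy's vincenty; assumes a spherical Earth.
    """
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

df = pd.DataFrame({
    'id': [1, 2, 3, 1, 2, 3],
    'coord': [[10.1, 30.3], [10.5, 32.3], [11.1, 31.3],
              [10.1, 30.3], [10.5, 32.3], [61, 29.1]],
})

# Previous coordinate within each id group; the first row of a group gets NaN.
df['prev'] = df.groupby('id')['coord'].shift()

# Compute the distance only where a previous coordinate exists.
m = df['prev'].notnull()
df.loc[m, 'dist_km'] = df.loc[m].apply(
    lambda row: haversine_km(row['coord'], row['prev']), axis=1)

print(df[['id', 'dist_km']])
```

Rows without a predecessor in their group keep NaN in dist_km, and rows whose coordinate repeats the previous one come out as 0.0. Swapping haversine_km for a geopy distance function should only require changing the call inside the lambda, as long as that function accepts two coordinate pairs.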