Question

I've got a dataframe with this kind of structure:

import pandas as pd
from geopy.distance import vincenty

data = {'id': [1, 2, 3, 1, 2 , 3],
        'coord': [[10.1, 30.3], [10.5, 32.3], [11.1, 31.3],
                  [10.1, 30.3], [10.5, 32.3], [61, 29.1]],
       }
df = pd.DataFrame(data)

This is how it looks:

           coord    id
0   [10.1, 30.3]    1
1   [10.5, 32.3]    2
2   [11.1, 31.3]    3
3   [10.1, 30.3]    1
4   [10.5, 32.3]    2
5   [61, 29.1]      3

Now, I want to group by id. Then, I want to use the current and previous row of coords. These should be used in a function to compute the distance between the two coordinates:

This is what I've tried:

df.groupby('id')['coord'].apply(lambda x: vincenty(x, x.shift(1)))

vincenty(x,y) expects x like (10, 20) and the same for y and returns a float.

Obviously, this does not work. The function receives two Series objects instead of the two lists. So probably using x.values.tolist() should be the next step. However, my understanding of things ends here. Hence, I'd appreciate any ideas on how to tackle this!

Answer 1

I think you need shift column per group and then apply function with filter out NaNs rows:

def vincenty(x, y):
    print (x,y)
    return x + y

df['new'] = df.groupby('id')['coord'].shift()

m = df['new'].notnull()
df.loc[m, 'out'] = df.loc[m, :].apply(lambda x: vincenty(x['coord'], x['new']), axis=1)
print (df)
          coord  id           new                       out
0  [10.1, 30.3]   1           NaN                       NaN
1  [10.5, 32.3]   2           NaN                       NaN
2  [11.1, 31.3]   3           NaN                       NaN
3  [10.1, 30.3]   1  [10.1, 30.3]  [10.1, 30.3, 10.1, 30.3]
4  [10.5, 32.3]   2  [10.5, 32.3]  [10.5, 32.3, 10.5, 32.3]
5    [61, 29.1]   3  [11.1, 31.3]    [61, 29.1, 11.1, 31.3]

Grouped-By DataFrame: Use column-values in current and previous row in Function

1 个答案: