Next/previous operation within DataFrame groups

Time: 2016-11-24 19:18:55

Tags: python-2.7 dataframe group-by

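After grouping a given DataFrame by some columns, I want to get the next (second) entry within each group. If that entry does not exist, the result should be NaN/NaT, as appropriate for the time values. Consider the following example: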

>>> import pandas as pd
>>> df1 = pd.DataFrame({'School': {0: 'DEF', 1: 'ABC', 2: 'PQR', 3: 'DEF', 4: 'PQR', 5: 'PQR'}, 'OpenTime': {0: '08:00:00.000', 1: '09:00:00.000', 2: '10:00:23.563', 3: '09:30:05.908', 4: '07:15:50.100', 5: '08:15:00.000'}, 'CloseTime': {0: '13:00:00.000', 1: '14:00:00.000', 2: '13:30:00.100', 3: '15:00:00.768', 4: '13:00:00.500', 5: '15:50:32.534'}, 'IsTopper':{0:'1',1:'1',2:'1',3:'1',4:'1',5:'-1'}})
>>> df1
      CloseTime IsTopper      OpenTime School
0  13:00:00.000        1  08:00:00.000    DEF
1  14:00:00.000        1  09:00:00.000    ABC
2  13:30:00.100        1  10:00:23.563    PQR
3  15:00:00.768        1  09:30:05.908    DEF
4  13:00:00.500        1  07:15:50.100    PQR
5  15:50:32.534       -1  08:15:00.000    PQR

Getting the first value is easy and can be done in either of the following ways:

>>> df1.groupby(['School', 'IsTopper'])['OpenTime'].first()

or

>>> (df1.groupby(['School', 'IsTopper'])).apply(lambda x:x.iloc[0])['OpenTime']

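Using ...iloc[1] to get the next (second) value raises an error in the case above, because some groups (for example ABC) contain only one row.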
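One way to see why iloc[1] fails, and how to avoid the error, is to guard on the group size before indexing. The following is only a sketch: the helper name second_or_nan is hypothetical, and the sample frame is rebuilt here (without CloseTime) so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question, minus the CloseTime column.
df1 = pd.DataFrame({
    'School': ['DEF', 'ABC', 'PQR', 'DEF', 'PQR', 'PQR'],
    'OpenTime': ['08:00:00.000', '09:00:00.000', '10:00:23.563',
                 '09:30:05.908', '07:15:50.100', '08:15:00.000'],
    'IsTopper': ['1', '1', '1', '1', '1', '-1'],
})


def second_or_nan(values):
    """Hypothetical helper: second value of a group, or NaN if the group has one row."""
    return values.iloc[1] if len(values) > 1 else np.nan


# One result per group, indexed by the (School, IsTopper) group keys.
second_open = (
    df1.groupby(['School', 'IsTopper'])['OpenTime'].apply(second_or_nan)
)
print(second_open)
```

This yields one value per group rather than one per original row, so it is not yet the row-aligned Next_OpenTime column asked for below.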

In the end, I am trying to get the following output for the example above:

  School IsTopper      OpenTime Next_OpenTime
0    DEF        1  08:00:00.000  09:30:05.908
1    ABC        1  09:00:00.000
2    PQR        1  10:00:23.563  07:15:50.100
3    DEF        1  09:30:05.908
4    PQR        1  07:15:50.100
5    PQR       -1  08:15:00.000

1 answer:

Answer 0 (score: 0)

>>> df1['Next_OpenTime'] = (df1.groupby(['School', 'IsTopper']))['OpenTime'].shift(-1)
>>> df1
      IsTopper      OpenTime School Next_OpenTime
0            1  08:00:00.000    DEF  09:30:05.908
1            1  09:00:00.000    ABC           NaN
2            1  10:00:23.563    PQR  07:15:50.100
3            1  09:30:05.908    DEF           NaN
4            1  07:15:50.100    PQR           NaN
5           -1  08:15:00.000    PQR           NaN
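The shift(-1) call above moves values up by one row within each group, so the last row of every group has nothing to pull from and becomes NaN. As a sketch of the same idea, shift(1) gives the previous entry, and converting the strings with pd.to_timedelta first makes the missing values NaT instead of NaN. The column names Prev_OpenTime, OpenTd, and Next_OpenTd are illustrative choices, not part of the original question.

```python
import pandas as pd

# Same sample data as in the question, minus the CloseTime column.
df1 = pd.DataFrame({
    'School': ['DEF', 'ABC', 'PQR', 'DEF', 'PQR', 'PQR'],
    'OpenTime': ['08:00:00.000', '09:00:00.000', '10:00:23.563',
                 '09:30:05.908', '07:15:50.100', '08:15:00.000'],
    'IsTopper': ['1', '1', '1', '1', '1', '-1'],
})

keys = ['School', 'IsTopper']

# Next entry within each group: shift values up by one row.
df1['Next_OpenTime'] = df1.groupby(keys)['OpenTime'].shift(-1)

# Previous entry within each group: shift values down by one row.
df1['Prev_OpenTime'] = df1.groupby(keys)['OpenTime'].shift(1)

# With a timedelta column, missing entries come out as NaT rather than NaN.
df1['OpenTd'] = pd.to_timedelta(df1['OpenTime'])
df1['Next_OpenTd'] = df1.groupby(keys)['OpenTd'].shift(-1)

print(df1)
```

Because shift keeps the original index, the new columns line up row by row with df1, which is what makes direct column assignment work here.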