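I want to get the next (second) entry from a given DataFrame after grouping it by certain columns. If no such entry exists for a group, the result should be NaN/NaT, depending on whether the column holds time values. Consider the following example: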
>>> import pandas as pd
>>> df1 = pd.DataFrame({'School': {0: 'DEF', 1: 'ABC', 2: 'PQR', 3: 'DEF', 4: 'PQR', 5: 'PQR'}, 'OpenTime': {0: '08:00:00.000', 1: '09:00:00.000', 2: '10:00:23.563', 3: '09:30:05.908', 4: '07:15:50.100', 5: '08:15:00.000'}, 'CloseTime': {0: '13:00:00.000', 1: '14:00:00.000', 2: '13:30:00.100', 3: '15:00:00.768', 4: '13:00:00.500', 5: '15:50:32.534'}, 'IsTopper':{0:'1',1:'1',2:'1',3:'1',4:'1',5:'-1'}})
>>> df1
CloseTime IsTopper OpenTime School
0 13:00:00.000 1 08:00:00.000 DEF
1 14:00:00.000 1 09:00:00.000 ABC
2 13:30:00.100 1 10:00:23.563 PQR
3 15:00:00.768 1 09:30:05.908 DEF
4 13:00:00.500 1 07:15:50.100 PQR
5 15:50:32.534 -1 08:15:00.000 PQR
Getting the first value is straightforward and can be done in either of the following ways:
>>> df1.groupby(['School', 'IsTopper'])['OpenTime'].first()
OR
>>> (df1.groupby(['School', 'IsTopper'])).apply(lambda x:x.iloc[0])['OpenTime']
Using ...iloc[1] to get the next (second) value throws an error in the case above.
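A minimal sketch of why the positional lookup fails, assuming the cause is that some groups contain only one row (here ('ABC', '1') and ('PQR', '-1')), so position 1 does not exist for them. The helper name second_or_nan is hypothetical and used only for illustration; CloseTime is left out for brevity.

```python
import pandas as pd

df1 = pd.DataFrame({
    'School': ['DEF', 'ABC', 'PQR', 'DEF', 'PQR', 'PQR'],
    'OpenTime': ['08:00:00.000', '09:00:00.000', '10:00:23.563',
                 '09:30:05.908', '07:15:50.100', '08:15:00.000'],
    'IsTopper': ['1', '1', '1', '1', '1', '-1'],
})

# A group with a single row has no element at position 1,
# so .iloc[1] raises IndexError for it.
single = df1.loc[df1['School'] == 'ABC', 'OpenTime']
try:
    single.iloc[1]
    raised = False
except IndexError:
    raised = True


def second_or_nan(group):
    # Hypothetical helper: return the second value if the group has one,
    # otherwise NaN.
    return group.iloc[1] if len(group) > 1 else float('nan')


# One value per group, indexed by (School, IsTopper).
second = df1.groupby(['School', 'IsTopper'])['OpenTime'].apply(second_or_nan)
```

This yields one value per group rather than one per original row, so it does not by itself produce the per-row Next_OpenTime column shown in the desired output.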
Ultimately, I am trying to get the following output for the example above:
School IsTopper OpenTime Next_OpenTime
0 DEF 1 08:00:00.000 09:30:05.908
1 ABC 1 09:00:00.000
2 PQR 1 10:00:23.563 07:15:50.100
3 DEF 1 09:30:05.908
4 PQR 1 07:15:50.100
5 PQR -1 08:15:00.000
Answer 0 (score: 0)
>>> df1['Next_OpenTime'] = (df1.groupby(['School', 'IsTopper']))['OpenTime'].shift(-1)
>>> df1
IsTopper OpenTime School Next_OpenTime
0 1 08:00:00.000 DEF 09:30:05.908
1 1 09:00:00.000 ABC NaN
2 1 10:00:23.563 PQR 07:15:50.100
3 1 09:30:05.908 DEF NaN
4 1 07:15:50.100 PQR NaN
5 -1 08:15:00.000 PQR NaN
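The one-liner above can be reproduced as a standalone script. This is a sketch under two assumptions: CloseTime is dropped for brevity, and the extra Next_OpenTime_td column (a name introduced here only for illustration) is added to show how missing entries become NaT once the strings are parsed as timedeltas.

```python
import pandas as pd

df1 = pd.DataFrame({
    'School': ['DEF', 'ABC', 'PQR', 'DEF', 'PQR', 'PQR'],
    'OpenTime': ['08:00:00.000', '09:00:00.000', '10:00:23.563',
                 '09:30:05.908', '07:15:50.100', '08:15:00.000'],
    'IsTopper': ['1', '1', '1', '1', '1', '-1'],
})

# shift(-1) within each group pulls the following row's value up by one.
# The last row of every group has no successor, so it becomes NaN.
df1['Next_OpenTime'] = (
    df1.groupby(['School', 'IsTopper'])['OpenTime'].shift(-1)
)

# Optional: parse the strings as timedeltas so missing values appear as NaT.
df1['Next_OpenTime_td'] = pd.to_timedelta(df1['Next_OpenTime'])

print(df1)
```

Because shift keeps the original index, the result aligns row by row with df1 and can be assigned directly as a new column, which is what the desired output requires.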