Question

I have the following pandas DataFrame:

Circuit-ID  DATETIME          LATE?
78899       07/06/2018 15:30  1
78899       08/06/2018 17:30  0
78899       09/06/2018 20:30  1
23544       12/07/2017 23:30  1
23544       13/07/2017 19:30  0
23544       14/07/2017 20:30  1

I need to compute the shifted values of the DATETIME and LATE? columns within each circuit, to get the following result:

Circuit-ID  DATETIME          LATE?  DATETIME-1        LATE-1
78899       07/06/2018 15:30  1      NA                NA
78899       08/06/2018 17:30  0      07/06/2018 15:30  1
78899       09/06/2018 20:30  1      08/06/2018 17:30  0
23544       12/07/2017 23:30  1      NA                NA
23544       13/07/2017 19:30  0      12/07/2017 23:30  1
23544       14/07/2017 20:30  1      13/07/2017 19:30  0

I tried the following code:

df.groupby(['Circuit-ID', 'DATETIME', 'LATE?']) \
            .apply(lambda x: x.sort_values(by=['Circuit-ID', 'DATETIME', 'LATE?'], ascending=[True, True, True]))['LATE?'] \
            .transform(lambda x: x.shift()) \
            .reset_index(name='LATE-1')

But I keep getting wrong results: in some rows, the first shifted value of a group is not NaN. Could you point out a cleaner way to get the desired result?

Answer 1

Use groupby and shift, then join the result back onto the original frame:

df.join(df.groupby('Circuit-ID').shift().add_suffix('-1'))

   Circuit-ID          DATETIME  LATE?        DATETIME-1  LATE?-1
0       78899  07/06/2018 15:30      1               NaN      NaN
1       78899  08/06/2018 17:30      0  07/06/2018 15:30      1.0
2       78899  09/06/2018 20:30      1  08/06/2018 17:30      0.0
3       23544  12/07/2017 23:30      1               NaN      NaN
4       23544  13/07/2017 19:30      0  12/07/2017 23:30      1.0
5       23544  14/07/2017 20:30      1  13/07/2017 19:30      0.0
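
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The frame is rebuilt by hand from the data in the question, and the names shifted, naive and result are just illustrative. It also shows a plain, ungrouped shift for contrast, which may be what the attempt in the question effectively ended up doing (that part is my guess, not something confirmed in the thread).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Circuit-ID": [78899, 78899, 78899, 23544, 23544, 23544],
    "DATETIME": ["07/06/2018 15:30", "08/06/2018 17:30", "09/06/2018 20:30",
                 "12/07/2017 23:30", "13/07/2017 19:30", "14/07/2017 20:30"],
    "LATE?": [1, 0, 1, 1, 0, 1],
})

# A plain shift ignores group boundaries: row 3 (first row of 23544)
# receives the value from row 2, which belongs to 78899.
naive = df["LATE?"].shift()

# A grouped shift moves the non-key columns down one row inside each
# Circuit-ID, so the first row of every group is NaN.
shifted = df.groupby("Circuit-ID").shift().add_suffix("-1")

# Both frames share the original index, so join lines them up row by row.
result = df.join(shifted)
print(result)
```

Note that the grouping column itself is not part of the shifted frame, which is why only DATETIME-1 and LATE?-1 get added.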

A similar solution uses concat to do the join:

pd.concat([df, df.groupby('Circuit-ID').shift().add_suffix('-1')], axis=1)

   Circuit-ID          DATETIME  LATE?        DATETIME-1  LATE?-1
0       78899  07/06/2018 15:30      1               NaN      NaN
1       78899  08/06/2018 17:30      0  07/06/2018 15:30      1.0
2       78899  09/06/2018 20:30      1  08/06/2018 17:30      0.0
3       23544  12/07/2017 23:30      1               NaN      NaN
4       23544  13/07/2017 19:30      0  12/07/2017 23:30      1.0
5       23544  14/07/2017 20:30      1  13/07/2017 19:30      0.0
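
Two things the outputs above leave open: shift works on the current row order, and the NaN values turn LATE?-1 into floats (1.0, 0.0). If the rows are not already in chronological order per circuit, something along these lines may help. This is only a sketch: the shuffled sample rows, the day-first date format, and the choice of the nullable Int64 dtype are my assumptions, not part of the original answer.

```python
import pandas as pd

# Same data as in the question, deliberately shuffled to show why sorting matters.
df = pd.DataFrame({
    "Circuit-ID": [78899, 23544, 78899, 23544, 78899, 23544],
    "DATETIME": ["09/06/2018 20:30", "13/07/2017 19:30", "07/06/2018 15:30",
                 "12/07/2017 23:30", "08/06/2018 17:30", "14/07/2017 20:30"],
    "LATE?": [1, 0, 1, 1, 0, 1],
})

# Assumption: the strings are day/month/year. Parse them so the sort is
# chronological rather than alphabetical.
df["DATETIME"] = pd.to_datetime(df["DATETIME"], format="%d/%m/%Y %H:%M")

# Order rows by circuit and time before shifting.
df = df.sort_values(["Circuit-ID", "DATETIME"])

# Shift within each circuit; the first row of each group has no predecessor.
prev = df.groupby("Circuit-ID")[["DATETIME", "LATE?"]].shift().add_suffix("-1")

# The nullable integer dtype keeps 0/1 as integers and shows gaps as <NA>.
prev["LATE?-1"] = prev["LATE?-1"].astype("Int64")

result = pd.concat([df, prev], axis=1)
print(result)
```

The sorted frame keeps its original index labels, so concat with axis=1 still aligns each row with its own shifted values.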

How to compute shifted columns within groups in Python Pandas
