Pandas: add a groupby-based column subject to a condition

Time: 2020-02-20 07:08:45

Tags: python pandas

I have a DataFrame with four columns: id1, id2, age, stime. For example:

import pandas as pd

# A plain list of rows lets pandas infer integer and datetime dtypes per column;
# wrapping mixed values in np.array would produce an object array instead.
df = pd.DataFrame([[1, 1, 3, pd.to_datetime('2020-01-10 00:30:16')],
                   [2, 1, 10, pd.to_datetime('2020-01-27 00:20:20')],
                   [3, 1, 60, pd.to_datetime('2020-01-26 00:10:08')],
                   [4, 2, 1, pd.to_datetime('2020-01-13 00:20:19')],
                   [5, 2, 2, pd.to_datetime('2020-01-12 00:40:17')],
                   [6, 2, 3, pd.to_datetime('2020-01-10 00:10:53')],
                   [7, 3, 20, pd.to_datetime('2020-01-21 00:20:57')],
                   [8, 3, 40, pd.to_datetime('2020-01-20 00:10:38')],
                   [9, 3, 60, pd.to_datetime('2020-01-01 00:30:38')],
                  ],
                  columns=['id1', 'id2', 'age', 'stime'])

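I want to add a column whose value, for each row, is the maximum age among the rows that have the same id2 and whose stime falls within the 2 weeks up to and including that row's stime. So for the example above, I want to get: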

# Expected result: the same rows plus the desired max_age_last_2w column.
df2 = pd.DataFrame([[1, 1, 3, pd.to_datetime('2020-01-10 00:30:16'), 3],
                    [2, 1, 10, pd.to_datetime('2020-01-27 00:20:20'), 60],
                    [3, 1, 60, pd.to_datetime('2020-01-26 00:10:08'), 60],
                    [4, 2, 1, pd.to_datetime('2020-01-13 00:20:19'), 3],
                    [5, 2, 2, pd.to_datetime('2020-01-12 00:40:17'), 3],
                    [6, 2, 3, pd.to_datetime('2020-01-10 00:10:53'), 3],
                    [7, 3, 20, pd.to_datetime('2020-01-21 00:20:57'), 40],
                    [8, 3, 40, pd.to_datetime('2020-01-20 00:10:38'), 40],
                    [9, 3, 60, pd.to_datetime('2020-01-01 00:30:38'), 60],
                   ],
                   columns=['id1', 'id2', 'age', 'stime', 'max_age_last_2w'])
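
To pin down the rule, here is a naive row-by-row sketch. It assumes the window is the half-open interval (stime - 14 days, stime], which is one reading consistent with the expected values above; the helper name `max_age_last_2w_naive` is made up for illustration. It does a full scan per row, so it is only a reference to check faster approaches against, not something to run on a large frame.

```python
import pandas as pd

# Same sample data as above, built column by column.
df = pd.DataFrame(
    {
        "id1": [1, 2, 3, 4, 5, 6, 7, 8, 9],
        "id2": [1, 1, 1, 2, 2, 2, 3, 3, 3],
        "age": [3, 10, 60, 1, 2, 3, 20, 40, 60],
        "stime": pd.to_datetime([
            "2020-01-10 00:30:16", "2020-01-27 00:20:20", "2020-01-26 00:10:08",
            "2020-01-13 00:20:19", "2020-01-12 00:40:17", "2020-01-10 00:10:53",
            "2020-01-21 00:20:57", "2020-01-20 00:10:38", "2020-01-01 00:30:38",
        ]),
    }
)


def max_age_last_2w_naive(frame, window=pd.Timedelta(days=14)):
    """O(n^2) reference: max age per row over same-id2 rows in (t - window, t]."""
    out = []
    for key, t in zip(frame["id2"], frame["stime"]):
        in_window = (
            (frame["id2"] == key)
            & (frame["stime"] <= t)          # not after the current row
            & (frame["stime"] > t - window)  # assumed: strictly within 14 days
        )
        out.append(frame.loc[in_window, "age"].max())
    return pd.Series(out, index=frame.index, name="max_age_last_2w")


expected = max_age_last_2w_naive(df)
```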

The df I need to run this on is large, so I would really appreciate help with doing this efficiently. Thanks in advance!

1 answer:

Answer 0 (score: 0)

Try:

df['max_age_last_2w'] = df.groupby(['id2', pd.Grouper(key='stime', freq='2W', closed='right')])['age'].transform('max')

Output:

  id1 id2 age               stime  max_age_last_2w
0   1   1   3 2020-01-10 00:30:16                3
1   2   1  10 2020-01-27 00:20:20               60
2   3   1  60 2020-01-26 00:10:08               60
3   4   2   1 2020-01-13 00:20:19                3
4   5   2   2 2020-01-12 00:40:17                3
5   6   2   3 2020-01-10 00:10:53                3
6   7   3  20 2020-01-21 00:20:57               40
7   8   3  40 2020-01-20 00:10:38               40
8   9   3  60 2020-01-01 00:30:38               60
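
One caveat: `pd.Grouper` with `freq='2W'` assigns rows to fixed 2-week calendar bins rather than to a window trailing each row. It reproduces the expected output on this sample, but two rows of the same id2 that are only a few days apart can land in different bins. If a true trailing window is needed, a time-based rolling window per group is one option. The following is a minimal sketch, assuming `stime` is a datetime64 column, the window is (stime - 14 days, stime], and a recent pandas version; `ordered` and `rolled` are illustrative names.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame(
    {
        "id1": [1, 2, 3, 4, 5, 6, 7, 8, 9],
        "id2": [1, 1, 1, 2, 2, 2, 3, 3, 3],
        "age": [3, 10, 60, 1, 2, 3, 20, 40, 60],
        "stime": pd.to_datetime([
            "2020-01-10 00:30:16", "2020-01-27 00:20:20", "2020-01-26 00:10:08",
            "2020-01-13 00:20:19", "2020-01-12 00:40:17", "2020-01-10 00:10:53",
            "2020-01-21 00:20:57", "2020-01-20 00:10:38", "2020-01-01 00:30:38",
        ]),
    }
)

# Time-based rolling windows need increasing timestamps within each group,
# so sort by group key and then by time.
ordered = df.sort_values(["id2", "stime"])

# For each row, take the max age over the same id2 within the trailing 14 days.
# An offset window such as "14D" is closed on the right by default,
# i.e. it covers (t - 14 days, t] and includes the row itself.
rolled = (
    ordered.set_index("stime")
    .groupby("id2")["age"]
    .rolling("14D")
    .max()
)

# The result is ordered by id2 and by time within each id2, the same order
# as `ordered`, so attach it positionally and let index alignment carry it
# back to the original row order of df.
ordered["max_age_last_2w"] = rolled.to_numpy().astype("int64")
df["max_age_last_2w"] = ordered["max_age_last_2w"]
```

The sort costs O(n log n) and the rolling max is computed in a single pass per group, so this should scale much better than comparing every row against every other row.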