I have a DataFrame with four columns: id1, id2, age, stime. For example:
import pandas as pd

# Pass a plain list of rows instead of np.array(...): a mixed NumPy array is
# object-typed, which would leave every column (including stime) as dtype object.
df = pd.DataFrame([[1, 1, 3, pd.to_datetime('2020-01-10 00:30:16')],
[2, 1, 10, pd.to_datetime('2020-01-27 00:20:20')],
[3, 1, 60, pd.to_datetime('2020-01-26 00:10:08')],
[4, 2, 1, pd.to_datetime('2020-01-13 00:20:19')],
[5, 2, 2, pd.to_datetime('2020-01-12 00:40:17')],
[6, 2, 3, pd.to_datetime('2020-01-10 00:10:53')],
[7, 3, 20, pd.to_datetime('2020-01-21 00:20:57')],
[8, 3, 40, pd.to_datetime('2020-01-20 00:10:38')],
[9, 3, 60, pd.to_datetime('2020-01-01 00:30:38')],
],
columns=['id1', 'id2', 'age', 'stime'])
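I want to add a column whose value, for each row, is the maximum age among the rows that have the same id2 and whose stime falls within the 2 weeks leading up to that row's stime. So for the example above, I would like to get: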
# Expected result, built from a plain list of rows so the column dtypes are inferred.
df2 = pd.DataFrame([[1, 1, 3, pd.to_datetime('2020-01-10 00:30:16'), 3],
[2, 1, 10, pd.to_datetime('2020-01-27 00:20:20'), 60],
[3, 1, 60, pd.to_datetime('2020-01-26 00:10:08'), 60],
[4, 2, 1, pd.to_datetime('2020-01-13 00:20:19'), 3],
[5, 2, 2, pd.to_datetime('2020-01-12 00:40:17'), 3],
[6, 2, 3, pd.to_datetime('2020-01-10 00:10:53'), 3],
[7, 3, 20, pd.to_datetime('2020-01-21 00:20:57'), 40],
[8, 3, 40, pd.to_datetime('2020-01-20 00:10:38'), 40],
[9, 3, 60, pd.to_datetime('2020-01-01 00:30:38'), 60]
],
columns=['id1', 'id2', 'age', 'stime', 'max_age_last_2w'])
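To pin down what is being asked, the column can be described by a slow, row-by-row reference version. This is only a sketch: the function name `max_age_last_2w_slow` is made up for illustration, and it assumes that "last 2 weeks" means the half-open interval (stime - 14 days, stime], which includes the row itself. The example data does not say how the exact boundary should be treated.

```python
import pandas as pd

df = pd.DataFrame({
    'id1': range(1, 10),
    'id2': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'age': [3, 10, 60, 1, 2, 3, 20, 40, 60],
    'stime': pd.to_datetime([
        '2020-01-10 00:30:16', '2020-01-27 00:20:20', '2020-01-26 00:10:08',
        '2020-01-13 00:20:19', '2020-01-12 00:40:17', '2020-01-10 00:10:53',
        '2020-01-21 00:20:57', '2020-01-20 00:10:38', '2020-01-01 00:30:38',
    ]),
})


def max_age_last_2w_slow(frame, window=pd.Timedelta(days=14)):
    """Hypothetical reference helper: scans the whole frame once per row."""
    out = []
    for _, row in frame.iterrows():
        same_id = frame['id2'] == row['id2']
        # Assumed window: strictly after (stime - 14 days), up to and including stime.
        in_window = (frame['stime'] > row['stime'] - window) & (frame['stime'] <= row['stime'])
        out.append(frame.loc[same_id & in_window, 'age'].max())
    return pd.Series(out, index=frame.index)


df['max_age_last_2w'] = max_age_last_2w_slow(df)
print(df)
```

This does a full scan per row, so it is only useful for checking a faster solution on a small sample.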
Since the df I need to run this on is large, any help on how to do this efficiently would be much appreciated. Thanks in advance!
Answer 0 (score: 0)
Try:
df['max_age_last_2w'] = df.groupby(['id2', pd.Grouper(key='stime', freq='2W', closed='right')])['age'].transform('max')
Output:
   id1  id2  age               stime  max_age_last_2w
0    1    1    3 2020-01-10 00:30:16                3
1    2    1   10 2020-01-27 00:20:20               60
2    3    1   60 2020-01-26 00:10:08               60
3    4    2    1 2020-01-13 00:20:19                3
4    5    2    2 2020-01-12 00:40:17                3
5    6    2    3 2020-01-10 00:10:53                3
6    7    3   20 2020-01-21 00:20:57               40
7    8    3   40 2020-01-20 00:10:38               40
8    9    3   60 2020-01-01 00:30:38               60
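One caveat: `pd.Grouper(freq='2W')` assigns rows to fixed two-week calendar bins, so it agrees with a trailing 14-day window only when the relevant rows happen to land in the same bin. If a true per-row trailing window is needed, a time-based rolling window per id2 group may be closer to the request. The following is a minimal sketch, assuming stime has a datetime dtype and that the window is (stime - 14 days, stime], including the current row; the variable names `ordered` and `rolled` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id1': range(1, 10),
    'id2': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'age': [3, 10, 60, 1, 2, 3, 20, 40, 60],
    'stime': pd.to_datetime([
        '2020-01-10 00:30:16', '2020-01-27 00:20:20', '2020-01-26 00:10:08',
        '2020-01-13 00:20:19', '2020-01-12 00:40:17', '2020-01-10 00:10:53',
        '2020-01-21 00:20:57', '2020-01-20 00:10:38', '2020-01-01 00:30:38',
    ]),
})

# Time-based rolling needs stime sorted within each id2 group.
ordered = df.sort_values(['id2', 'stime'])

# '14D' is a trailing window (t - 14 days, t] that includes the current row.
rolled = (
    ordered.set_index('stime')
           .groupby('id2')['age']
           .rolling('14D')
           .max()
)

# rolled is grouped by id2 in the same row order as `ordered`, so copy the
# values positionally, then let pandas realign them to df's original index.
ordered['max_age_last_2w'] = rolled.to_numpy().astype('int64')
df['max_age_last_2w'] = ordered['max_age_last_2w']
print(df)
```

On the sample data this yields the same column as the expected result in the question. The sort is the main extra cost, and the rolling maximum itself runs in compiled code, so it should scale better than a per-row Python loop.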