I want to add a column to this table that holds, for each customer_id, the maximum date minus the minimum date.
Input:
action_date customer_id
2017-08-15 1
2017-08-21 1
2017-08-21 1
2017-09-02 1
2017-08-28 2
2017-09-29 2
2017-10-15 3
2017-10-30 3
2017-12-05 3
to get this table.
Output:
action_date customer_id diff
2017-08-15 1 18
2017-08-21 1 18
2017-08-21 1 18
2017-09-02 1 18
2017-08-28 2 32
2017-09-29 2 32
2017-10-15 3 51
2017-10-30 3 51
2017-12-05 3 51
I tried this code, but it fills the column with a lot of NaNs:
group = df.groupby(by='customer_id')
df['diff'] = (group['action_date'].max() - group['action_date'].min()).dt.days
Answer 0 (score: 8)
You can use the transform method:
In [23]: df['diff'] = df.groupby('customer_id') \
                        ['action_date'] \
                        .transform(lambda x: (x.max()-x.min()).days)
In [24]: df
Out[24]:
action_date customer_id diff
0 2017-08-15 1 18
1 2017-08-21 1 18
2 2017-08-21 1 18
3 2017-09-02 1 18
4 2017-08-28 2 32
5 2017-09-29 2 32
6 2017-10-15 3 51
7 2017-10-30 3 51
8 2017-12-05 3 51
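
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes action_date is already a datetime column (built here with pd.to_datetime). The names per_customer, misaligned and diff_mapped are only illustrative and do not appear in the question. It shows where the NaNs in the original attempt probably come from: the aggregated result is indexed by customer_id (1, 2, 3), while df is indexed 0 to 8, and assignment aligns on the index. It also shows two ways to broadcast the per-customer value back onto every row.

```python
import pandas as pd

# Sample data from the question; action_date is parsed to datetime64.
df = pd.DataFrame({
    "action_date": pd.to_datetime([
        "2017-08-15", "2017-08-21", "2017-08-21", "2017-09-02",
        "2017-08-28", "2017-09-29",
        "2017-10-15", "2017-10-30", "2017-12-05",
    ]),
    "customer_id": [1, 1, 1, 1, 2, 2, 3, 3, 3],
})

grouped = df.groupby("customer_id")["action_date"]

# Aggregating gives ONE row per customer, indexed by customer_id (1, 2, 3).
per_customer = (grouped.max() - grouped.min()).dt.days

# Column assignment aligns on the index, which reindex imitates here:
# only row labels 1, 2 and 3 find a match, every other row becomes NaN.
misaligned = per_customer.reindex(df.index)

# Option 1: transform returns a result aligned with the original rows.
# String aggregations ("max", "min") avoid a Python-level lambda per group.
df["diff"] = (grouped.transform("max") - grouped.transform("min")).dt.days

# Option 2: keep the aggregated Series and map it through customer_id.
df["diff_mapped"] = df["customer_id"].map(per_customer)

print(df)
```

Both options should give the same column as the lambda version above. The string-based transform may be faster on large frames because it avoids calling a Python function once per group.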