I have a data frame that looks like this:
import pandas as pd

# dates are given as day/month/year strings
d = {'business': ['FX', 'FX', 'IR', 'IR'],
     'name': ['ed', 'ed', 'a', 'b'],
     'date': ['01/01/2018', '05/02/2018', '01/01/2018', '05/01/2018'],
     'amt': [1, 2, 3, 4]}
df = pd.DataFrame(data=d)
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')
df
I am trying to use the diff() function to get a new column showing the difference between two dates. The final output I need is:
df['date diff']=[0,4,0,0]
Note: it is fine if diff() produces NaN where I show 0.
Answer 0 (score: 1)
I believe you need DataFrameGroupBy.diff:
df['date diff'] = df.groupby(['business','name'])['amt'].diff().fillna(0).astype(int)
print(df)
  business name       date  amt  date diff
0       FX   ed 2018-01-01    1          0
1       FX   ed 2018-02-05    5          4
2       IR    a 2018-01-01  101          0
3       IR    b 2018-01-05  105          0
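To see what the grouped diff() returns before fillna is applied, here is a minimal sketch. It assumes the amt values from the question (1, 2, 3, 4), not the ones in the printed output above, so the numbers differ. The names raw and filled are only for illustration.

```python
import pandas as pd

# Assumption: same business/name/amt columns as in the question.
df = pd.DataFrame({
    'business': ['FX', 'FX', 'IR', 'IR'],
    'name': ['ed', 'ed', 'a', 'b'],
    'amt': [1, 2, 3, 4],
})

# diff() within each (business, name) group; the first row of every
# group has no predecessor, so it comes back as NaN.
raw = df.groupby(['business', 'name'])['amt'].diff()

# Replace the NaN values with 0 and cast back to integers.
filled = raw.fillna(0).astype(int)

print(raw.tolist())
print(filled.tolist())
```

If NaN is acceptable, as the question says, the fillna(0).astype(int) step can simply be dropped.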
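Edit: If you need the difference in days between consecutive dates within each business, sort by date first and take diff() of the date column: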
df = df.sort_values(['business','date'])
df['date diff'] = df.groupby(['business'])['date'].diff().dt.days.fillna(0).astype(int)
print(df)
  business name       date  amt  date diff
0       FX   ed 2018-01-01    1          0
1       FX   ed 2018-02-05    5         35
2       IR    a 2018-01-01  101          0
3       IR    b 2018-01-05  105          4
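For completeness, a self-contained sketch of the date-based version, run on the data from the question. The intermediate names delta, by_business and by_name are hypothetical and only there to show how the choice of grouping keys changes the result.

```python
import pandas as pd

df = pd.DataFrame({
    'business': ['FX', 'FX', 'IR', 'IR'],
    'name': ['ed', 'ed', 'a', 'b'],
    'date': ['01/01/2018', '05/02/2018', '01/01/2018', '05/01/2018'],
    'amt': [1, 2, 3, 4],
})
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')

# Sort so that diff() compares each row with the previous date in its group.
df = df.sort_values(['business', 'date'])

# Grouped diff of a datetime column gives timedeltas (NaT for the first row
# of each group); .dt.days turns them into numbers of days.
delta = df.groupby('business')['date'].diff()
by_business = delta.dt.days.fillna(0).astype(int)

# Grouping by name as well leaves 'a' and 'b' alone in their groups,
# so neither of them has a previous row to compare against.
by_name = (df.groupby(['business', 'name'])['date'].diff()
             .dt.days.fillna(0).astype(int))

df['date diff'] = by_business
print(df)
```

Grouping by business alone gives 35 and 4 days for the second row of FX and IR; grouping by business and name keeps the 35 for ed but yields 0 for both IR rows.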