I have a DataFrame that looks like this:
import pandas as pd
import numpy as np
d = {'business': ['FX', 'FX', 'FX', 'FX', 'IR', 'IR', 'IR', 'IR'],
     'A/L': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'],
     'date': ['01/01/2018', '02/01/2018', '03/01/2018', '04/01/2018',
              '05/01/2018', '06/01/2018', '06/01/2019', '06/01/2020'],
     'amt': [1, 2, 3, 4, 5, np.nan, 7, 8]}
df=pd.DataFrame(data=d)
df['date'] = pd.to_datetime(df['date'],format='%d/%m/%Y')
df.set_index('date',inplace=True)
df=df.groupby('business').apply(pd.Series.interpolate)
df
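I want to interpolate the data above, but the interpolation should take the dates into account. Since there is a one-year gap between two of the rows, I would have expected the filled value to be close to 5 rather than the 6 I currently get. Do you know how to do this?
Answer 0 (score: 2)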
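Once the 'date' column is set as the index, you can pass method='index' to interpolate so that the values are weighted by the spacing of the index. Starting from the DataFrame with 'date' still a regular column (that is, before the set_index call in the question), for example: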
print (df.set_index('date')
.groupby('business')
.apply(lambda x: x.interpolate(method = 'index'))
.reset_index())
        date business A/L       amt
0 2018-01-01       FX   A  1.000000
1 2018-01-02       FX   A  2.000000
2 2018-01-03       FX   A  3.000000
3 2018-01-04       FX   A  4.000000
4 2018-01-05       IR   A  5.000000
5 2018-01-06       IR   A  5.005464
6 2019-01-06       IR   A  7.000000
7 2020-01-06       IR   A  8.000000
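To see where 5.005464 comes from, here is a minimal sketch that isolates the IR group as a plain Series and compares the default interpolation with method='index'. The variable names (amt, linear, by_index, expected) are only illustrative, and the manual check assumes that method='index' interpolates linearly in elapsed time between the neighbouring dates, which matches the output above.

```python
import numpy as np
import pandas as pd

# The four IR rows from the question, indexed by date
dates = pd.to_datetime(
    ['05/01/2018', '06/01/2018', '06/01/2019', '06/01/2020'],
    format='%d/%m/%Y')
amt = pd.Series([5, np.nan, 7, 8], index=dates, name='amt')

# Default method='linear' treats rows as equally spaced and ignores the index
linear = amt.interpolate()

# method='index' weights the fill by the distance between index values
by_index = amt.interpolate(method='index')

# Manual check: the gap is 1 day out of the 366 days between the known points
weight = (dates[1] - dates[0]) / (dates[2] - dates[0])
expected = 5 + weight * (7 - 5)

print(linear.iloc[1])    # midpoint of 5 and 7
print(by_index.iloc[1])  # close to 5, as the question expected
print(expected)
```

The missing value sits one day after 5 and 365 days before 7, so it moves only 1/366 of the way from 5 to 7.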