Suppose I have the following data:
date id1 id2 category value
01/01/2019 1000 2000 income 1.0
01/01/2019 1000 2010 income 1.0
01/01/2019 1000 2000 expense 1.0
01/01/2019 1000 2010 expense 1.0
01/02/2019 1000 2000 income 2.0
01/02/2019 1000 2010 income 2.0
01/02/2019 1000 2000 expense 2.0
01/02/2019 1000 2010 expense 2.0
01/04/2019 1000 2000 income 3.0
01/04/2019 1000 2010 income 3.0
01/04/2019 1000 2000 expense 3.0
01/04/2019 1000 2010 expense 3.0
I want to fill in the missing date 01/03/2019, but with one row for each combination of id1, id2, and category. In my case, that means 4 rows would be added:
date id1 id2 category value
01/03/2019 1000 2000 income 2.0
01/03/2019 1000 2010 income 2.0
01/03/2019 1000 2000 expense 2.0
01/03/2019 1000 2010 expense 2.0
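I am comfortable with backfilling and forward filling dates when the date is the only index, but this particular problem, where the fill has to happen for each combination of values from several columns, is giving me trouble. Is there a simple way to do this with pandas?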
Answer 0 (score: 2)
This is first a matter of creating a key; after that it becomes a resample and ffill problem.
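import pandas as pd
df['date'] = pd.to_datetime(df['date'])
# Number the rows within each date so that (date, key) is a unique index
df['key'] = df.groupby('date').cumcount()
# Spread key into columns, resample to daily frequency, forward-fill the added day, then stack key back.
# mean() only averages numeric columns, so category is absent from the result below; recent pandas versions may need mean(numeric_only=True) here.
newdf = df.set_index(['date', 'key']).unstack().resample('D').mean().ffill().stack().reset_index(level=0)
newdf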
date id1 id2 value
key
0 2019-01-01 1000.0 2000.0 1.0
1 2019-01-01 1000.0 2010.0 1.0
2 2019-01-01 1000.0 2000.0 1.0
3 2019-01-01 1000.0 2010.0 1.0
0 2019-01-02 1000.0 2000.0 2.0
1 2019-01-02 1000.0 2010.0 2.0
2 2019-01-02 1000.0 2000.0 2.0
3 2019-01-02 1000.0 2010.0 2.0
0 2019-01-03 1000.0 2000.0 2.0
1 2019-01-03 1000.0 2010.0 2.0
2 2019-01-03 1000.0 2000.0 2.0
3 2019-01-03 1000.0 2010.0 2.0
0 2019-01-04 1000.0 2000.0 3.0
1 2019-01-04 1000.0 2010.0 3.0
2 2019-01-04 1000.0 2000.0 3.0
3 2019-01-04 1000.0 2010.0 3.0
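Note that the output above no longer has the category column, and id1 and id2 have become floats. If those need to be preserved, one possible alternative is to build every (id1, id2, category, date) combination explicitly and forward-fill within each combination. The following is only a sketch under some assumptions: the helper name fill_missing_dates and the intermediate names combos and grid are made up for illustration, the dates are assumed to be in month/day/year format, and how="cross" assumes pandas 1.2 or later.

```python
import pandas as pd

# Rebuild the sample data from the question (dates assumed to be month/day/year).
rows = []
for date, value in [("01/01/2019", 1.0), ("01/02/2019", 2.0), ("01/04/2019", 3.0)]:
    for category in ["income", "expense"]:
        for id2 in [2000, 2010]:
            rows.append((date, 1000, id2, category, value))
df = pd.DataFrame(rows, columns=["date", "id1", "id2", "category", "value"])
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")


def fill_missing_dates(frame, keys):
    """Hypothetical helper: one row per key combination per calendar day,
    with value forward-filled inside each combination."""
    # Every calendar day between the first and last observed date.
    all_days = pd.DataFrame(
        {"date": pd.date_range(frame["date"].min(), frame["date"].max(), freq="D")}
    )
    # Every distinct (id1, id2, category) combination that occurs in the data.
    combos = frame[keys].drop_duplicates()
    # Cartesian product: each combination paired with each day.
    grid = combos.merge(all_days, how="cross")
    # Attach the known values; days that were missing get NaN in value.
    filled = grid.merge(frame, on=keys + ["date"], how="left")
    # Forward-fill per combination, in date order.
    filled = filled.sort_values(keys + ["date"])
    filled["value"] = filled.groupby(keys)["value"].ffill()
    # Return rows ordered by date, in the question's column order.
    filled = filled.sort_values("date", kind="mergesort")
    return filled[["date"] + keys + ["value"]].reset_index(drop=True)


result = fill_missing_dates(df, ["id1", "id2", "category"])
print(result[result["date"] == "2019-01-03"])
```

Because the key columns are never run through an aggregation, they keep their original dtypes, and the approach does not rely on every date having the same number of rows in the same order, which the cumcount key does.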