Propagating values for missing dates in pandas

Time: 2017-11-18 12:07:37

Tags: python pandas

I have the following DataFrame, which contains data for business days only:

In [23]: d = pd.DataFrame({'date' : ['20070105', '20070105', '20070106', '20070106', '20070106', '20070109'], 's' : [1, 2, 1,2,3,1], 'i': ['a', 'b', 'a', 'b', 'c', 'a']})
In [26]: d['date'] = pd.to_datetime(d['date'], format='%Y%m%d')
In [27]: d
Out[27]: 
        date  i  s
0 2007-01-05  a  1
1 2007-01-05  b  2
2 2007-01-06  a  1
3 2007-01-06  b  2
4 2007-01-06  c  3
5 2007-01-09  a  1

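I want to fill in data for the missing dates (according to an 'alldays' calendar), so that the output looks like the following. Basically, 20070107 and 20070108 are missing, and their data should be copied from 20070106.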

Out[31]: 
         date  i  s
0  2007-01-05  a  1
1  2007-01-05  b  2
2  2007-01-06  a  1
3  2007-01-06  b  2
4  2007-01-06  c  3
5  2007-01-07  a  1
6  2007-01-07  b  2
7  2007-01-07  c  3
8  2007-01-08  a  1
9  2007-01-08  b  2
10 2007-01-08  c  3
11 2007-01-09  a  1

What is the best way to do this in pandas?

1 Answer:

Answer 0: (score: 2)

Use:

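  • set_index with cumcount to get a unique MultiIndex
  • unstack to reshape to a unique DatetimeIndex
  • reindex over the range from the minimum to the maximum date in the index, with method='ffill' to add the missing values
  • reshape back with stack
  • finally, remove the second level of the MultiIndex and convert the DatetimeIndex to a column
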
df = (
      d.set_index(['date', d.groupby('date').cumcount()])
        .unstack()
        .reindex(pd.date_range(d['date'].min(), d['date'].max()), method='ffill')
        .stack()
        .reset_index(level=1, drop=True)
        .rename_axis('date')
        .reset_index()
       )
print(df)

         date  i    s
0  2007-01-05  a  1.0
1  2007-01-05  b  2.0
2  2007-01-06  a  1.0
3  2007-01-06  b  2.0
4  2007-01-06  c  3.0
5  2007-01-07  a  1.0
6  2007-01-07  b  2.0
7  2007-01-07  c  3.0
8  2007-01-08  a  1.0
9  2007-01-08  b  2.0
10 2007-01-08  c  3.0
11 2007-01-09  a  1.0
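
Note that s comes back as float in the output above, because unstack introduces NaN for dates that have fewer rows than others. As an alternative, here is a minimal sketch of a different technique (not the method used in this answer): map every calendar day to the most recent date that has data, then merge the rows for that date back in. The names all_days, present, source, lookup and out are illustrative only, and the sketch assumes the range starts on a date that has data.

```python
import pandas as pd

# Sample data from the question.
d = pd.DataFrame({
    'date': ['20070105', '20070105', '20070106', '20070106', '20070106', '20070109'],
    's': [1, 2, 1, 2, 3, 1],
    'i': ['a', 'b', 'a', 'b', 'c', 'a'],
})
d['date'] = pd.to_datetime(d['date'], format='%Y%m%d')

# Every calendar day between the first and last observed date.
all_days = pd.date_range(d['date'].min(), d['date'].max())

# Sorted unique dates that actually have data.
present = pd.DatetimeIndex(d['date'].unique()).sort_values()

# For each calendar day, the most recent date with data (forward fill).
source = present.to_series().reindex(all_days, method='ffill')

# One row per calendar day, paired with the date its data is copied from.
lookup = pd.DataFrame({'date': all_days, 'source_date': source.to_numpy()})

# Pull in all rows of the source date for each calendar day.
out = (
    lookup.merge(d.rename(columns={'date': 'source_date'}),
                 on='source_date', how='left')
          .drop(columns='source_date')[['date', 'i', 's']]
)
print(out)
```

Because every calendar day is matched to an existing date, the merge introduces no NaN, so s should keep its integer dtype here.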