Reindex a pandas DataFrame to fill in missing dates

Date: 2017-07-17 13:13:05

Tags: python pandas reindex

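I have daily data in a pandas DataFrame df, but some days are missing (for example 1980-12-25 below). I want to reindex the DataFrame so that the missing dates are added as rows with NaN values.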

           date  close
None                  
0    1980-12-12  28.75
1    1980-12-15  27.25
2    1980-12-16  25.25
3    1980-12-17  25.87
4    1980-12-18  26.63
5    1980-12-19  28.25
6    1980-12-22  29.63
7    1980-12-23  30.88
8    1980-12-24  32.50
9    1980-12-26  35.50 

I have already generated a list, dates, containing the full set of dates I want:

[Timestamp('1980-12-12 00:00:00'), Timestamp('1980-12-15 00:00:00'), Timestamp('1980-12-16 00:00:00'), Timestamp('1980-12-17 00:00:00'), Timestamp('1980-12-18 00:00:00'), Timestamp('1980-12-19 00:00:00'), Timestamp('1980-12-22 00:00:00'), Timestamp('1980-12-23 00:00:00'), Timestamp('1980-12-24 00:00:00'), Timestamp('1980-12-25 00:00:00'), Timestamp('1980-12-26 00:00:00')]
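The hand-built list above happens to contain exactly the weekdays (Monday to Friday) between the first and last date in the data. Assuming a plain weekday calendar is what is wanted, a sketch of generating the same dates with pd.bdate_range instead of typing them out:

```python
import pandas as pd

# All weekdays from the first to the last date in the data.
# bdate_range only skips weekends by default; a holiday such as
# 1980-12-25 is still included, which is what we want here.
dates = pd.bdate_range("1980-12-12", "1980-12-26")
print(dates)
```

The result is a DatetimeIndex rather than a list, but it can be passed to reindex in the same way.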

Unfortunately, when I run the reindex command below, the whole table is filled with NaN:

df.reindex(dates)

I ran the checks below, and they all look fine...

>>> type(df['date'][0])
<class 'pandas._libs.tslib.Timestamp'>

>>> type(dates[0])
<class 'pandas._libs.tslib.Timestamp'>

>>> dates[0] == df['date'][0]
True

1 answer:

Answer 0 (score: 1)

From what I can see in your question, you need set_index():

df
         date  close
0  1980-12-12  28.75
1  1980-12-15  27.25
2  1980-12-16  25.25
3  1980-12-17  25.87
4  1980-12-18  26.63
5  1980-12-19  28.25
6  1980-12-22  29.63
7  1980-12-23  30.88
8  1980-12-24  32.50
9  1980-12-26  35.50

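import pandas as pd

df['date'] = pd.to_datetime(df['date'])  # make sure the column holds Timestamps
df.set_index('date', inplace=True)       # use the dates as the index labels
df = df.reindex(dates)                   # reindex returns a new DataFrame, so assign it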

df
            close
date             
1980-12-12  28.75
1980-12-15  27.25
1980-12-16  25.25
1980-12-17  25.87
1980-12-18  26.63
1980-12-19  28.25
1980-12-22  29.63
1980-12-23  30.88
1980-12-24  32.50
1980-12-25    NaN
1980-12-26  35.50
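For anyone who wants to try this end to end, here is a self-contained sketch that rebuilds the question's data and applies the same fix. It chains set_index and reindex instead of using inplace=True; the variable name out is just for illustration.

```python
import pandas as pd

# The question's data, with 1980-12-25 missing.
df = pd.DataFrame({
    "date": ["1980-12-12", "1980-12-15", "1980-12-16", "1980-12-17", "1980-12-18",
             "1980-12-19", "1980-12-22", "1980-12-23", "1980-12-24", "1980-12-26"],
    "close": [28.75, 27.25, 25.25, 25.87, 26.63, 28.25, 29.63, 30.88, 32.50, 35.50],
})

# The full list of wanted dates, as a list of Timestamps like in the question.
dates = list(pd.to_datetime([
    "1980-12-12", "1980-12-15", "1980-12-16", "1980-12-17", "1980-12-18", "1980-12-19",
    "1980-12-22", "1980-12-23", "1980-12-24", "1980-12-25", "1980-12-26",
]))

df["date"] = pd.to_datetime(df["date"])
# Move the dates into the index, then align to the wanted dates.
# Labels not present in the old index (1980-12-25) get NaN.
out = df.set_index("date").reindex(dates)
print(out)
```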

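You need to set the index because reindex matches the new labels against the existing index, not against the date column. Your df still has the default integer index 0 to 9, so none of the Timestamps in dates matches a label and every row comes back as NaN. After set_index('date'), the dates are the index, so the existing rows line up and only 1980-12-25 is filled with NaN. Is this the output you expected?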
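To make the alignment point concrete, here is a small sketch on a hypothetical two-row frame, contrasting reindex on the default integer index with reindex after set_index:

```python
import pandas as pd

# Hypothetical two-row frame; 1980-12-25 is missing.
df = pd.DataFrame({
    "date": pd.to_datetime(["1980-12-24", "1980-12-26"]),
    "close": [32.50, 35.50],
})
wanted = list(pd.to_datetime(["1980-12-24", "1980-12-25", "1980-12-26"]))

# Index is the default 0, 1: no label equals any Timestamp,
# so every row in the result is missing (NaN / NaT).
wrong = df.reindex(wanted)

# Index is now the dates: 12-24 and 12-26 align, 12-25 gets NaN.
right = df.set_index("date").reindex(wanted)

print(wrong)
print(right)
```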