How do I pad to the end of the day in pandas?

Time: 2013-07-01 17:35:21

Tags: python pandas

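I have a pandas.TimeSeries with an intraday index. How can I pad (forward-fill) the NaN values within each day separately, so that no value is carried across a day boundary?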

For example, this series:

2013-03-27 22:07:00-04:00     1.0
2013-03-27 22:08:00-04:00     nan
2013-03-27 22:09:00-04:00     nan
2013-03-28 02:08:00-04:00     nan
2013-03-28 02:09:00-04:00     1.0

would become:

2013-03-27 22:07:00-04:00     1.0
2013-03-27 22:08:00-04:00     1.0
2013-03-27 22:09:00-04:00     1.0
2013-03-28 02:08:00-04:00     nan
2013-03-28 02:09:00-04:00     1.0

I tried groupby(pd.TimeGrouper('D')).apply(pd.Series.ffill), but it did not work.

2 answers:

Answer 0: (score: 1)

Create a set of sample values at minute frequency for each day:

In [120]: idx = [ pd.date_range('20130101',periods=10,freq='T')+timedelta(i) for i in range(5) ]

In [121]: idx = idx[0] + idx[1] + idx[2] + idx[3] + idx[4]

In [122]: s = Series(randn(len(idx)),index=idx)

In [123]: s
Out[123]: 
2013-01-01 00:00:00    1.285575
2013-01-01 00:01:00    1.056882
2013-01-01 00:02:00   -0.690855
2013-01-01 00:03:00    1.235476
2013-01-01 00:04:00   -0.729948
2013-01-01 00:05:00    0.114036
2013-01-01 00:06:00    0.994977
2013-01-01 00:07:00   -0.455242
2013-01-01 00:08:00    0.645815
2013-01-01 00:09:00   -0.738772
2013-01-02 00:00:00    0.464686
2013-01-02 00:01:00   -0.872786
2013-01-02 00:02:00    0.112433
2013-01-02 00:03:00   -0.398235
2013-01-02 00:04:00   -0.442196
2013-01-02 00:05:00    0.634600
2013-01-02 00:06:00    1.165122
2013-01-02 00:07:00   -0.182570
2013-01-02 00:08:00   -0.107421
2013-01-02 00:09:00    0.033805
2013-01-03 00:00:00    1.768149
2013-01-03 00:01:00    0.218851
2013-01-03 00:02:00   -0.987624
2013-01-03 00:03:00   -1.258789
2013-01-03 00:04:00    0.984116
2013-01-03 00:05:00    1.859562
2013-01-03 00:06:00    1.620295
2013-01-03 00:07:00   -0.770468
2013-01-03 00:08:00   -1.263478
2013-01-03 00:09:00    0.036137
2013-01-04 00:00:00   -0.352919
2013-01-04 00:01:00    2.322247
2013-01-04 00:02:00   -1.218937
2013-01-04 00:03:00    0.619235
2013-01-04 00:04:00    0.019281
2013-01-04 00:05:00    1.689068
2013-01-04 00:06:00   -2.387880
2013-01-04 00:07:00    0.292372
2013-01-04 00:08:00    1.623110
2013-01-04 00:09:00   -1.944163
2013-01-05 00:00:00    0.403270
2013-01-05 00:01:00    1.750783
2013-01-05 00:02:00    0.485829
2013-01-05 00:03:00    0.957498
2013-01-05 00:04:00    0.018820
2013-01-05 00:05:00   -0.024910
2013-01-05 00:06:00    0.668174
2013-01-05 00:07:00   -1.104239
2013-01-05 00:08:00   -0.678914
2013-01-05 00:09:00    0.775712
dtype: float64

Add one more day (with a NaN value):

In [124]: s = s.append(Series(np.nan,index=[Timestamp(s.index[-1].date())+timedelta(1)]))

In [125]: s
Out[125]: 
2013-01-01 00:00:00    1.285575
2013-01-01 00:01:00    1.056882
2013-01-01 00:02:00   -0.690855
2013-01-01 00:03:00    1.235476
2013-01-01 00:04:00   -0.729948
2013-01-01 00:05:00    0.114036
2013-01-01 00:06:00    0.994977
2013-01-01 00:07:00   -0.455242
2013-01-01 00:08:00    0.645815
2013-01-01 00:09:00   -0.738772
2013-01-02 00:00:00    0.464686
2013-01-02 00:01:00   -0.872786
2013-01-02 00:02:00    0.112433
2013-01-02 00:03:00   -0.398235
2013-01-02 00:04:00   -0.442196
2013-01-02 00:05:00    0.634600
2013-01-02 00:06:00    1.165122
2013-01-02 00:07:00   -0.182570
2013-01-02 00:08:00   -0.107421
2013-01-02 00:09:00    0.033805
2013-01-03 00:00:00    1.768149
2013-01-03 00:01:00    0.218851
2013-01-03 00:02:00   -0.987624
2013-01-03 00:03:00   -1.258789
2013-01-03 00:04:00    0.984116
2013-01-03 00:05:00    1.859562
2013-01-03 00:06:00    1.620295
2013-01-03 00:07:00   -0.770468
2013-01-03 00:08:00   -1.263478
2013-01-03 00:09:00    0.036137
2013-01-04 00:00:00   -0.352919
2013-01-04 00:01:00    2.322247
2013-01-04 00:02:00   -1.218937
2013-01-04 00:03:00    0.619235
2013-01-04 00:04:00    0.019281
2013-01-04 00:05:00    1.689068
2013-01-04 00:06:00   -2.387880
2013-01-04 00:07:00    0.292372
2013-01-04 00:08:00    1.623110
2013-01-04 00:09:00   -1.944163
2013-01-05 00:00:00    0.403270
2013-01-05 00:01:00    1.750783
2013-01-05 00:02:00    0.485829
2013-01-05 00:03:00    0.957498
2013-01-05 00:04:00    0.018820
2013-01-05 00:05:00   -0.024910
2013-01-05 00:06:00    0.668174
2013-01-05 00:07:00   -1.104239
2013-01-05 00:08:00   -0.678914
2013-01-05 00:09:00    0.775712
2013-01-06 00:00:00         NaN
Length: 51, dtype: float64
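
On recent pandas, `Series.append` is no longer available (it was removed in pandas 2.0), so the step above might be sketched with `pd.concat` instead. The two-row series and the names `next_midnight`, `sentinel`, and `s_extended` are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# A tiny stand-in for the tail of the series above (values are made up).
s = pd.Series(
    [0.5, 1.5],
    index=pd.to_datetime(["2013-01-05 00:08", "2013-01-05 00:09"]),
)

# Midnight of the day after the last observation.
next_midnight = s.index[-1].normalize() + pd.Timedelta(days=1)

# One NaN row at that timestamp, concatenated onto the end of the series.
sentinel = pd.Series([np.nan], index=[next_midnight])
s_extended = pd.concat([s, sentinel])
```

The sentinel row only marks where the index should end; it carries no data of its own.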

Resample at the same frequency (the extra day we added makes this pad through to the end of the last day, which is what we want):

In [126]: s.resample('T',fill_method='pad')
2013-01-01 00:00:00    1.285575
2013-01-01 00:01:00    1.056882
2013-01-01 00:02:00   -0.690855
2013-01-01 00:03:00    1.235476
2013-01-01 00:04:00   -0.729948
2013-01-01 00:05:00    0.114036
2013-01-01 00:06:00    0.994977
2013-01-01 00:07:00   -0.455242
2013-01-01 00:08:00    0.645815
2013-01-01 00:09:00   -0.738772
2013-01-01 00:10:00   -0.738772
2013-01-01 00:11:00   -0.738772
2013-01-01 00:12:00   -0.738772
2013-01-01 00:13:00   -0.738772
2013-01-01 00:14:00   -0.738772
...
2013-01-05 23:46:00    0.775712
2013-01-05 23:47:00    0.775712
2013-01-05 23:48:00    0.775712
2013-01-05 23:49:00    0.775712
2013-01-05 23:50:00    0.775712
2013-01-05 23:51:00    0.775712
2013-01-05 23:52:00    0.775712
2013-01-05 23:53:00    0.775712
2013-01-05 23:54:00    0.775712
2013-01-05 23:55:00    0.775712
2013-01-05 23:56:00    0.775712
2013-01-05 23:57:00    0.775712
2013-01-05 23:58:00    0.775712
2013-01-05 23:59:00    0.775712
2013-01-06 00:00:00    0.775712
Freq: T, Length: 7201, dtype: float64
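
The `fill_method` argument shown above is not accepted by `resample` in current pandas. As a rough alternative, and not the answer's exact method, the same end-of-day padding could be sketched by building the minute grid explicitly and forward-filling within each calendar day. This variant needs no sentinel row; the sample data and the names `grid` and `padded` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Sample data shaped like the answer's: 10 one-minute rows on each of 5 days.
rng = np.random.default_rng(0)
idx = pd.DatetimeIndex(
    [
        day + pd.Timedelta(minutes=m)
        for day in pd.date_range("2013-01-01", periods=5, freq="D")
        for m in range(10)
    ]
)
s = pd.Series(rng.standard_normal(len(idx)), index=idx)

# Minute grid from the first midnight through 23:59 of the last day.
start = s.index[0].normalize()
end = s.index[-1].normalize() + pd.Timedelta(days=1) - pd.Timedelta(minutes=1)
grid = pd.date_range(start, end, freq="min")

# Put the data on the grid, then forward-fill separately within each day
# so that no value is carried across midnight.
padded = s.reindex(grid)
padded = padded.groupby(grid.normalize()).ffill()
```

Because the fill is grouped by day, a day whose first rows are missing would keep those NaNs rather than inherit the previous day's last value.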

Answer 1: (score: 0)

To fill only the NaNs, and only up to the end of each day:

series.groupby(pd.TimeGrouper('D')).apply(pd.Series.ffill)
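
`pd.TimeGrouper` is not available in current pandas releases, where `pd.Grouper(freq='D')` plays that role. A minimal sketch of the same per-day forward fill on the question's five rows, assuming a reasonably recent pandas, could group on the index normalized to midnight; the variable names here are illustrative.

```python
import numpy as np
import pandas as pd

# The five-row example from the question, with its fixed -04:00 offset.
index = pd.DatetimeIndex(
    [
        "2013-03-27 22:07:00-04:00",
        "2013-03-27 22:08:00-04:00",
        "2013-03-27 22:09:00-04:00",
        "2013-03-28 02:08:00-04:00",
        "2013-03-28 02:09:00-04:00",
    ]
)
series = pd.Series([1.0, np.nan, np.nan, np.nan, 1.0], index=index)

# Group rows by their local calendar day and forward-fill inside each group,
# so the NaN at 02:08 on the 28th is not filled from the 27th.
filled = series.groupby(series.index.normalize()).ffill()
```

Using the grouped `ffill()` directly, rather than `apply(pd.Series.ffill)`, keeps the original index on the result.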