I have a pandas.TimeSeries with an intraday index. How can I forward-fill the NaN values within each day separately?
For example, this series:
2013-03-27 22:07:00-04:00 1.0
2013-03-27 22:08:00-04:00 nan
2013-03-27 22:09:00-04:00 nan
2013-03-28 02:08:00-04:00 nan
2013-03-28 02:09:00-04:00 1.0
should become:
2013-03-27 22:07:00-04:00 1.0
2013-03-27 22:08:00-04:00 1.0
2013-03-27 22:09:00-04:00 1.0
2013-03-28 02:08:00-04:00 nan
2013-03-28 02:09:00-04:00 1.0
I tried groupby(pd.TimeGrouper('D')).apply(pd.Series.ffill), but it did not work.
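A minimal reconstruction of the example shows why a plain forward fill is not enough: it carries the last value of 2013-03-27 into the first, missing, value of 2013-03-28. This sketch assumes the -04:00 shown in the timestamps is a fixed UTC offset; the original time zone is not stated.

```python
import numpy as np
import pandas as pd

# The example series from the question. Assumption: -04:00 is a fixed UTC offset.
idx = pd.DatetimeIndex(
    [
        "2013-03-27 22:07:00-04:00",
        "2013-03-27 22:08:00-04:00",
        "2013-03-27 22:09:00-04:00",
        "2013-03-28 02:08:00-04:00",
        "2013-03-28 02:09:00-04:00",
    ]
)
s = pd.Series([1.0, np.nan, np.nan, np.nan, 1.0], index=idx)

# A plain forward fill ignores day boundaries, so the 2013-03-28 02:08
# entry is filled from the previous evening, which is the unwanted result.
plain = s.ffill()
print(plain)
```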
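Answer 0 (score: 1)
Create a series of values at one-minute frequency on each of several days (the session assumes import numpy as np, import pandas as pd and from datetime import timedelta):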
In [120]: idx = [pd.date_range('20130101', periods=10, freq='T') + timedelta(i) for i in range(5)]
In [121]: idx = idx[0].append(idx[1:])  # concatenate the five daily indexes
In [122]: s = pd.Series(np.random.randn(len(idx)), index=idx)
In [123]: s
Out[123]:
2013-01-01 00:00:00 1.285575
2013-01-01 00:01:00 1.056882
2013-01-01 00:02:00 -0.690855
2013-01-01 00:03:00 1.235476
2013-01-01 00:04:00 -0.729948
2013-01-01 00:05:00 0.114036
2013-01-01 00:06:00 0.994977
2013-01-01 00:07:00 -0.455242
2013-01-01 00:08:00 0.645815
2013-01-01 00:09:00 -0.738772
2013-01-02 00:00:00 0.464686
2013-01-02 00:01:00 -0.872786
2013-01-02 00:02:00 0.112433
2013-01-02 00:03:00 -0.398235
2013-01-02 00:04:00 -0.442196
2013-01-02 00:05:00 0.634600
2013-01-02 00:06:00 1.165122
2013-01-02 00:07:00 -0.182570
2013-01-02 00:08:00 -0.107421
2013-01-02 00:09:00 0.033805
2013-01-03 00:00:00 1.768149
2013-01-03 00:01:00 0.218851
2013-01-03 00:02:00 -0.987624
2013-01-03 00:03:00 -1.258789
2013-01-03 00:04:00 0.984116
2013-01-03 00:05:00 1.859562
2013-01-03 00:06:00 1.620295
2013-01-03 00:07:00 -0.770468
2013-01-03 00:08:00 -1.263478
2013-01-03 00:09:00 0.036137
2013-01-04 00:00:00 -0.352919
2013-01-04 00:01:00 2.322247
2013-01-04 00:02:00 -1.218937
2013-01-04 00:03:00 0.619235
2013-01-04 00:04:00 0.019281
2013-01-04 00:05:00 1.689068
2013-01-04 00:06:00 -2.387880
2013-01-04 00:07:00 0.292372
2013-01-04 00:08:00 1.623110
2013-01-04 00:09:00 -1.944163
2013-01-05 00:00:00 0.403270
2013-01-05 00:01:00 1.750783
2013-01-05 00:02:00 0.485829
2013-01-05 00:03:00 0.957498
2013-01-05 00:04:00 0.018820
2013-01-05 00:05:00 -0.024910
2013-01-05 00:06:00 0.668174
2013-01-05 00:07:00 -1.104239
2013-01-05 00:08:00 -0.678914
2013-01-05 00:09:00 0.775712
dtype: float64
Append one more day (a NaN value at the next midnight):
In [124]: s = pd.concat([s, pd.Series(np.nan, index=[pd.Timestamp(s.index[-1].date()) + timedelta(1)])])
In [125]: s
Out[125]:
2013-01-01 00:00:00 1.285575
2013-01-01 00:01:00 1.056882
2013-01-01 00:02:00 -0.690855
2013-01-01 00:03:00 1.235476
2013-01-01 00:04:00 -0.729948
2013-01-01 00:05:00 0.114036
2013-01-01 00:06:00 0.994977
2013-01-01 00:07:00 -0.455242
2013-01-01 00:08:00 0.645815
2013-01-01 00:09:00 -0.738772
2013-01-02 00:00:00 0.464686
2013-01-02 00:01:00 -0.872786
2013-01-02 00:02:00 0.112433
2013-01-02 00:03:00 -0.398235
2013-01-02 00:04:00 -0.442196
2013-01-02 00:05:00 0.634600
2013-01-02 00:06:00 1.165122
2013-01-02 00:07:00 -0.182570
2013-01-02 00:08:00 -0.107421
2013-01-02 00:09:00 0.033805
2013-01-03 00:00:00 1.768149
2013-01-03 00:01:00 0.218851
2013-01-03 00:02:00 -0.987624
2013-01-03 00:03:00 -1.258789
2013-01-03 00:04:00 0.984116
2013-01-03 00:05:00 1.859562
2013-01-03 00:06:00 1.620295
2013-01-03 00:07:00 -0.770468
2013-01-03 00:08:00 -1.263478
2013-01-03 00:09:00 0.036137
2013-01-04 00:00:00 -0.352919
2013-01-04 00:01:00 2.322247
2013-01-04 00:02:00 -1.218937
2013-01-04 00:03:00 0.619235
2013-01-04 00:04:00 0.019281
2013-01-04 00:05:00 1.689068
2013-01-04 00:06:00 -2.387880
2013-01-04 00:07:00 0.292372
2013-01-04 00:08:00 1.623110
2013-01-04 00:09:00 -1.944163
2013-01-05 00:00:00 0.403270
2013-01-05 00:01:00 1.750783
2013-01-05 00:02:00 0.485829
2013-01-05 00:03:00 0.957498
2013-01-05 00:04:00 0.018820
2013-01-05 00:05:00 -0.024910
2013-01-05 00:06:00 0.668174
2013-01-05 00:07:00 -1.104239
2013-01-05 00:08:00 -0.678914
2013-01-05 00:09:00 0.775712
2013-01-06 00:00:00 NaN
Length: 51, dtype: float64
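For reference, here is a self-contained sketch of the two setup steps (building the minute-level series and appending a NaN sentinel at midnight of the day after the last date), written for current pandas, where Series.append no longer exists and pd.concat does the same job. The random values and the seed are arbitrary.

```python
from datetime import timedelta

import numpy as np
import pandas as pd

# Ten one-minute stamps starting at midnight, repeated on five consecutive days.
per_day = [
    pd.date_range("2013-01-01", periods=10, freq="min") + timedelta(days=i)
    for i in range(5)
]
idx = per_day[0].append(per_day[1:])  # Index.append accepts a list of indexes
s = pd.Series(np.random.default_rng(0).standard_normal(len(idx)), index=idx)

# Sentinel: a NaN at midnight of the day after the last timestamp.
sentinel_ts = pd.Timestamp(s.index[-1].date()) + timedelta(days=1)
s = pd.concat([s, pd.Series(np.nan, index=[sentinel_ts])])
print(s.tail(3))
```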
Resample at the same frequency (the extra day we added makes the padding run to the end of the last day we want):
In [126]: s.resample('T').asfreq().ffill()
Out[126]:
2013-01-01 00:00:00 1.285575
2013-01-01 00:01:00 1.056882
2013-01-01 00:02:00 -0.690855
2013-01-01 00:03:00 1.235476
2013-01-01 00:04:00 -0.729948
2013-01-01 00:05:00 0.114036
2013-01-01 00:06:00 0.994977
2013-01-01 00:07:00 -0.455242
2013-01-01 00:08:00 0.645815
2013-01-01 00:09:00 -0.738772
2013-01-01 00:10:00 -0.738772
2013-01-01 00:11:00 -0.738772
2013-01-01 00:12:00 -0.738772
2013-01-01 00:13:00 -0.738772
2013-01-01 00:14:00 -0.738772
...
2013-01-05 23:46:00 0.775712
2013-01-05 23:47:00 0.775712
2013-01-05 23:48:00 0.775712
2013-01-05 23:49:00 0.775712
2013-01-05 23:50:00 0.775712
2013-01-05 23:51:00 0.775712
2013-01-05 23:52:00 0.775712
2013-01-05 23:53:00 0.775712
2013-01-05 23:54:00 0.775712
2013-01-05 23:55:00 0.775712
2013-01-05 23:56:00 0.775712
2013-01-05 23:57:00 0.775712
2013-01-05 23:58:00 0.775712
2013-01-05 23:59:00 0.775712
2013-01-06 00:00:00 0.775712
Freq: T, Length: 7201, dtype: float64
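A self-contained sketch of the resampling step with current pandas method names, rebuilding the same kind of series (arbitrary random values) plus the NaN sentinel. It assumes asfreq() followed by ffill() is the modern counterpart of fill_method='pad', since that combination reproduces the printed output, including the filled sentinel row.

```python
from datetime import timedelta

import numpy as np
import pandas as pd

# Same setup: 10 one-minute stamps on each of 5 days, then a NaN sentinel.
per_day = [
    pd.date_range("2013-01-01", periods=10, freq="min") + timedelta(days=i)
    for i in range(5)
]
idx = per_day[0].append(per_day[1:])
s = pd.Series(np.random.default_rng(0).standard_normal(len(idx)), index=idx)
s = pd.concat([s, pd.Series(np.nan, index=[pd.Timestamp("2013-01-06")])])

# asfreq() lays out every minute from the first to the last timestamp,
# with NaN in the new slots; ffill() then pads forward, sentinel included.
filled = s.resample("min").asfreq().ffill()
print(filled)
```

Calling s.resample("min").ffill() directly would, as far as I can tell, leave the sentinel row itself NaN: upsampling with a fill method only fills the newly created slots and keeps existing values, NaN included.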
Answer 1 (score: 0)
To fill NaNs only up to the end of each day (pd.TimeGrouper was removed in pandas 1.0; pd.Grouper(freq='D') replaces it):
series.groupby(pd.Grouper(freq='D')).ffill()
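A self-contained sketch of this approach on the question's example, for pandas versions without pd.TimeGrouper. It again assumes the -04:00 offset is a fixed UTC offset; the daily bins are then days in that local time, so the 2013-03-28 02:08 entry is not filled from the previous evening.

```python
import numpy as np
import pandas as pd

# The question's example (assumption: -04:00 is a fixed UTC offset).
idx = pd.DatetimeIndex(
    [
        "2013-03-27 22:07:00-04:00",
        "2013-03-27 22:08:00-04:00",
        "2013-03-27 22:09:00-04:00",
        "2013-03-28 02:08:00-04:00",
        "2013-03-28 02:09:00-04:00",
    ]
)
s = pd.Series([1.0, np.nan, np.nan, np.nan, 1.0], index=idx)

# Group by calendar day and forward-fill within each group only;
# the result keeps the original index and order.
filled = s.groupby(pd.Grouper(freq="D")).ffill()
print(filled)
```

GroupBy.ffill() is used here instead of .apply(pd.Series.ffill) because it returns a result aligned to the original index without adding the group keys.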