Pandas resampling extrapolates data

Asked: 2015-08-17 19:06:00

Tags: python pandas resampling

I have a pandas.DataFrame that looks like this:

                          53.0       79.3
%Y-%m-%d %H:%M:%S                        
2013-05-16 16:01:30        NaN        NaN
2013-05-16 16:02:00        NaN        NaN
2013-05-16 16:03:30        NaN        NaN
2013-05-16 16:04:00        NaN        NaN
2013-05-16 16:05:30        NaN        NaN
2013-05-16 16:06:00        NaN        NaN
2013-05-16 16:07:30        NaN        NaN
2013-05-16 16:08:00        NaN        NaN
2013-05-16 16:09:30        NaN        NaN
2013-05-16 16:10:00        NaN        NaN
2013-05-16 16:11:30        NaN        NaN
2013-05-16 16:12:00        NaN        NaN
2013-05-16 16:13:30  17.547750        NaN
2013-05-16 16:14:00  17.582850        NaN
2013-05-16 16:15:30  17.577798  17.617950
...                        ...        ...
2013-12-31 23:43:30  17.944316  17.896369
2013-12-31 23:44:00  17.946537  17.899142
2013-12-31 23:45:30  17.953200  17.907460
2013-12-31 23:46:00  17.953200  17.910232
2013-12-31 23:47:30  18.008928  17.918550
2013-12-31 23:48:00  18.027504  17.901450
2013-12-31 23:49:30  18.083232        NaN
2013-12-31 23:51:00  18.138960        NaN
2013-12-31 23:52:30  18.194688        NaN
2013-12-31 23:54:00  18.250416        NaN
2013-12-31 23:54:30  18.268992        NaN
2013-12-31 23:55:00  18.287568        NaN
2013-12-31 23:55:30  18.306144        NaN
2013-12-31 23:57:00  18.361872        NaN
2013-12-31 23:58:30  18.417600        NaN

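I am trying to resample it to, for example, a 5-minute frequency, so I call concs.resample('5min', how='mean'). However, the result starts at 2013-01-06 rather than at the first date in the original data: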

                          53.0       79.3
%Y-%m-%d %H:%M:%S                        
2013-01-06 00:00:00        NaN        NaN
2013-01-06 00:05:00        NaN        NaN
2013-01-06 00:10:00  17.743950        NaN
2013-01-06 00:15:00  17.762441  17.688170
2013-01-06 00:20:00  17.789896  17.677440
2013-01-06 00:25:00  17.818473  17.666039
2013-01-06 00:30:00  17.840941  17.667581
2013-01-06 00:35:00  17.823765  17.673750
2013-01-06 00:40:00  17.807264  17.673750
2013-01-06 00:45:00  17.755974  17.701222
2013-01-06 00:50:00  17.798940  17.737088
2013-01-06 00:55:00  17.849160  17.730675
2013-01-06 01:00:00  17.865900  17.726400
2013-01-06 01:05:00  17.865900  17.726400
2013-01-06 01:10:00  17.869410  17.726400
...                        ...        ...
2013-12-31 22:45:00  17.852065  17.831828
2013-12-31 22:50:00  17.859726  17.832600
2013-12-31 22:55:00  17.864514  17.832600
2013-12-31 23:00:00  17.875686  17.835091
2013-12-31 23:05:00  17.888259  17.841335
2013-12-31 23:10:00  17.901678  17.846414
2013-12-31 23:15:00  17.911269  17.848565
2013-12-31 23:20:00  17.899611  17.842696
2013-12-31 23:25:00  17.890050  17.837790
2013-12-31 23:30:00  17.894122  17.836821
2013-12-31 23:35:00  17.916776  17.861989
2013-12-31 23:40:00  17.936701  17.886863
2013-12-31 23:45:00  18.005213  17.909423
2013-12-31 23:50:00  18.213264        NaN
2013-12-31 23:55:00  18.343296        NaN

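This is not the behaviour I expected. I would expect the resampled data to start at 2013-05-16 16:00:00. Strangely, it works as expected if I take only a subset of the data. For example, concs[:50].resample('5min', how='mean') gives what I expect: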

                          53.0       79.3
%Y-%m-%d %H:%M:%S                        
2013-05-16 16:00:00        NaN        NaN
2013-05-16 16:05:00        NaN        NaN
2013-05-16 16:10:00  17.565300        NaN
2013-05-16 16:15:00  17.571736  17.600740
2013-05-16 16:20:00  17.555235  17.586360
2013-05-16 16:25:00  17.538059  17.574811
2013-05-16 16:30:00  17.488318  17.540454
2013-05-16 16:35:00  17.430869  17.494798
2013-05-16 16:40:00  17.375673  17.461652
2013-05-16 16:45:00  17.398562  17.424820
2013-05-16 16:50:00  17.390880  17.421300

Am I missing something here? I am using version 0.15.

Edit

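It seems to be related to the large number of rows in the data, which currently has 128459 rows. With concs.iloc[:100000].resample('5min') the problem persists. With concs.iloc[:10000].resample('5min') the problem seems to disappear.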
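
One possible way to narrow this down is sketched below on synthetic data, not the actual concs frame. resample builds its bins starting from the earliest timestamp in the index, not from the first row, so comparing the two can show whether an out-of-order timestamp sits somewhere further down. The stray 2013-01-06 row and the single "53.0" column here are hypothetical stand-ins, and the code uses the .resample(...).mean() spelling from later pandas releases instead of how='mean'.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for `concs`: 30-second readings from 2013-05-16 16:01:30.
idx = pd.date_range("2013-05-16 16:01:30", periods=20, freq="30s")
concs = pd.DataFrame({"53.0": np.linspace(17.5, 17.7, 20)}, index=idx)

# Hypothetical stray row: an earlier timestamp that is NOT the first row.
stray = pd.DataFrame(
    {"53.0": [17.74]},
    index=pd.to_datetime(["2013-01-06 00:10:00"]),
)
concs = pd.concat([concs, stray])

# Compare the first row's timestamp with the earliest timestamp in the index.
print("first row:        ", concs.index[0])
print("earliest in index:", concs.index.min())
print("index is sorted:  ", concs.index.is_monotonic_increasing)

# After sorting, the resampled output starts at the earliest timestamp,
# floored to the 5-minute bin edge.
resampled = concs.sort_index().resample("5min").mean()
print("resample starts:  ", resampled.index[0])
```

If the first row and the earliest timestamp differ on the real data, slicing with concs[:50] or concs.iloc[:10000] would only avoid the problem by leaving out the rows that carry the earlier timestamps.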

0 answers:

No answers yet.