I have a pandas.DataFrame that looks like this:
53.0 79.3
%Y-%m-%d %H:%M:%S
2013-05-16 16:01:30 NaN NaN
2013-05-16 16:02:00 NaN NaN
2013-05-16 16:03:30 NaN NaN
2013-05-16 16:04:00 NaN NaN
2013-05-16 16:05:30 NaN NaN
2013-05-16 16:06:00 NaN NaN
2013-05-16 16:07:30 NaN NaN
2013-05-16 16:08:00 NaN NaN
2013-05-16 16:09:30 NaN NaN
2013-05-16 16:10:00 NaN NaN
2013-05-16 16:11:30 NaN NaN
2013-05-16 16:12:00 NaN NaN
2013-05-16 16:13:30 17.547750 NaN
2013-05-16 16:14:00 17.582850 NaN
2013-05-16 16:15:30 17.577798 17.617950
... ... ...
2013-12-31 23:43:30 17.944316 17.896369
2013-12-31 23:44:00 17.946537 17.899142
2013-12-31 23:45:30 17.953200 17.907460
2013-12-31 23:46:00 17.953200 17.910232
2013-12-31 23:47:30 18.008928 17.918550
2013-12-31 23:48:00 18.027504 17.901450
2013-12-31 23:49:30 18.083232 NaN
2013-12-31 23:51:00 18.138960 NaN
2013-12-31 23:52:30 18.194688 NaN
2013-12-31 23:54:00 18.250416 NaN
2013-12-31 23:54:30 18.268992 NaN
2013-12-31 23:55:00 18.287568 NaN
2013-12-31 23:55:30 18.306144 NaN
2013-12-31 23:57:00 18.361872 NaN
2013-12-31 23:58:30 18.417600 NaN
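I am trying to resample it to, for example, a 5-minute frequency, so I am doing concs.resample('5min', how='mean'). However, the result starts at 2013-01-06 instead of at the first date of the original data: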
53.0 79.3
%Y-%m-%d %H:%M:%S
2013-01-06 00:00:00 NaN NaN
2013-01-06 00:05:00 NaN NaN
2013-01-06 00:10:00 17.743950 NaN
2013-01-06 00:15:00 17.762441 17.688170
2013-01-06 00:20:00 17.789896 17.677440
2013-01-06 00:25:00 17.818473 17.666039
2013-01-06 00:30:00 17.840941 17.667581
2013-01-06 00:35:00 17.823765 17.673750
2013-01-06 00:40:00 17.807264 17.673750
2013-01-06 00:45:00 17.755974 17.701222
2013-01-06 00:50:00 17.798940 17.737088
2013-01-06 00:55:00 17.849160 17.730675
2013-01-06 01:00:00 17.865900 17.726400
2013-01-06 01:05:00 17.865900 17.726400
2013-01-06 01:10:00 17.869410 17.726400
... ... ...
2013-12-31 22:45:00 17.852065 17.831828
2013-12-31 22:50:00 17.859726 17.832600
2013-12-31 22:55:00 17.864514 17.832600
2013-12-31 23:00:00 17.875686 17.835091
2013-12-31 23:05:00 17.888259 17.841335
2013-12-31 23:10:00 17.901678 17.846414
2013-12-31 23:15:00 17.911269 17.848565
2013-12-31 23:20:00 17.899611 17.842696
2013-12-31 23:25:00 17.890050 17.837790
2013-12-31 23:30:00 17.894122 17.836821
2013-12-31 23:35:00 17.916776 17.861989
2013-12-31 23:40:00 17.936701 17.886863
2013-12-31 23:45:00 18.005213 17.909423
2013-12-31 23:50:00 18.213264 NaN
2013-12-31 23:55:00 18.343296 NaN
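This is really not the behavior I expected. I expected the resampled data to start at 2013-05-16 16:00:00. What is also strange is that if I take only a subset of the data, it works as expected. For example, with concs[:50].resample('5min', how='mean') I get what I expected, namely: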
53.0 79.3
%Y-%m-%d %H:%M:%S
2013-05-16 16:00:00 NaN NaN
2013-05-16 16:05:00 NaN NaN
2013-05-16 16:10:00 17.565300 NaN
2013-05-16 16:15:00 17.571736 17.600740
2013-05-16 16:20:00 17.555235 17.586360
2013-05-16 16:25:00 17.538059 17.574811
2013-05-16 16:30:00 17.488318 17.540454
2013-05-16 16:35:00 17.430869 17.494798
2013-05-16 16:40:00 17.375673 17.461652
2013-05-16 16:45:00 17.398562 17.424820
2013-05-16 16:50:00 17.390880 17.421300
Am I missing something here? I am using pandas 0.15.
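A minimal sketch of why the start date can jump, using synthetic data (the frame, the column name "a", and all values are made up, not taken from concs) and a recent pandas, where how='mean' is written as .resample('5min').mean(). With the default settings, the first 5-minute bin is the earliest timestamp in the index floored to the bin width, so one or more rows with an early timestamp are enough to make the output start months before the bulk of the data. Whether concs actually contains such rows is an assumption here, not something the output above proves.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `concs`: 20 samples every 30 s, one column "a".
idx = pd.date_range("2013-05-16 16:01:30", periods=20, freq="30s")
df = pd.DataFrame({"a": np.arange(20, dtype=float)}, index=idx)

# Recent-pandas spelling of resample('5min', how='mean').
clean = df.resample("5min").mean()
print(clean.index[0])  # 2013-05-16 16:00:00

# Add one row with an early timestamp (hypothetically, June 1 parsed as
# January 6 because day and month were swapped).
stray = pd.DataFrame({"a": [0.0]}, index=pd.DatetimeIndex(["2013-01-06 00:02:00"]))
dirty = pd.concat([df, stray]).sort_index()

# Bins now run from the floor of the earliest timestamp to the latest one;
# the months in between become empty bins filled with NaN.
shifted = dirty.resample("5min").mean()
print(shifted.index[0])  # 2013-01-06 00:00:00
print(shifted["a"].notna().sum())  # 4
```

Only the stray bin and the three bins covering the real samples hold values; everything between them is NaN.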
EDIT
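It seems to be related to the large number of rows in the data. It currently has 128459 rows. If I do concs.iloc[:100000].resample('5min'), the problem persists. If I do concs.iloc[:10000].resample('5min'), the problem seems to go away.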
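If the row count matters, one hypothesis worth checking is that somewhere between row 10000 and row 100000 the index contains timestamps earlier than 2013-05-16, since that would move the first resampling bin. A minimal diagnostic sketch in recent pandas; concs_like is a hypothetical, synthetic frame with one out-of-place timestamp planted at row 50000 purely for illustration. On the real data you would run the same checks on concs.index.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `concs`: 128459 ordered 30 s samples, with one
# out-of-place timestamp planted at row position 50000 for illustration.
n = 128_459
times = pd.date_range("2013-05-16 16:01:30", periods=n, freq="30s").to_numpy().copy()
times[50_000] = np.datetime64("2013-01-06 00:02:00")
concs_like = pd.DataFrame({"value": np.ones(n)}, index=pd.DatetimeIndex(times))

# 1. Is the index sorted, and what is its earliest timestamp?
print(concs_like.index.is_monotonic_increasing)  # False
print(concs_like.index.min())  # 2013-01-06 00:02:00

# 2. Which row positions hold timestamps earlier than the expected start?
expected_start = pd.Timestamp("2013-05-16 16:01:30")
positions = np.flatnonzero(concs_like.index < expected_start)
print(positions)  # [50000]
```

If the real index turns out to contain such rows, the raw timestamps at those positions would show whether they come from the source file or from how the dates were parsed.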