My initial data.head() gives:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 45993 entries, 2009-11-17 14:14:00 to 2012-12-16 14:26:00
Data columns (total 4 columns):
rain 45993 non-null values
temp 45993 non-null values
windspeed 45993 non-null values
dew_point 45993 non-null values
dtypes: float64(4)
2009-11-17 14:14:00 0 22.5 4.9 12.3
2009-11-17 14:44:00 0 22.3 6.1 12.1
2009-11-17 15:14:00 0 22.1 5.3 12.5
2009-11-17 15:44:00 0 22.2 3.3 12.0
2009-11-17 16:14:00 0 20.4 4.9 11.7
When I resample:
data = data.resample('30min', how='sum')
data.head()
I get:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 68861 entries, 2009-01-12 00:00:00 to 2012-12-16 14:00:00
Freq: 30T
Data columns (total 4 columns):
rain 45987 non-null values
temp 45987 non-null values
windspeed 45987 non-null values
dew_point 45987 non-null values
dtypes: float64(4)
2009-01-12 00:00:00 0 17.4 7.1 14.6
2009-01-12 00:30:00 0 17.4 7.2 14.7
2009-01-12 01:00:00 0 18.0 10.5 14.3
2009-01-12 01:30:00 0 18.3 9.6 14.2
2009-01-12 02:00:00 0 18.4 10.8 14.8
As you can see, my original data starts at 2009-11-17 14:14:00, but the resampled data starts at 2009-01-12. Can anyone explain why this happens?
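One way to investigate a symptom like this: resample builds its bins from the earliest to the latest timestamp in the index, while head() only shows the first rows in stored order. The sketch below uses a small made-up frame (the values and the stray row are hypothetical, not my real data) to show how to check whether an out-of-order timestamp is hiding in the index. It also uses the newer `.resample(...).sum()` spelling, since recent pandas versions no longer accept the `how=` argument.

```python
import pandas as pd

# Hypothetical data: late-2009 readings plus one stray early timestamp.
idx = pd.to_datetime([
    "2009-11-17 14:14:00",
    "2009-11-17 14:44:00",
    "2009-01-12 00:00:00",  # out-of-order row hiding in the middle
    "2009-11-17 15:14:00",
])
data = pd.DataFrame({"rain": [0.0, 0.0, 0.0, 0.0]}, index=idx)

# head() shows rows in stored order, so compare the first row with the minimum.
first_shown = data.index[0]
earliest = data.index.min()
is_sorted = data.index.is_monotonic_increasing

# Rows dated before the first displayed row are the suspects.
suspects = data[data.index < first_shown]

# Sort first, then resample; the bins start at the earliest timestamp.
resampled = data.sort_index().resample("30min").sum()

print(first_shown, earliest, is_sorted)
print(suspects)
print(resampled.index[0])
```

If `earliest` differs from `first_shown`, the index is not sorted and the rows in `suspects` are worth inspecting in the source file.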
Edit: I found the problem, so for anyone else who hits this, the supplied dataset contained:
2009-01-12 00:00:00 value
2009-01-12 00:30:00 value
... but the next line was:
2009-01-12 01:00 value
So the missing :00 seconds caused all the confusion.
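For anyone with the same kind of file, here is a minimal sketch of one way to make the timestamps uniform before building the index. The `raw` list and the `add_seconds` helper are hypothetical names for illustration, and the sketch assumes the only irregularity is a missing `:00` seconds field.

```python
import pandas as pd

# Hypothetical raw timestamp strings, mirroring the three lines above.
raw = ["2009-01-12 00:00:00", "2009-01-12 00:30:00", "2009-01-12 01:00"]

def add_seconds(stamp):
    # "HH:MM" contains one colon, "HH:MM:SS" contains two.
    return stamp + ":00" if stamp.count(":") == 1 else stamp

normalized = [add_seconds(s) for s in raw]

# Parse with an explicit format so every row is read the same way.
index = pd.to_datetime(normalized, format="%Y-%m-%d %H:%M:%S")

print(index)
```

Passing an explicit `format` means pandas does not have to guess the layout row by row, which makes inconsistent rows easier to spot.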