Resampling data with pandas

Time: 2013-10-31 15:39:31

Tags: python pandas

My initial data.head() gives:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 45993 entries, 2009-11-17 14:14:00 to 2012-12-16 14:26:00
Data columns (total 4 columns):
rain         45993  non-null values
temp         45993  non-null values
windspeed    45993  non-null values
dew_point    45993  non-null values
dtypes: float64(4)

2009-11-17 14:14:00  0   22.5    4.9     12.3
2009-11-17 14:44:00  0   22.3    6.1     12.1
2009-11-17 15:14:00  0   22.1    5.3     12.5
2009-11-17 15:44:00  0   22.2    3.3     12.0
2009-11-17 16:14:00  0   20.4    4.9     11.7

When I resample:

data = data.resample('30min', how ='sum')
data.head()

I get:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 68861 entries, 2009-01-12 00:00:00 to 2012-12-16 14:00:00
Freq: 30T
Data columns (total 4 columns):
rain         45987  non-null values
temp         45987  non-null values
windspeed    45987  non-null values
dew_point    45987  non-null values
dtypes: float64(4)

2009-01-12 00:00:00  0   17.4    7.1     14.6
2009-01-12 00:30:00  0   17.4    7.2     14.7
2009-01-12 01:00:00  0   18.0    10.5    14.3
2009-01-12 01:30:00  0   18.3    9.6     14.2
2009-01-12 02:00:00  0   18.4    10.8    14.8

As you can see, my original data starts at 2009-11-17 14:14:00, but the resampled data starts at 2009-01-12. Can anyone explain this?
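A minimal sketch of one way to check where the earlier date comes from. head() shows rows in the order they are stored, while resample builds its bins starting from the earliest timestamp in the index, so a single out-of-order timestamp only becomes visible after resampling. The small frame below is made-up illustration data, not the original dataset, and it uses the newer `.resample(...).sum()` spelling rather than `how='sum'`.

```python
import pandas as pd

# Hypothetical miniature of the situation: the first stored row is in
# November, but a later row carries a January timestamp.
idx = pd.to_datetime([
    "2009-11-17 14:14:00",
    "2009-11-17 14:44:00",
    "2009-01-12 00:00:00",  # out-of-order row
])
data = pd.DataFrame({"temp": [22.5, 22.3, 17.4]}, index=idx)

first_stored = data.index[0]                     # what head() shows first
earliest = data.index.min()                      # earliest timestamp present
is_sorted = data.index.is_monotonic_increasing   # False if rows are out of order

# Resampling bins start at the earliest timestamp, not at the first stored row.
resampled = data.resample("30min").sum()

print("first stored row:", first_stored)
print("earliest timestamp:", earliest)
print("index sorted:", is_sorted)
print("first resampled bin:", resampled.index[0])
```

If `first_stored` and `earliest` differ, the rows near `earliest` are the ones worth inspecting in the raw file.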

Edit: I did find the problem, so for anyone else who runs into this: the dataset I was given contained:

2009-01-12 00:00:00 value
2009-01-12 00:30:00 value ... but the next line was!!!!!
2009-01-12 01:00    value

So the missing :00 seconds caused all the confusion.
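For anyone who wants to find such rows before parsing, here is a rough sketch. It assumes the timestamps are available as raw strings in the `YYYY-MM-DD HH:MM:SS` layout shown above; the series `raw` and its values are hypothetical stand-ins for the real file. It flags strings that do not match the full pattern and, as one possible repair, appends the missing seconds.

```python
import pandas as pd

# Hypothetical raw timestamp strings; the third one lacks the ":00" seconds.
raw = pd.Series([
    "2009-01-12 00:00:00",
    "2009-01-12 00:30:00",
    "2009-01-12 01:00",
])

# True where the string has the full date, hours, minutes and seconds.
pattern = r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}$"
ok = raw.str.match(pattern)

# Rows that do not match are the ones to inspect or repair.
bad_rows = raw[~ok]
print(bad_rows)

# One possible repair: append the missing seconds, then parse everything.
fixed = raw.where(ok, raw + ":00")
reparsed = pd.to_datetime(fixed)
print(reparsed)
```

Whether appending ":00" is the right repair depends on how the file was produced; the check itself only reports which rows deviate from the expected layout.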

0 answers:

No answers