I'm trying to get the hourly mean of this DataFrame (time is the index):
Temp Pressure
Time
2016-01-01 12:30:00 13.8 1012.3
2016-01-01 01:00:00 13.6 1012.2
2016-01-01 01:30:00 14.5 1012.2
2016-01-01 02:00:00 15.2 1012.0
2016-01-01 02:30:00 14.4 1012.2
2016-01-01 03:00:00 15.1 1011.9
2016-01-01 03:30:00 14.9 1011.9
2016-01-01 04:00:00 15.2 1011.9
2016-01-01 04:30:00 14.9 1011.8
2016-01-01 05:00:00 14.1 1012.1
.......................
2016-04-11 10:30:00 20.3 1010.5
2016-04-11 11:00:00 20.3 1010.5
2016-04-11 11:30:00 20.2 1010.5
My approach is to use resample:
df2 = df.resample('H').agg(['mean','std'])
But the result is only partially filled in:
Temp Pressure
mean std mean std
Time
2016-01-01 01:00:00 13.150000 1.121011 1013.650000 1.674316
2016-01-01 02:00:00 13.200000 1.904381 1013.925000 2.112463
2016-01-01 03:00:00 13.625000 1.631717 1013.975000 2.404683
2016-01-01 04:00:00 13.700000 1.576917 1014.250000 2.786276
2016-01-01 05:00:00 12.925000 1.007886 1014.825000 2.869814
2016-01-01 06:00:00 12.425000 0.906918 1015.200000 2.965356
2016-01-01 07:00:00 12.475000 1.372042 1015.950000 3.074085
2016-01-01 08:00:00 11.950000 2.129945 1016.775000 3.221154
2016-01-01 09:00:00 11.875000 1.842779 1017.425000 3.105238
2016-01-01 10:00:00 12.025000 1.602862 1017.750000 2.950141
2016-01-01 11:00:00 11.475000 1.150000 1018.000000 3.119829
2016-01-01 12:00:00 13.066667 0.750555 1014.166667 1.619671
2016-01-01 13:00:00 NaN NaN NaN NaN
2016-01-01 14:00:00 NaN NaN NaN NaN
2016-01-01 15:00:00 NaN NaN NaN NaN
2016-01-01 16:00:00 NaN NaN NaN NaN
2016-01-01 17:00:00 NaN NaN NaN NaN
2016-01-01 18:00:00 NaN NaN NaN NaN
2016-01-01 19:00:00 NaN NaN NaN NaN
2016-01-01 20:00:00 NaN NaN NaN NaN
2016-01-01 21:00:00 NaN NaN NaN NaN
2016-01-01 22:00:00 NaN NaN NaN NaN
2016-01-01 23:00:00 NaN NaN NaN NaN
2016-01-02 00:00:00 NaN NaN NaN NaN
2016-01-02 01:00:00 12.325000 1.629673 1022.175000 1.820943
2016-01-02 02:00:00 12.350000 1.968925 1022.375000 1.588238
2016-01-02 03:00:00 12.250000 0.974679 1022.375000 1.819112
2016-01-02 04:00:00 12.025000 0.994569 1022.600000 1.572683
2016-01-02 05:00:00 12.075000 1.178629 1022.925000 1.537043
2016-01-02 06:00:00 11.975000 0.499166 1023.475000 1.596611
2016-01-02 07:00:00 12.125000 0.613052 1023.800000 1.388044
2016-01-02 08:00:00 11.900000 0.989949 1024.150000 0.932738
2016-01-02 09:00:00 11.875000 1.309898 1024.575000 0.573730
2016-01-02 10:00:00 11.575000 0.932291 1024.700000 0.163299
2016-01-02 11:00:00 12.225000 1.359841 1024.450000 0.238048
2016-01-02 12:00:00 12.400000 1.183216 1022.250000 2.079263
2016-01-02 13:00:00 NaN NaN NaN NaN
2016-01-02 14:00:00 NaN NaN NaN NaN
2016-01-02 15:00:00 NaN NaN NaN NaN
2016-01-02 16:00:00 NaN NaN NaN NaN
..........
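The calculation only produces values for the hours from 1:00 to 12:00 of each day; every other hour is NaN. I suspect the problem has to do with AM/PM?
Edit
print(df.index[:20])
gives: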
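To illustrate the AM/PM suspicion, here is a minimal sketch. The timestamp strings are hypothetical (the real layout of the file may differ); it only shows that if the AM/PM marker is lost before parsing, every timestamp lands in hours 1 to 12, which would match the NaN rows above.

```python
import pandas as pd

# Hypothetical 12-hour clock strings; the real file's format is an assumption.
raw = [
    "2016-01-01 12:30 AM",
    "2016-01-01 01:00 AM",
    "2016-01-01 12:30 PM",
    "2016-01-01 01:00 PM",
]

# Parse with the AM/PM marker stripped off (only date and time are used).
without_marker = pd.to_datetime([s[:-3] for s in raw])

# Parse with the marker kept, using an explicit 12-hour format.
with_marker = pd.to_datetime(raw, format="%Y-%m-%d %I:%M %p")

# Distinct hours of the day seen in each case.
hours_without = sorted(set(without_marker.hour.tolist()))
hours_with = sorted(set(with_marker.hour.tolist()))

print(hours_without)
print(hours_with)
```

Without the marker, midnight and noon collapse onto the same hour, and so do 1 AM and 1 PM, so the afternoon hours never appear in the index.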
DatetimeIndex(['2016-01-01 12:30:00', '2016-01-01 01:00:00',
'2016-01-01 01:30:00', '2016-01-01 02:00:00',
'2016-01-01 02:30:00', '2016-01-01 03:00:00',
'2016-01-01 03:30:00', '2016-01-01 04:00:00',
'2016-01-01 04:30:00', '2016-01-01 05:00:00',
'2016-01-01 05:30:00', '2016-01-01 06:00:00',
'2016-01-01 06:30:00', '2016-01-01 07:00:00',
'2016-01-01 07:30:00', '2016-01-01 08:00:00',
'2016-01-01 08:30:00', '2016-01-01 09:00:00',
'2016-01-01 09:30:00', '2016-01-01 10:00:00'],
dtype='datetime64[ns]', name='Time', freq=None)
I parsed the dates with the following:
def parse_dt(dt, tm, ap):
    return pd.to_datetime(dt + ' ' + tm)

df = pd.read_csv('d2016.txt', sep=r'\s+', skiprows=2, header=None,
                 parse_dates={'Time': [0, 1, 2]}, date_parser=parse_dt)
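Note that parse_dt receives ap but never uses it. A possible sketch of a parse that keeps it is shown here, under stated assumptions: the third column is taken to hold AM/PM, the in-memory sample stands in for d2016.txt, and the column names date, time and ampm are hypothetical. It joins the columns after reading instead of going through date_parser.

```python
import io
import pandas as pd

# Hypothetical stand-in for d2016.txt: date, time, AM/PM, Temp, Pressure.
sample = io.StringIO(
    "2016-01-01 12:30 AM 13.8 1012.3\n"
    "2016-01-01 01:00 AM 13.6 1012.2\n"
    "2016-01-01 12:30 PM 14.5 1012.2\n"
    "2016-01-01 01:00 PM 15.2 1012.0\n"
)

# Read the three timestamp pieces as plain text columns (names are assumed).
df = pd.read_csv(sample, sep=r"\s+", header=None,
                 names=["date", "time", "ampm", "Temp", "Pressure"])

# Join all three pieces and parse with a 12-hour format (%I) plus AM/PM (%p).
stamp = df["date"] + " " + df["time"] + " " + df["ampm"]
df["Time"] = pd.to_datetime(stamp, format="%Y-%m-%d %I:%M %p")

# Use the parsed timestamps as a sorted index, then resample hourly.
df = df.drop(columns=["date", "time", "ampm"]).set_index("Time").sort_index()
hourly = df.resample("h").mean()

print(df.index)
print(hourly.dropna())
```

If the real file uses a different date or time layout, the format string would need to change accordingly.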
Any ideas?
Thanks!