Only part of the DataFrame gets resampled

Asked: 2017-07-22 12:40:37

Tags: python pandas

I am trying to get the hourly mean of this DataFrame (Time is the index):

                    Temp    Pressure
Time        
2016-01-01 12:30:00 13.8    1012.3
2016-01-01 01:00:00 13.6    1012.2
2016-01-01 01:30:00 14.5    1012.2
2016-01-01 02:00:00 15.2    1012.0
2016-01-01 02:30:00 14.4    1012.2
2016-01-01 03:00:00 15.1    1011.9
2016-01-01 03:30:00 14.9    1011.9
2016-01-01 04:00:00 15.2    1011.9
2016-01-01 04:30:00 14.9    1011.8
2016-01-01 05:00:00 14.1    1012.1
.......................
2016-04-11 10:30:00 20.3    1010.5
2016-04-11 11:00:00 20.3    1010.5
2016-04-11 11:30:00 20.2    1010.5

My approach is to use resample:

df2 = df.resample('H').agg(['mean','std'])

But the result is only partial:

                            Temp                 Pressure
                       mean         std         mean    std
Time                
2016-01-01 01:00:00 13.150000   1.121011    1013.650000 1.674316
2016-01-01 02:00:00 13.200000   1.904381    1013.925000 2.112463
2016-01-01 03:00:00 13.625000   1.631717    1013.975000 2.404683
2016-01-01 04:00:00 13.700000   1.576917    1014.250000 2.786276
2016-01-01 05:00:00 12.925000   1.007886    1014.825000 2.869814
2016-01-01 06:00:00 12.425000   0.906918    1015.200000 2.965356
2016-01-01 07:00:00 12.475000   1.372042    1015.950000 3.074085
2016-01-01 08:00:00 11.950000   2.129945    1016.775000 3.221154
2016-01-01 09:00:00 11.875000   1.842779    1017.425000 3.105238
2016-01-01 10:00:00 12.025000   1.602862    1017.750000 2.950141
2016-01-01 11:00:00 11.475000   1.150000    1018.000000 3.119829
2016-01-01 12:00:00 13.066667   0.750555    1014.166667 1.619671
2016-01-01 13:00:00 NaN NaN NaN NaN
2016-01-01 14:00:00 NaN NaN NaN NaN
2016-01-01 15:00:00 NaN NaN NaN NaN
2016-01-01 16:00:00 NaN NaN NaN NaN
2016-01-01 17:00:00 NaN NaN NaN NaN
2016-01-01 18:00:00 NaN NaN NaN NaN
2016-01-01 19:00:00 NaN NaN NaN NaN
2016-01-01 20:00:00 NaN NaN NaN NaN
2016-01-01 21:00:00 NaN NaN NaN NaN
2016-01-01 22:00:00 NaN NaN NaN NaN
2016-01-01 23:00:00 NaN NaN NaN NaN
2016-01-02 00:00:00 NaN NaN NaN NaN
2016-01-02 01:00:00 12.325000   1.629673    1022.175000 1.820943
2016-01-02 02:00:00 12.350000   1.968925    1022.375000 1.588238
2016-01-02 03:00:00 12.250000   0.974679    1022.375000 1.819112
2016-01-02 04:00:00 12.025000   0.994569    1022.600000 1.572683
2016-01-02 05:00:00 12.075000   1.178629    1022.925000 1.537043
2016-01-02 06:00:00 11.975000   0.499166    1023.475000 1.596611
2016-01-02 07:00:00 12.125000   0.613052    1023.800000 1.388044
2016-01-02 08:00:00 11.900000   0.989949    1024.150000 0.932738
2016-01-02 09:00:00 11.875000   1.309898    1024.575000 0.573730
2016-01-02 10:00:00 11.575000   0.932291    1024.700000 0.163299
2016-01-02 11:00:00 12.225000   1.359841    1024.450000 0.238048
2016-01-02 12:00:00 12.400000   1.183216    1022.250000 2.079263
2016-01-02 13:00:00 NaN NaN NaN NaN
2016-01-02 14:00:00 NaN NaN NaN NaN
2016-01-02 15:00:00 NaN NaN NaN NaN
2016-01-02 16:00:00 NaN NaN NaN NaN
..........

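Values are only computed for the hours 1:00 to 12:00 of each day; every other hour is NaN. My guess is that there is a problem with AM/PM?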
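On the NaN rows themselves: `resample` emits one row for every bin between the first and last timestamp, so an hour that contains no observations comes out as NaN for `mean`. A minimal sketch with made-up data (the `demo` frame and its values are hypothetical, and `"60min"` stands in for the hourly alias used above):

```python
import pandas as pd

# Hypothetical data: readings at 01:00, 01:30 and 04:00, nothing in between.
idx = pd.to_datetime(["2016-01-01 01:00", "2016-01-01 01:30", "2016-01-01 04:00"])
demo = pd.DataFrame({"Temp": [13.6, 14.5, 15.2]}, index=idx)

# One row per hourly bin from 01:00 to 04:00; the empty 02:00 and 03:00
# bins are kept and filled with NaN.
hourly = demo.resample("60min").mean()
print(hourly)
```

If the same thing is happening here, the NaN hours would mean that no timestamp in the index falls between 13:00 and 00:59.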

Edit

print (df.index[:20])

gives:

DatetimeIndex(['2016-01-01 12:30:00', '2016-01-01 01:00:00',
               '2016-01-01 01:30:00', '2016-01-01 02:00:00',
               '2016-01-01 02:30:00', '2016-01-01 03:00:00',
               '2016-01-01 03:30:00', '2016-01-01 04:00:00',
               '2016-01-01 04:30:00', '2016-01-01 05:00:00',
               '2016-01-01 05:30:00', '2016-01-01 06:00:00',
               '2016-01-01 06:30:00', '2016-01-01 07:00:00',
               '2016-01-01 07:30:00', '2016-01-01 08:00:00',
               '2016-01-01 08:30:00', '2016-01-01 09:00:00',
               '2016-01-01 09:30:00', '2016-01-01 10:00:00'],
              dtype='datetime64[ns]', name='Time', freq=None)
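A quick way to test the AM/PM guess is to inspect the index directly. The sketch below uses a hypothetical four-entry index shaped like the printout above (a 12:30 entry ahead of 01:00, plus a repeated 01:00 as would happen if a PM reading lost its marker); on the real data the same checks would be run against `df.index`.

```python
import pandas as pd

# Hypothetical index: out of order, and 01:00 appears twice.
idx = pd.DatetimeIndex(
    ["2016-01-01 12:30:00", "2016-01-01 01:00:00",
     "2016-01-01 01:30:00", "2016-01-01 01:00:00"],
    name="Time",
)

print(idx.is_monotonic_increasing)  # False: timestamps are not sorted
print(idx.hour.max())               # 12: no hour of day above 12
print(idx.duplicated().any())       # True: the same timestamp occurs twice
```

An hour-of-day that never exceeds 12, together with duplicated timestamps, would be consistent with 12-hour clock times parsed without their AM/PM marker.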

I parsed the dates with the following:

def parse_dt(dt, tm, ap):
    return pd.to_datetime(dt + ' ' + tm)

df = pd.read_csv('d2016.txt', sep=r'\s+', skiprows=2, header=None,
                 parse_dates={'Time': [0,1,2] }, date_parser=parse_dt)
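One thing that stands out in the parser above: `parse_dt` accepts the third column as `ap` but never uses it, so if that column holds the AM/PM marker it is dropped before parsing. A minimal sketch of a variant that keeps it follows. The name `parse_dt_with_ampm` is hypothetical, and the format string assumes dates like `2016-01-01`, times like `12:30:00` and a marker of `AM` or `PM`; the actual layout of `d2016.txt` is not shown, so adjust the format to match.

```python
import pandas as pd

def parse_dt_with_ampm(dt, tm, ap):
    # Hypothetical variant of parse_dt: keep the AM/PM marker and parse
    # with an explicit 12-hour format (%I = 12-hour clock, %p = AM/PM).
    # Assumed layout: "2016-01-01", "12:30:00", "AM".
    return pd.to_datetime(dt + " " + tm + " " + ap,
                          format="%Y-%m-%d %I:%M:%S %p")

# 12:30 AM is half past midnight; 01:00 PM becomes 13:00.
print(parse_dt_with_ampm("2016-01-01", "12:30:00", "AM"))
print(parse_dt_with_ampm("2016-01-01", "01:00:00", "PM"))
```

With the marker included, afternoon readings would land in the 13:00 to 23:00 bins instead of being merged into the morning ones.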

Any ideas?

Thanks!!!

0 answers:

No answers yet