Pandas groupby and rolling 4-week window

Time: 2018-09-20 20:58:05

Tags: python pandas

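I have a series of data at half-hour intervals. For each half-hour interval in the dataset, I need a rolling 4-week average taken over Tuesdays, Wednesdays, and Thursdays (whole weeks). So the first "window" covers weeks 1-4 for the times 00:00:00, 00:30:00, ..., 23:00:00, 23:30:00. The next window then holds the average for weeks 2-5, and so on.

I have the following dataset, which contains daily data but only for Tuesdays, Wednesdays, and Thursdays (for whatever reason, the other days are not used to compute the average). On those days I have data at half-hour intervals (the sample below only includes the 00:00:00, 00:30:00, 01:00:00, and 01:30:00 intervals).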

datetime    timeblock   speed
1/3/2017 0:00   0:00:00 81.186885
1/3/2017 0:30   0:30:00 NaN
1/3/2017 1:00   1:00:00 85.277724
1/3/2017 1:30   1:30:00 85.077176
1/4/2017 0:00   0:00:00 80.691608
1/4/2017 0:30   0:30:00 79.223225
1/4/2017 1:00   1:00:00 82.330169
1/4/2017 1:30   1:30:00 79.495578
1/5/2017 0:00   0:00:00 74.162426
1/5/2017 0:30   0:30:00 75.206492
1/5/2017 1:00   1:00:00 77.6484
1/5/2017 1:30   1:30:00 72.61875
1/10/2017 0:00  0:00:00 77.785555
1/10/2017 0:30  0:30:00 80.617395
1/10/2017 1:00  1:00:00 80.094947
1/10/2017 1:30  1:30:00 77.697473
1/11/2017 0:00  0:00:00 74.7104
1/11/2017 0:30  0:30:00 75.691326
1/11/2017 1:00  1:00:00 74.639803
1/11/2017 1:30  1:30:00 81.797268
1/12/2017 0:00  0:00:00 79.571042
1/12/2017 0:30  0:30:00 78.083612
1/12/2017 1:00  1:00:00 78.747287
1/12/2017 1:30  1:30:00 78.128129
1/17/2017 0:00  0:00:00 76.509323
1/17/2017 0:30  0:30:00 77.256
1/17/2017 1:00  1:00:00 78.627085
1/17/2017 1:30  1:30:00 81.588
1/18/2017 0:00  0:00:00 77.82543
1/18/2017 0:30  0:30:00 80.231272
1/18/2017 1:00  1:00:00 NaN
1/18/2017 1:30  1:30:00 74.656384
1/19/2017 0:00  0:00:00 77.37165
1/19/2017 0:30  0:30:00 80.328705
1/19/2017 1:00  1:00:00 80.011531
1/19/2017 1:30  1:30:00 79.643781
1/24/2017 0:00  0:00:00 81.167016
1/24/2017 0:30  0:30:00 NaN
1/24/2017 1:00  1:00:00 83.128695
1/24/2017 1:30  1:30:00 77.799428
1/25/2017 0:00  0:00:00 73.106437
1/25/2017 0:30  0:30:00 71.316
1/25/2017 1:00  1:00:00 75.966
1/25/2017 1:30  1:30:00 74.345225
1/26/2017 0:00  0:00:00 78.768
1/26/2017 0:30  0:30:00 80.436508
1/26/2017 1:00  1:00:00 76.782222
1/26/2017 1:30  1:30:00 76.168687
1/31/2017 0:00  0:00:00 73.780363
1/31/2017 0:30  0:30:00 72.32356
1/31/2017 1:00  1:00:00 74.119404
1/31/2017 1:30  1:30:00 72.412363
2/1/2017 0:00   0:00:00 75.572408
2/1/2017 0:30   0:30:00 72.486593
2/1/2017 1:00   1:00:00 77.357
2/1/2017 1:30   1:30:00 74.134188
2/2/2017 0:00   0:00:00 72.209382
2/2/2017 0:30   0:30:00 75.792807
2/2/2017 1:00   1:00:00 74.167605
2/2/2017 1:30   1:30:00 78.053373

I tried the following code, but did not get the expected result:

roll_mean = sample.groupby('timeblock')['speed'].rolling('30D', min_value = '30D').mean()

The desired result should look like this:

Window      00:00:00    00:30:00    01:00:00    01:30:00
1 (wks 1-4) 77.74       NaN         NaN         78.25
2 (wks 2-5) 76.53       NaN         NaN         77.20

Thanks in advance.

Edit: grammar/clarification

In[1]: sample.index
Out[1]: 
DatetimeIndex(['2017-01-03 00:00:00', '2017-01-03 00:30:00',
               '2017-01-03 01:00:00', '2017-01-03 01:30:00',
               '2017-01-03 02:00:00', '2017-01-03 02:30:00',
               '2017-01-03 03:00:00', '2017-01-03 03:30:00',
               '2017-01-03 04:00:00', '2017-01-03 04:30:00',
               ...
               '2017-12-28 19:00:00', '2017-12-28 19:30:00',
               '2017-12-28 20:00:00', '2017-12-28 20:30:00',
               '2017-12-28 21:00:00', '2017-12-28 21:30:00',
               '2017-12-28 22:00:00', '2017-12-28 22:30:00',
               '2017-12-28 23:00:00', '2017-12-28 23:30:00'],
              dtype='datetime64[ns]', name='datetime', length=7488, freq=None)
In[2]: sample.dtypes
Out[3]: 
timeblock     object
speed        float64
dtype: object

1 Answer:

Answer 0: (score: 0)

I was able to get the desired result with the following:

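# Pivot to one row per date and one column per half-hour timeblock
toll = pd.pivot_table(toll, columns='timeblock', index='date', values='speed')
# Average within each calendar week, then take a rolling mean over 4 weeks
toll = toll.resample('W').mean().rolling(4).mean()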
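
For anyone who wants to try the approach end to end, here is a minimal, self-contained sketch. It rests on several assumptions: the data is synthetic (not the poster's), the `date` column is derived by normalizing the `datetime` index, and the names `sample`, `wide`, `weekly`, `n_missing`, and `rolled_strict` are chosen only for illustration. One caveat: `resample('W').mean()` skips NaN by default, so a timeblock with a missing reading still yields a number. The desired output in the question shows NaN for such columns, so the sketch also includes a stricter variant that blanks any week containing a missing reading before rolling.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (an assumption, not the original dataset):
# 5 weeks of Tue/Wed/Thu, four half-hour slots per day.
tuesdays = pd.date_range("2017-01-03", periods=5, freq="7D")
days = [t + pd.Timedelta(days=d) for t in tuesdays for d in range(3)]
stamps = [d + pd.Timedelta(minutes=30 * k) for d in days for k in range(4)]

sample = pd.DataFrame(
    {"speed": np.arange(len(stamps), dtype=float)},
    index=pd.DatetimeIndex(stamps, name="datetime"),
)
# One missing reading at 00:30 on the first Tuesday.
sample.loc[sample.index[1], "speed"] = np.nan

# Helper columns used for pivoting.
sample["timeblock"] = sample.index.strftime("%H:%M:%S")
sample["date"] = sample.index.normalize()

# One row per date, one column per half-hour timeblock.
wide = sample.pivot_table(index="date", columns="timeblock", values="speed")

# Weekly mean per timeblock (NaN skipped), then a rolling mean over 4 weeks.
weekly = wide.resample("W").mean()
rolled = weekly.rolling(4).mean()

# Stricter variant: blank any week/timeblock that has a missing reading,
# so every 4-week window touching it becomes NaN.
n_missing = wide.isna().resample("W").sum()
rolled_strict = weekly.where(n_missing.eq(0)).rolling(4).mean()

print(rolled)
print(rolled_strict)
```

The first three rows of either result are NaN because `rolling(4)` needs four complete weeks before it emits a value. Note that the strict variant only catches readings that are present as NaN; a day that is absent from the data entirely would not be detected by this check.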