Group pandas data by hour of the day

Time: 2018-11-29 06:17:04

Tags: python pandas pandas-groupby

I use the following code to generate random dates and values:

import pandas as pd
import numpy as np

time = pd.date_range('1/1/2000', periods=2000, freq='5min')

series = pd.Series(np.random.randint(100, size=2000), index=time)

The output looks like this:

2000-01-01 00:00:00    40
2000-01-01 00:05:00    13
2000-01-01 00:10:00    99
2000-01-01 00:15:00    72
2000-01-01 00:20:00     4
2000-01-01 00:25:00    36
2000-01-01 00:30:00    24
2000-01-01 00:35:00    20
2000-01-01 00:40:00    83
2000-01-01 00:45:00    44

I then group this data by the hour of the index and aggregate it by the mean, as shown below:

0     50.380952
1     49.380952
2     49.904762
3     53.273810
4     47.178571
5     46.095238
6     49.047619
7     44.297619
8     53.119048
9     48.261905
10    45.166667
11    54.214286
12    50.714286
13    56.130952
14    50.916667
15    42.428571
16    46.880952
17    56.892857
18    54.071429
19    47.607143
20    50.940476
21    50.511905
22    44.550000
23    50.250000

However, what if I want to group all of the data by the hour of the index instead of reducing it to the mean? How should I do that?

Thanks.

Regards

1 Answer:

Answer 0 (score: 2)

If you want to aggregate by hour, you can do it like this:

import numpy as np
import pandas as pd

np.random.seed(456)
time = pd.date_range('1/1/2000', periods=2000, freq='5min')
series = pd.Series(np.random.randint(100, size=2000), index=time)

s = series.groupby(series.index.hour).mean()
print (s)
0     49.392857
1     52.523810
2     53.047619
3     49.083333
4     49.785714
5     49.071429
6     52.476190
7     47.821429
8     52.190476
9     50.000000
10    49.035714
11    52.988095
12    52.785714
13    52.023810
14    46.964286
15    52.095238
16    51.047619
17    52.166667
18    48.357143
19    51.416667
20    45.214286
21    46.130952
22    49.750000
23    48.527778
dtype: float64
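Note that grouping by series.index.hour pools all days together, so there are at most 24 groups (hour 0 of every day ends up in one bucket). If you instead want one bucket per calendar hour (2000-01-01 00:00, 2000-01-01 01:00, ...), resampling is the usual tool. A minimal sketch comparing the two on the answer's seeded data; the variable names are illustrative:

```python
import numpy as np
import pandas as pd

np.random.seed(456)
time = pd.date_range('1/1/2000', periods=2000, freq='5min')
series = pd.Series(np.random.randint(100, size=2000), index=time)

# Hour of day: readings from every day that share the same hour (0-23) are pooled.
by_hour_of_day = series.groupby(series.index.hour).mean()

# Calendar hour: one bin per clock hour of each individual day.
by_calendar_hour = series.resample(pd.offsets.Hour()).mean()

print(len(by_hour_of_day))    # 24
print(len(by_calendar_hour))  # 167 (2000-01-01 00:00 through 2000-01-07 22:00)
```

The 2000 five-minute readings span 166 hours and 35 minutes, which is why resampling yields 167 hourly bins while the hour-of-day grouping collapses them into 24.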

But if you need a MultiIndex with the hour as the first level:

series.index = [series.index.hour, series.index]
print (series)
0   2000-01-01 00:00:00    27
    2000-01-01 00:05:00    43
    2000-01-01 00:10:00    89
    2000-01-01 00:15:00    42
    2000-01-01 00:20:00    28
    2000-01-01 00:25:00    79
    2000-01-01 00:30:00    60
    2000-01-01 00:35:00    45
    2000-01-01 00:40:00    37
    2000-01-01 00:45:00    92
    2000-01-01 00:50:00    39
    2000-01-01 00:55:00    81
1   2000-01-01 01:00:00    11
    2000-01-01 01:05:00    77
    2000-01-01 01:10:00    69
    2000-01-01 01:15:00    98

...

Then you can select all rows for a given hour:

print (series.loc[0])
2000-01-01 00:00:00    27
2000-01-01 00:05:00    43
2000-01-01 00:10:00    89
2000-01-01 00:15:00    42
2000-01-01 00:20:00    28
2000-01-01 00:25:00    79
2000-01-01 00:30:00    60
2000-01-01 00:35:00    45
2000-01-01 00:40:00    37
2000-01-01 00:45:00    92
2000-01-01 00:50:00    39
2000-01-01 00:55:00    81
2000-01-02 00:00:00    82
2000-01-02 00:05:00    69
2000-01-02 00:10:00    99
2000-01-02 00:15:00    17
2000-01-02 00:20:00    59
...
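The same selection can be written with named index levels, which makes the intent clearer, or without touching the index at all via GroupBy.get_group. A sketch assuming the answer's seeded data; the level names 'hour' and 'time' and the variable names are my own choices, not part of the answer:

```python
import numpy as np
import pandas as pd

np.random.seed(456)
time = pd.date_range('1/1/2000', periods=2000, freq='5min')
series = pd.Series(np.random.randint(100, size=2000), index=time)

# Option 1: a two-level index with named levels (names are arbitrary).
by_hour = pd.Series(
    series.to_numpy(),
    index=pd.MultiIndex.from_arrays(
        [series.index.hour, series.index], names=['hour', 'time']),
).sort_index()  # a sorted index makes label lookups on the 'hour' level efficient
hour0_xs = by_hour.xs(0, level='hour')

# Option 2: keep the original DatetimeIndex and pull out one group on demand.
hour0_group = series.groupby(series.index.hour).get_group(0)

print(len(hour0_xs), len(hour0_group))  # 84 84
```

Option 2 leaves series unchanged, which avoids the MultiIndex affecting later code that expects a DatetimeIndex.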

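If you also need the mean but want to keep the DatetimeIndex unchanged, use transform on the original series (the one with the DatetimeIndex, before the MultiIndex assignment above, since a MultiIndex has no .hour attribute):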

s1 = series.groupby(series.index.hour).transform('mean')
print (s1)
2000-01-01 00:00:00    49.392857
2000-01-01 00:05:00    49.392857
2000-01-01 00:10:00    49.392857
2000-01-01 00:15:00    49.392857
2000-01-01 00:20:00    49.392857
2000-01-01 00:25:00    49.392857
2000-01-01 00:30:00    49.392857
2000-01-01 00:35:00    49.392857
2000-01-01 00:40:00    49.392857
2000-01-01 00:45:00    49.392857
2000-01-01 00:50:00    49.392857
2000-01-01 00:55:00    49.392857
2000-01-01 01:00:00    52.523810
2000-01-01 01:05:00    52.523810
2000-01-01 01:10:00    52.523810
2000-01-01 01:15:00    52.523810
2000-01-01 01:20:00    52.523810
2000-01-01 01:25:00    52.523810
2000-01-01 01:30:00    52.523810
...
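transform('mean') computes each hour's mean and broadcasts it back to every row of that hour, so the result has the same index and length as the input. A small sketch that checks this against an explicit lookup of the per-hour means; the names per_hour and manual are illustrative:

```python
import numpy as np
import pandas as pd

np.random.seed(456)
time = pd.date_range('1/1/2000', periods=2000, freq='5min')
series = pd.Series(np.random.randint(100, size=2000), index=time)

hours = series.index.hour

# Broadcast each hour's mean back onto every row of that hour.
s1 = series.groupby(hours).transform('mean')

# The same values built by hand: aggregate once, then look up each row's hour.
per_hour = series.groupby(hours).mean()
manual = pd.Series(per_hour.reindex(hours).to_numpy(), index=series.index)

print(s1.index.equals(series.index))                   # True
print(np.allclose(s1.to_numpy(), manual.to_numpy()))   # True
```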

Edit:

To get the values for each hour as a list (again on the original series with the DatetimeIndex):

s = series.groupby(series.index.hour).apply(list)
print (s)
0     [27, 43, 89, 42, 28, 79, 60, 45, 37, 92, 39, 8...
1     [11, 77, 69, 98, 78, 84, 34, 66, 4, 8, 85, 62,...
2     [16, 41, 10, 72, 44, 35, 48, 51, 99, 53, 22, 3...
3     [56, 22, 74, 85, 81, 6, 44, 44, 49, 43, 95, 11...
4     [21, 90, 89, 76, 62, 20, 66, 50, 68, 79, 69, 4...
5     [51, 85, 31, 58, 97, 10, 91, 25, 4, 11, 94, 28...
6     [5, 71, 62, 57, 62, 87, 12, 41, 43, 47, 25, 15...
7     [84, 17, 26, 32, 14, 76, 72, 35, 8, 60, 79, 27...
8     [15, 30, 80, 53, 10, 97, 71, 83, 37, 44, 89, 1...
9     [58, 20, 98, 77, 75, 26, 63, 26, 24, 62, 93, 6...
10    [39, 61, 92, 43, 61, 73, 86, 64, 26, 0, 75, 11...
11    [24, 13, 13, 54, 50, 38, 22, 46, 67, 15, 29, 4...
12    [21, 56, 16, 63, 46, 79, 11, 85, 87, 18, 66, 9...
13    [10, 89, 66, 80, 60, 2, 6, 19, 77, 81, 38, 48,...
14    [17, 64, 90, 91, 71, 32, 77, 9, 76, 14, 9, 79,...
15    [95, 75, 49, 34, 5, 31, 43, 68, 84, 48, 25, 69...
16    [13, 68, 87, 96, 6, 83, 9, 5, 29, 93, 57, 92, ...
17    [77, 6, 73, 41, 76, 93, 11, 50, 72, 84, 82, 53...
18    [95, 11, 61, 56, 30, 24, 24, 9, 0, 65, 96, 82,...
19    [31, 14, 98, 67, 7, 54, 29, 60, 77, 83, 45, 70...
20    [4, 15, 37, 78, 79, 59, 63, 97, 14, 74, 33, 2,...
21    [88, 69, 31, 20, 41, 10, 41, 6, 36, 27, 63, 49...
22    [4, 90, 70, 66, 92, 46, 54, 47, 6, 54, 62, 80,...
23    [27, 23, 21, 18, 29, 39, 77, 88, 21, 86, 7, 45...
dtype: object
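Converting to lists drops the timestamps. If you want to keep them, iterating over the groupby object yields one sub-Series per hour with its DatetimeIndex intact. A sketch, assuming a plain dict keyed by hour is a convenient container (groups is an illustrative name):

```python
import numpy as np
import pandas as pd

np.random.seed(456)
time = pd.date_range('1/1/2000', periods=2000, freq='5min')
series = pd.Series(np.random.randint(100, size=2000), index=time)

# Each iteration yields (hour, sub-Series); the sub-Series keeps its timestamps.
groups = {hour: grp for hour, grp in series.groupby(series.index.hour)}

print(len(groups))     # 24
print(len(groups[0]))  # 84
```

Hours 22 and 23 have fewer rows (80 and 72) because the data ends at 2000-01-07 22:35.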