我使用以下代码生成随机日期和值:
import pandas as pd
import numpy as np
time = pd.date_range('1/1/2000', periods=2000, freq='5min')
series = pd.Series(np.random.randint(100, size=2000), index=time)
输出看起来像这样:
2000-01-01 00:00:00 40
2000-01-01 00:05:00 13
2000-01-01 00:10:00 99
2000-01-01 00:15:00 72
2000-01-01 00:20:00 4
2000-01-01 00:25:00 36
2000-01-01 00:30:00 24
2000-01-01 00:35:00 20
2000-01-01 00:40:00 83
2000-01-01 00:45:00 44
然后我将这些数据按小时指数值排序,然后按如下所示的平均值进行汇总:
0 50.380952
1 49.380952
2 49.904762
3 53.273810
4 47.178571
5 46.095238
6 49.047619
7 44.297619
8 53.119048
9 48.261905
10 45.166667
11 54.214286
12 50.714286
13 56.130952
14 50.916667
15 42.428571
16 46.880952
17 56.892857
18 54.071429
19 47.607143
20 50.940476
21 50.511905
22 44.550000
23 50.250000
但是,如果我想按索引小时值而不是平均值将所有数据分组,那么我应该怎么做?
谢谢。
此致
答案 0 :(得分:2)
如果要通过hour
s进行汇总,并采用以下方式:
np.random.seed(456)
time = pd.date_range('1/1/2000', periods=2000, freq='5min')
series = pd.Series(np.random.randint(100, size=2000), index=time)
s = series.groupby(series.index.hour).mean()
print (s)
0 49.392857
1 52.523810
2 53.047619
3 49.083333
4 49.785714
5 49.071429
6 52.476190
7 47.821429
8 52.190476
9 50.000000
10 49.035714
11 52.988095
12 52.785714
13 52.023810
14 46.964286
15 52.095238
16 51.047619
17 52.166667
18 48.357143
19 51.416667
20 45.214286
21 46.130952
22 49.750000
23 48.527778
dtype: float64
但是如果需要按小时MultiIndex
:
series.index = [series.index.hour, series.index]
print (series)
0 2000-01-01 00:00:00 27
2000-01-01 00:05:00 43
2000-01-01 00:10:00 89
2000-01-01 00:15:00 42
2000-01-01 00:20:00 28
2000-01-01 00:25:00 79
2000-01-01 00:30:00 60
2000-01-01 00:35:00 45
2000-01-01 00:40:00 37
2000-01-01 00:45:00 92
2000-01-01 00:50:00 39
2000-01-01 00:55:00 81
1 2000-01-01 01:00:00 11
2000-01-01 01:05:00 77
2000-01-01 01:10:00 69
2000-01-01 01:15:00 98
...
然后可以按小时选择:
print (series.loc[0])
2000-01-01 00:00:00 27
2000-01-01 00:05:00 43
2000-01-01 00:10:00 89
2000-01-01 00:15:00 42
2000-01-01 00:20:00 28
2000-01-01 00:25:00 79
2000-01-01 00:30:00 60
2000-01-01 00:35:00 45
2000-01-01 00:40:00 37
2000-01-01 00:45:00 92
2000-01-01 00:50:00 39
2000-01-01 00:55:00 81
2000-01-02 00:00:00 82
2000-01-02 00:05:00 69
2000-01-02 00:10:00 99
2000-01-02 00:15:00 17
2000-01-02 00:20:00 59
...
如果还需要mean
,而没有任何变化DatetimeIndex
:
s1 = series.groupby(series.index.hour).transform('mean')
print (s1)
2000-01-01 00:00:00 49.392857
2000-01-01 00:05:00 49.392857
2000-01-01 00:10:00 49.392857
2000-01-01 00:15:00 49.392857
2000-01-01 00:20:00 49.392857
2000-01-01 00:25:00 49.392857
2000-01-01 00:30:00 49.392857
2000-01-01 00:35:00 49.392857
2000-01-01 00:40:00 49.392857
2000-01-01 00:45:00 49.392857
2000-01-01 00:50:00 49.392857
2000-01-01 00:55:00 49.392857
2000-01-01 01:00:00 52.523810
2000-01-01 01:05:00 52.523810
2000-01-01 01:10:00 52.523810
2000-01-01 01:15:00 52.523810
2000-01-01 01:20:00 52.523810
2000-01-01 01:25:00 52.523810
2000-01-01 01:30:00 52.523810
...
编辑:
每小时使用列表:
s = series.groupby(series.index.hour).apply(list)
print (s)
0 [27, 43, 89, 42, 28, 79, 60, 45, 37, 92, 39, 8...
1 [11, 77, 69, 98, 78, 84, 34, 66, 4, 8, 85, 62,...
2 [16, 41, 10, 72, 44, 35, 48, 51, 99, 53, 22, 3...
3 [56, 22, 74, 85, 81, 6, 44, 44, 49, 43, 95, 11...
4 [21, 90, 89, 76, 62, 20, 66, 50, 68, 79, 69, 4...
5 [51, 85, 31, 58, 97, 10, 91, 25, 4, 11, 94, 28...
6 [5, 71, 62, 57, 62, 87, 12, 41, 43, 47, 25, 15...
7 [84, 17, 26, 32, 14, 76, 72, 35, 8, 60, 79, 27...
8 [15, 30, 80, 53, 10, 97, 71, 83, 37, 44, 89, 1...
9 [58, 20, 98, 77, 75, 26, 63, 26, 24, 62, 93, 6...
10 [39, 61, 92, 43, 61, 73, 86, 64, 26, 0, 75, 11...
11 [24, 13, 13, 54, 50, 38, 22, 46, 67, 15, 29, 4...
12 [21, 56, 16, 63, 46, 79, 11, 85, 87, 18, 66, 9...
13 [10, 89, 66, 80, 60, 2, 6, 19, 77, 81, 38, 48,...
14 [17, 64, 90, 91, 71, 32, 77, 9, 76, 14, 9, 79,...
15 [95, 75, 49, 34, 5, 31, 43, 68, 84, 48, 25, 69...
16 [13, 68, 87, 96, 6, 83, 9, 5, 29, 93, 57, 92, ...
17 [77, 6, 73, 41, 76, 93, 11, 50, 72, 84, 82, 53...
18 [95, 11, 61, 56, 30, 24, 24, 9, 0, 65, 96, 82,...
19 [31, 14, 98, 67, 7, 54, 29, 60, 77, 83, 45, 70...
20 [4, 15, 37, 78, 79, 59, 63, 97, 14, 74, 33, 2,...
21 [88, 69, 31, 20, 41, 10, 41, 6, 36, 27, 63, 49...
22 [4, 90, 70, 66, 92, 46, 54, 47, 6, 54, 62, 80,...
23 [27, 23, 21, 18, 29, 39, 77, 88, 21, 86, 7, 45...
dtype: object