在Pandas中分组30分钟的空间

时间:2015-06-01 17:49:35

标签: python pandas

我有一个月的数据序列,我想要整个月00:00, 00:30, 01:00, ...23:30的平均值。如果这是每小时一次我可以简单地做

df.groupby(df.index.hour).mean()

但我不知道如何在30分钟内完成此操作。我尝试过随机的事情,比如

df.groupby(df.index.hour*df.index.minute).mean()

但他们都没有奏效。这可以在熊猫中完成吗?

谢谢。

修改的 样本数据:

2015-06-01 00:00:00     4.474450  137.007017
2015-06-01 00:30:00     5.661688  138.342549
2015-06-01 01:00:00     6.142984  139.469381
2015-06-01 01:30:00     6.245277  140.780341
2015-06-01 02:00:00     6.368909  141.464176
2015-06-01 02:30:00     6.535648  143.121590
...                          ...         ...
2015-06-04 21:30:00     6.380301  123.523559
2015-06-04 22:00:00     6.118872  124.649216
2015-06-04 22:30:00     6.554864  127.671638
2015-06-04 23:00:00     7.628708  129.960442
2015-06-04 23:30:00     8.082754  132.294248
2015-06-04 00:00:00     7.768733  132.960135

请注意,数据跨度超过一天,但我希望将结果作为24*2长度数组。由于00:30数据将是该时间内所有日期的数据的平均值,等等。

3 个答案:

答案 0 :(得分:2)

IIUC我想你想要以下内容:

In [13]:
# load the data
t="""2015-06-01 00:00:00,4.474450,137.007017
2015-06-01 00:30:00,5.661688,138.342549
2015-06-01 01:00:00,6.142984,139.469381
2015-06-01 01:30:00,6.245277,140.780341
2015-06-01 02:00:00,6.368909,141.464176
2015-06-01 02:30:00,6.535648,143.121590
2015-06-04 21:30:00,6.380301,123.523559
2015-06-04 22:00:00,6.118872,124.649216
2015-06-04 22:30:00,6.554864,127.671638
2015-06-04 23:00:00,7.628708,129.960442
2015-06-04 23:30:00,8.082754,132.294248
2015-06-04 00:00:00,7.768733,132.960135"""
df = pd.read_csv(io.StringIO(t), index_col=[0], parse_dates=[0], header=None)
df.columns = ['x','y']

In [14]:
# group on the hour and minute attribute of the index
df.groupby([df.index.hour, df.index.minute]).mean()
Out[14]:
              x           y
0  0   6.121592  134.983576
   30  5.661688  138.342549
1  0   6.142984  139.469381
   30  6.245277  140.780341
2  0   6.368909  141.464176
   30  6.535648  143.121590
21 30  6.380301  123.523559
22 0   6.118872  124.649216
   30  6.554864  127.671638
23 0   7.628708  129.960442
   30  8.082754  132.294248

因此,上面将对索引的索引小时和分钟属性进行分组,因此这为您提供值为00:3001:00等所有日期的平均值。

答案 1 :(得分:0)

假设你想要每30分钟的平均值。在一个月内平均一段时间,尝试使用resample(如果您的数据间隔不是均匀间隔30分钟),然后使用groupby。

df = pd.DataFrame(np.random.randn(5000), 
                  columns=['vals'],
                  index=pd.date_range(start=dt.datetime.now(), 
                  periods=5000,
                  freq='30T'))

df = df.resample('30Min')
>>> df.groupby(lambda x: (x.year, x.month, x.hour, x.minute)).vals.mean()
(2015, 6, 0, 0)    -0.120642
(2015, 6, 0, 30)    0.172788
(2015, 6, 1, 0)     0.310861
(2015, 6, 1, 30)   -0.054615
(2015, 6, 2, 0)    -0.122372
(2015, 6, 2, 30)   -0.160935
(2015, 6, 3, 0)     0.290064
(2015, 6, 3, 30)    0.040233
(2015, 6, 4, 0)    -0.267994
(2015, 6, 4, 30)    0.032256
(2015, 6, 5, 0)    -0.240584
(2015, 6, 5, 30)   -0.095077
(2015, 6, 6, 0)    -0.145298
(2015, 6, 6, 30)    0.311680
(2015, 6, 7, 0)    -0.259130
...
(2015, 9, 16, 30)   -0.249618
(2015, 9, 17, 0)     0.000566
(2015, 9, 17, 30)    0.085121
(2015, 9, 18, 0)    -0.008067
(2015, 9, 18, 30)   -0.392995
(2015, 9, 19, 0)    -0.509947
(2015, 9, 19, 30)    0.117550
(2015, 9, 20, 0)     0.076988
(2015, 9, 20, 30)   -0.096187
(2015, 9, 21, 0)    -0.066262
(2015, 9, 21, 30)   -0.274175
(2015, 9, 22, 0)    -0.459320
(2015, 9, 22, 30)    0.685940
(2015, 9, 23, 0)    -0.050148
(2015, 9, 23, 30)    0.038874
Name: a, Length: 192, dtype: float64

并查看月平均值:

df2 = df.reset_index()
df2['year'] = [c[0] for c in df2['index']]
df2['month'] = [c[1] for c in df2['index']]
df2['hour'] = [c[2] for c in df2['index']]
df2['minutes'] = [c[3] for c in df2['index']]

>>> df2.pivot_table(values='vals', 
                    columns=['year', 'month'], 
                    index=['hour', 'minutes'], 
                    aggfunc=np.mean)

year              2015                              
month                6         7         8         9
hour minutes                                        
0    0       -0.120642  0.260686  0.320550  0.374241
     30       0.172788 -0.078378  0.092206  0.151341
1    0        0.310861 -0.210523 -0.005879 -0.162668
     30      -0.054615  0.069194 -0.026174  0.218007
2    0       -0.122372  0.036491  0.266133  0.050847
     30      -0.160935  0.191182  0.205710  0.183733
3    0        0.290064  0.062863  0.042098 -0.167724
     30       0.040233 -0.083346  0.248039  0.654488
4    0       -0.267994 -0.304616 -0.227858 -0.306729
     30       0.032256  0.036278 -0.350544  0.111284
5    0       -0.240584  0.177614  0.174180 -0.156598
     30      -0.095077  0.350684  0.430140  0.050188
6    0       -0.145298  0.260356  0.314880 -0.367434
     30       0.311680 -0.307146 -0.024851 -0.012917
7    0       -0.259130 -0.030620  0.027398 -0.050143
     30       0.283149 -0.465681  0.067154 -0.118537
8    0        0.108188 -0.034551  0.206411 -0.325447
     30      -0.069086 -0.074594 -0.081681  0.087789
9    0       -0.115867  0.257696 -0.056953 -0.000636
     30      -0.194631 -0.018209  0.097634  0.321195
10   0       -0.029710 -0.179173 -0.362098 -0.425820
     30       0.171463 -0.275286  0.124837  0.185941
11   0        0.027725  0.116209  0.397818  0.273722
     30       0.045747  0.113604  0.053537  0.130483
12   0        0.397945  0.106375  0.316335  0.487824
     30      -0.133603  0.352268  0.043338 -0.080617
13   0       -0.152457  0.005833 -0.024060 -0.484102
     30       0.023435 -0.243851 -0.190029 -0.155168
14   0       -0.029532  0.020272  0.299358 -0.158454
     30       0.250930 -0.157656  0.007717 -0.088050
15   0       -0.098546  0.282827 -0.185139 -0.119801
     30      -0.145674 -0.047190 -0.078103 -0.116217
16   0        0.164972  0.190326  0.156651 -0.559833
     30      -0.034718 -0.273184 -0.254462 -0.249618
17   0        0.133240  0.071170 -0.200580  0.000566
     30      -0.030369  0.007821 -0.298061  0.085121
18   0        0.184950  0.013328  0.196898 -0.008067
     30       0.049239 -0.050993  0.008094 -0.392995
19   0       -0.067991 -0.011393 -0.101014 -0.509947
     30      -0.219792  0.098113 -0.297009  0.117550
20   0        0.174875 -0.253166 -0.130623  0.076988
     30      -0.407662 -0.221100  0.172923 -0.096187
21   0        0.041020 -0.381691 -0.090805 -0.066262
     30      -0.163835 -0.158566 -0.466063 -0.274175
22   0       -0.039960  0.400497  0.028426 -0.459320
     30       0.023610 -0.097154 -0.010363  0.685940
23   0       -0.261549  0.010280  0.019144 -0.050148
     30      -0.008354 -0.451011 -0.012453  0.038874

答案 2 :(得分:-1)

Dataframe有一个内置的方法,这个叫做resample,只做...

df.resample('30Min',how='mean')

实际上'如何'默认为平均值,因此您可以将其缩短为

df.resample('30Min')