I have a month of time-series data, and I want the average over the whole month for each of 00:00, 00:30, 01:00, ... 23:30.
If the data were hourly I could simply do
df.groupby(df.index.hour).mean()
but I don't know how to do this for 30-minute intervals. I have tried random things such as
df.groupby(df.index.hour*df.index.minute).mean()
but none of them worked. Can this be done in pandas?
Thanks.
Edit: sample data:
2015-06-01 00:00:00 4.474450 137.007017
2015-06-01 00:30:00 5.661688 138.342549
2015-06-01 01:00:00 6.142984 139.469381
2015-06-01 01:30:00 6.245277 140.780341
2015-06-01 02:00:00 6.368909 141.464176
2015-06-01 02:30:00 6.535648 143.121590
... ... ...
2015-06-04 21:30:00 6.380301 123.523559
2015-06-04 22:00:00 6.118872 124.649216
2015-06-04 22:30:00 6.554864 127.671638
2015-06-04 23:00:00 7.628708 129.960442
2015-06-04 23:30:00 8.082754 132.294248
2015-06-04 00:00:00 7.768733 132.960135
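Note that the data spans more than one day, but I want the result as an array of length 24*2, since the 00:30 value should be the average of the data at that time over all days, and so on.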
Answer 0: (score: 2)
IIUC, I think you want the following:
In [13]:
# load the data
import io
import pandas as pd
t="""2015-06-01 00:00:00,4.474450,137.007017
2015-06-01 00:30:00,5.661688,138.342549
2015-06-01 01:00:00,6.142984,139.469381
2015-06-01 01:30:00,6.245277,140.780341
2015-06-01 02:00:00,6.368909,141.464176
2015-06-01 02:30:00,6.535648,143.121590
2015-06-04 21:30:00,6.380301,123.523559
2015-06-04 22:00:00,6.118872,124.649216
2015-06-04 22:30:00,6.554864,127.671638
2015-06-04 23:00:00,7.628708,129.960442
2015-06-04 23:30:00,8.082754,132.294248
2015-06-04 00:00:00,7.768733,132.960135"""
df = pd.read_csv(io.StringIO(t), index_col=[0], parse_dates=[0], header=None)
df.columns = ['x','y']
In [14]:
# group on the hour and minute attribute of the index
df.groupby([df.index.hour, df.index.minute]).mean()
Out[14]:
              x           y
0  0   6.121592  134.983576
   30  5.661688  138.342549
1  0   6.142984  139.469381
   30  6.245277  140.780341
2  0   6.368909  141.464176
   30  6.535648  143.121590
21 30  6.380301  123.523559
22 0   6.118872  124.649216
   30  6.554864  127.671638
23 0   7.628708  129.960442
   30  8.082754  132.294248
So the above groups on the hour and minute attributes of the index, which gives you the mean over all days for 00:30, 01:00, and so on.
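As a rough sketch of why the hour*minute attempt in the question fails and how to get a flat 48-row result, the example below uses made-up data (four days at a 30-minute step; the seed and column values are assumptions, only the column names x and y come from the question). Multiplying hour by minute maps every on-the-hour slot to 0, so the keys collide; minutes since midnight, or the time of day itself, are unique per slot.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 4 full days at a 30-minute step, two columns as in the question.
idx = pd.date_range("2015-06-01", periods=4 * 48, freq="30min")
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=len(idx)),
                   "y": rng.normal(size=len(idx))}, index=idx)

# hour * minute is 0 for every slot with minute == 0, so distinct slots collide.
bad_keys = df.index.hour * df.index.minute
n_bad = bad_keys.nunique()

# Minutes since midnight is unique for each half-hour slot.
by_slot = df.groupby(df.index.hour * 60 + df.index.minute).mean()

# Grouping on the time of day gives the same means with readable labels.
by_time = df.groupby(df.index.time).mean()

# Plain array with one row per half-hour slot and one column per variable.
result = by_time.to_numpy()
```

Here result has one row per half-hour slot, which is the 24*2 length array asked for; by_time keeps datetime.time labels if those are more convenient than a two-level index.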
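Answer 1: (score: 0)
Assuming you want the average for each 30-minute slot, computed over a month: try resample first (in case your data is not evenly spaced at 30-minute intervals), and then use groupby.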
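import datetime as dt
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5000),
                  columns=['vals'],
                  index=pd.date_range(start=dt.datetime.now(),
                                      periods=5000,
                                      freq='30min'))
# resample returns a Resampler in current pandas, so aggregate explicitly
df = df.resample('30min').mean()
>>> df.groupby(lambda x: (x.year, x.month, x.hour, x.minute)).vals.mean()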
(2015, 6, 0, 0) -0.120642
(2015, 6, 0, 30) 0.172788
(2015, 6, 1, 0) 0.310861
(2015, 6, 1, 30) -0.054615
(2015, 6, 2, 0) -0.122372
(2015, 6, 2, 30) -0.160935
(2015, 6, 3, 0) 0.290064
(2015, 6, 3, 30) 0.040233
(2015, 6, 4, 0) -0.267994
(2015, 6, 4, 30) 0.032256
(2015, 6, 5, 0) -0.240584
(2015, 6, 5, 30) -0.095077
(2015, 6, 6, 0) -0.145298
(2015, 6, 6, 30) 0.311680
(2015, 6, 7, 0) -0.259130
...
(2015, 9, 16, 30) -0.249618
(2015, 9, 17, 0) 0.000566
(2015, 9, 17, 30) 0.085121
(2015, 9, 18, 0) -0.008067
(2015, 9, 18, 30) -0.392995
(2015, 9, 19, 0) -0.509947
(2015, 9, 19, 30) 0.117550
(2015, 9, 20, 0) 0.076988
(2015, 9, 20, 30) -0.096187
(2015, 9, 21, 0) -0.066262
(2015, 9, 21, 30) -0.274175
(2015, 9, 22, 0) -0.459320
(2015, 9, 22, 30) 0.685940
(2015, 9, 23, 0) -0.050148
(2015, 9, 23, 30) 0.038874
Name: a, Length: 192, dtype: float64
And to view the monthly averages:
# take the time components from the datetime index itself
df2 = df.reset_index()
df2['year'] = df2['index'].dt.year
df2['month'] = df2['index'].dt.month
df2['hour'] = df2['index'].dt.hour
df2['minutes'] = df2['index'].dt.minute
>>> df2.pivot_table(values='vals',
                    columns=['year', 'month'],
                    index=['hour', 'minutes'],
                    aggfunc='mean')
year 2015
month 6 7 8 9
hour minutes
0 0 -0.120642 0.260686 0.320550 0.374241
30 0.172788 -0.078378 0.092206 0.151341
1 0 0.310861 -0.210523 -0.005879 -0.162668
30 -0.054615 0.069194 -0.026174 0.218007
2 0 -0.122372 0.036491 0.266133 0.050847
30 -0.160935 0.191182 0.205710 0.183733
3 0 0.290064 0.062863 0.042098 -0.167724
30 0.040233 -0.083346 0.248039 0.654488
4 0 -0.267994 -0.304616 -0.227858 -0.306729
30 0.032256 0.036278 -0.350544 0.111284
5 0 -0.240584 0.177614 0.174180 -0.156598
30 -0.095077 0.350684 0.430140 0.050188
6 0 -0.145298 0.260356 0.314880 -0.367434
30 0.311680 -0.307146 -0.024851 -0.012917
7 0 -0.259130 -0.030620 0.027398 -0.050143
30 0.283149 -0.465681 0.067154 -0.118537
8 0 0.108188 -0.034551 0.206411 -0.325447
30 -0.069086 -0.074594 -0.081681 0.087789
9 0 -0.115867 0.257696 -0.056953 -0.000636
30 -0.194631 -0.018209 0.097634 0.321195
10 0 -0.029710 -0.179173 -0.362098 -0.425820
30 0.171463 -0.275286 0.124837 0.185941
11 0 0.027725 0.116209 0.397818 0.273722
30 0.045747 0.113604 0.053537 0.130483
12 0 0.397945 0.106375 0.316335 0.487824
30 -0.133603 0.352268 0.043338 -0.080617
13 0 -0.152457 0.005833 -0.024060 -0.484102
30 0.023435 -0.243851 -0.190029 -0.155168
14 0 -0.029532 0.020272 0.299358 -0.158454
30 0.250930 -0.157656 0.007717 -0.088050
15 0 -0.098546 0.282827 -0.185139 -0.119801
30 -0.145674 -0.047190 -0.078103 -0.116217
16 0 0.164972 0.190326 0.156651 -0.559833
30 -0.034718 -0.273184 -0.254462 -0.249618
17 0 0.133240 0.071170 -0.200580 0.000566
30 -0.030369 0.007821 -0.298061 0.085121
18 0 0.184950 0.013328 0.196898 -0.008067
30 0.049239 -0.050993 0.008094 -0.392995
19 0 -0.067991 -0.011393 -0.101014 -0.509947
30 -0.219792 0.098113 -0.297009 0.117550
20 0 0.174875 -0.253166 -0.130623 0.076988
30 -0.407662 -0.221100 0.172923 -0.096187
21 0 0.041020 -0.381691 -0.090805 -0.066262
30 -0.163835 -0.158566 -0.466063 -0.274175
22 0 -0.039960 0.400497 0.028426 -0.459320
30 0.023610 -0.097154 -0.010363 0.685940
23 0 -0.261549 0.010280 0.019144 -0.050148
30 -0.008354 -0.451011 -0.012453 0.038874
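The same month-by-slot table can probably be built without helper columns by grouping on the index attributes and unstacking. This is a minimal sketch, not the answerer's method: the fixed start date, the seed, and the level names are assumptions chosen so the example is reproducible.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 5000 half-hourly samples from a fixed start (about 104 days).
idx = pd.date_range("2015-06-01", periods=5000, freq="30min")
df = pd.DataFrame({"vals": np.random.default_rng(1).normal(size=5000)}, index=idx)

# Mean per (year, month, hour, minute) group.
keys = [df.index.year, df.index.month, df.index.hour, df.index.minute]
monthly = df.groupby(keys)["vals"].mean()
monthly.index.names = ["year", "month", "hour", "minutes"]

# Move year and month into the columns: one row per half-hour slot.
table = monthly.unstack(["year", "month"])
```

Each column of table is then one month's 48-slot profile, and table.mean(axis=1) would average those monthly profiles if a single profile is wanted.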
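Answer 2: (score: -1)
DataFrame has a built-in method for this called resample; just do
df.resample('30min').mean()
Older pandas versions spelled this with a how argument, df.resample('30Min', how='mean'), and because how defaulted to the mean it could be shortened to
df.resample('30Min')
That argument has since been removed, so the explicit .mean() is needed in current versions.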
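Note that resample on its own only bins the data into 30-minute intervals; it still returns one row per interval per day, so a second step is needed to average across days. The sketch below illustrates this with made-up data on a 10-minute grid (the values and the column name vals are assumptions for illustration).

```python
import numpy as np
import pandas as pd

# Hypothetical data: one sample every 10 minutes over 3 days.
idx = pd.date_range("2015-06-01", periods=3 * 144, freq="10min")
raw = pd.DataFrame({"vals": np.arange(len(idx), dtype=float)}, index=idx)

# Step 1: resample to 30-minute bins; still one row per bin per day.
half_hourly = raw.resample("30min").mean()

# Step 2: collapse across days by grouping on the time of day.
profile = half_hourly.groupby(half_hourly.index.time).mean()
```

With three days of input, half_hourly has 3 * 48 rows while profile has 48, one per half-hour slot.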