我有7列数据,按日期时间(频率为30分钟)从2017-05-31开始,于2018-05-25结束。我想绘制特定日期范围(季节)的平均值。我一直在尝试groupby,但无法按特定范围进行分组。如果执行df.groupby(df.date.dt.month).mean()
,则会得到错误的结果。
数据集中的几行(日期范围为2017-05-31至2018-05-25)
50 51 56 58
date
2017-05-31 00:00:00 200.213542 276.929198 242.879051 NaN
2017-05-31 00:30:00 200.215478 276.928229 242.879051 NaN
2017-05-31 01:00:00 200.215478 276.925324 242.878083 NaN
2017-06-01 01:00:00 200.221288 276.944691 242.827729 NaN
2017-06-01 01:30:00 200.221288 276.944691 242.827729 NaN
2017-08-31 09:00:00 206.961886 283.374453 245.041349 184.358250
2017-08-31 09:30:00 206.966727 283.377358 245.042317 184.360187
2017-12-31 09:00:00 212.925877 287.198416 247.455413 187.175144
2017-12-31 09:30:00 212.926846 287.196480 247.465097 187.179987
2018-03-31 23:00:00 213.304498 286.933093 246.469647 186.887548
2018-03-31 23:30:00 213.308369 286.938902 246.468678 186.891422
2018-04-30 23:00:00 215.496812 288.342024 247.522230 188.104749
2018-04-30 23:30:00 215.497781 288.340086 247.520294 188.103780
我已经创建了这些变量(这些是我需要的范围)
increment_rates_winter = df['2017-08-30'].mean() - df['2017-06-01'].mean()
increment_rates_spring = df['2017-11-30'].mean() - df['2017-09-01'].mean()
increment_rates_summer = df['2018-02-28'].mean() - df['2017-12-01'].mean()
increment_rates_fall = df['2018-05-24'].mean() - df['2018-03-01'].mean()
连接它们:
df_seasons =pd.concat([increment_rates_winter,increment_rates_spring,increment_rates_summer,increment_rates_fall],axis=1)
绘制后,我得到了:
但是,我一直在尝试获取它:
df_seasons
Out[664]:
Winter Spring Summer Fall
50 6.697123 6.948447 -1.961549 7.662622
51 6.428329 4.760650 -2.188402 5.927087
52 5.580953 6.667529 1.136889 12.939295
53 6.406259 2.506279 -2.105125 6.964549
54 4.332826 3.678492 -2.574769 6.569398
56 2.222032 3.359607 -2.694863 5.348258
58 NaN 1.388535 -0.035889 4.213046
x的季节和每列的均值。
Winter = df['2017-06-01':'2017-08-30']
Spring = df['2017-09-01':'2017-11-30']
Summer = df['2017-12-01':'2018-02-28']
Fall = df['2018-03-01':'2018-05-30']
提前谢谢!
答案 0 :(得分:2)
我们可以通过以下方式获取特定的日期范围,然后您可以根据需要进行定义并取平均值
import pandas as pd
df = pd.read_csv('test.csv')
df['date'] = pd.to_datetime(df['date'])
start_date = "2017-12-31 09:00:00"
end_date = "2018-04-30 23:00:00"
mask = (df['date'] > start_date) & (df['date'] <= end_date)
f_df = df.loc[mask]
这给出了输出
date 50 ... 58
8 2017-12-31 09:30:00 212.926846 ... 187.179987 NaN
9 2018-03-31 23:00:00 213.304498 ... 186.887548 NaN
10 2018-03-31 23:30:00 213.308369 ... 186.891422 NaN
11 2018-04-30 23:00:00 215.496812 ... 188.104749 NaN
希望这会有所帮助
答案 1 :(得分:1)