I want to compute the mean for each month from a DataFrame of daily data.
ds y
1256 2000-01-03 1.8050
1257 2000-01-04 1.8405
1258 2000-01-05 1.8560
1259 2000-01-06 1.8400
1260 2000-01-07 1.8310
1261 2000-01-10 1.8190
1262 2000-01-11 1.8225
1263 2000-01-12 1.8350
... ... ...
5844 2018-04-09 3.3950
5845 2018-04-10 3.4146
5846 2018-04-11 3.3955
5847 2018-04-12 3.3902
5848 2018-04-13 3.4088
5849 2018-04-16 3.4282
5850 2018-04-17 3.4022
5851 2018-04-18 3.3844
5852 2018-04-19 3.4028
5853 2018-04-20 3.4121
5854 2018-04-23 3.4463
5855 2018-04-24 3.4685
5856 2018-04-25 3.5090
5857 2018-04-26 3.4992
I tried this:
results.groupby(results['ds'].dt.strftime('%B'))['y'].sum().sort_values()
But the result aggregates each month across all years, rather than giving a value per year:
ds
November 873.4324
February 889.8996
September 898.4053
July 900.0330
June 918.0984
January 937.3191
October 947.2213
December 949.5291
May 949.8178
August 959.7570
April 969.8364
March 1026.8202
Name: y, dtype: float64
The following function looks like it should work, but I can't get it to work properly:
DataFrame.resample(rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0, on=None, level=None)
Answer 0 (score: 3)
You're almost there, but you also need to group by year.
print(df)
ds y
1256 2000-01-03 1.8050
1257 2000-01-04 1.8405
1258 2000-01-05 1.8560
1259 2000-01-06 1.8400
1260 2000-01-07 1.8310
1261 2000-01-10 1.8190
1262 2000-01-11 1.8225
1263 2000-01-12 1.8350
5844 2018-04-09 3.3950
5845 2018-04-10 3.4146
5846 2018-04-11 3.3955
5847 2018-04-12 3.3902
5848 2018-04-13 3.4088
5849 2018-04-16 3.4282
5850 2018-04-17 3.4022
5851 2018-04-18 3.3844
5852 2018-04-19 3.4028
5853 2018-04-20 3.4121
5854 2018-04-23 3.4463
5855 2018-04-24 3.4685
5856 2018-04-25 3.5090
5857 2018-04-26 3.4992
df['ds'] = pd.to_datetime(df['ds'])
df.groupby([df['ds'].dt.strftime('%Y'), df['ds'].dt.strftime('%B')])[['y']].mean()
Output:
y
ds ds
2000 January 1.831125
2018 April 3.425486
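One thing to watch with month names is that they sort alphabetically, so April would come before January within the same year. Below is a minimal sketch of the same year-plus-month grouping using numeric months instead. The small frame and the level names `year` and `month` are made up for illustration; only the column names `ds` and `y` come from the question.

```python
import pandas as pd

# Hypothetical sample in the same shape as the question's frame
df = pd.DataFrame({
    "ds": ["2000-01-03", "2000-01-04", "2000-02-15", "2018-04-09", "2018-04-10"],
    "y": [1.8050, 1.8405, 2.9450, 3.3950, 3.4146],
})
df["ds"] = pd.to_datetime(df["ds"])

# Group by calendar year and numeric month so the result sorts chronologically
year = df["ds"].dt.year.rename("year")
month = df["ds"].dt.month.rename("month")
monthly_mean = df.groupby([year, month])["y"].mean()

print(monthly_mean)
```

Selecting `["y"]` before calling `mean()` keeps the datetime column out of the aggregation, whatever the pandas version.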
Answer 1 (score: 2)
IIUC, you can use pd.Grouper. I took the liberty of adding a couple of rows (from a different month) to your dataframe to demonstrate:
>>> df
ds y
1256 2000-01-03 1.8050
1257 2000-01-04 1.8405
1258 2000-01-05 1.8560
1259 2000-01-06 1.8400
1260 2000-01-07 1.8310
1261 2000-01-10 1.8190
1262 2000-01-11 1.8225
1263 2000-01-12 1.8350
1263 2000-02-12 1.8350
1263 2000-02-15 2.9450
5844 2018-04-09 3.3950
5845 2018-04-10 3.4146
5846 2018-04-11 3.3955
5847 2018-04-12 3.3902
5848 2018-04-13 3.4088
5849 2018-04-16 3.4282
5850 2018-04-17 3.4022
5851 2018-04-18 3.3844
5852 2018-04-19 3.4028
5853 2018-04-20 3.4121
5854 2018-04-23 3.4463
5855 2018-04-24 3.4685
5856 2018-04-25 3.5090
5857 2018-04-26 3.4992
# first cast ds to datetime
df['ds'] = pd.to_datetime(df['ds'])
# then group by month, and get the mean
# (pandas 2.2 and later prefer freq='ME'; 'M' is the older alias for month end):
df.groupby(pd.Grouper(key='ds', freq='M')).mean().dropna()
y
ds
2000-01-31 1.831125
2000-02-29 2.390000
2018-04-30 3.425486
The result shows the mean of y for each month, indexed by the date of that month's last day.
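If labeling each month by its last day is not what you want, one possible alternative is to group on monthly periods. This is only a sketch on made-up sample rows, and `by_period` is a name chosen here for illustration.

```python
import pandas as pd

# Hypothetical sample in the same shape as the question's frame
df = pd.DataFrame({
    "ds": ["2000-01-03", "2000-01-04", "2000-02-15", "2018-04-09", "2018-04-10"],
    "y": [1.8050, 1.8405, 2.9450, 3.3950, 3.4146],
})
df["ds"] = pd.to_datetime(df["ds"])

# Convert each timestamp to its monthly period (e.g. 2000-01) and average per period
by_period = df.groupby(df["ds"].dt.to_period("M"))["y"].mean()

print(by_period)
```

Because the groups are built only from months that occur in the data, there are no empty months to remove with `dropna()`.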
Answer 2 (score: 2)
You can pass multiple items to groupby as a list. In this case you want to group by year and month, so you can do the following:
import pandas as pd
results['ds'] = pd.to_datetime(results.ds)
gp = results.groupby([results.ds.dt.year, results['ds'].dt.strftime('%B')]).y.mean()
gp.index.names=['year', 'month']
#year month
#2000 January 1.831125
#2018 April 3.425486
#Name: y, dtype: float64
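As for the resample function mentioned in the question, here is a rough sketch of how it could be applied, assuming `ds` is a regular column and not the index. The sample rows are made up. The `'MS'` (month start) rule is used here so the bins are labeled with the first day of each month, and the aggregation is chained as a method call instead of being passed through `how`.

```python
import pandas as pd

# Hypothetical sample in the same shape as the question's frame
results = pd.DataFrame({
    "ds": ["2000-01-03", "2000-01-04", "2000-02-15", "2018-04-09", "2018-04-10"],
    "y": [1.8050, 1.8405, 2.9450, 3.3950, 3.4146],
})
results["ds"] = pd.to_datetime(results["ds"])

# Resample on the 'ds' column into calendar-month bins and average 'y';
# months with no rows come back as NaN, so drop them afterwards
monthly = results.resample("MS", on="ds")["y"].mean().dropna()

print(monthly)
```

Unlike the groupby versions, resample creates a bin for every month between the first and last date, which is why the `dropna()` step matters when the data has gaps.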