Calculating the mean for different dates in pandas

Time: 2020-06-26 17:42:29

Tags: python pandas dataframe

I have a dataframe that looks like this.

Date    Daily Risk Score
0   2020-06-26  6.0
1   2020-06-27  6.0
2   2020-06-28  6.0
3   2020-06-29  6.0
4   2020-06-30  6.0
5   2020-07-01  6.0
6   2020-07-02  6.0
7   2020-07-03  6.0
8   2020-07-04  6.0
9   2020-07-05  6.0
10  2020-07-06  6.0
11  2020-07-07  6.0
12  2020-07-08  6.0
13  2020-07-09  6.0
14  2020-06-26  6.0
15  2020-06-27  6.0
16  2020-06-28  6.0
17  2020-06-29  6.0
18  2020-06-30  6.0
19  2020-07-01  6.0
20  2020-07-02  6.0
21  2020-07-03  6.0
22  2020-07-04  6.0
23  2020-07-05  6.0
24  2020-07-06  6.0
25  2020-07-07  6.0
26  2020-07-08  6.0
27  2020-07-09  6.0
28  2020-06-26  1.0
29  2020-06-27  1.0

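The actual dataframe has about 50,000 entries. I want to take the mean of all the daily risk scores for each date, and then store each of these 14 new means in a new column called "means", with the 14 values corresponding to the dates they were calculated for.

I tried to do this: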

df2['Date']= pd.to_datetime(df2['Date']) 
dates=pd.date_range(today, (today+dt.timedelta()))
for i in dates:
    df2=df2[df2['Date']==i]
    df2['means']=df2['Daily Risk Score'].mean()

But this only calculates the mean for the first day and then the loop stops. What am I doing wrong?

1 Answer:

Answer 0: (score: 1)

You can do the following:

import pandas as pd

# Average the score per date; reset_index() turns Date back into a regular column
mean_df = df.groupby("Date").mean().reset_index()
mean_df.columns = ["Date", "means"]
#          Date     means
#0   2020-06-26  4.333333
#1   2020-06-27  4.333333
#2   2020-06-28  6.000000
#3   2020-06-29  6.000000
#4   2020-06-30  6.000000
#5   2020-07-01  6.000000
#6   2020-07-02  6.000000
#7   2020-07-03  6.000000
#8   2020-07-04  6.000000
#9   2020-07-05  6.000000
#10  2020-07-06  6.000000
#11  2020-07-07  6.000000
#12  2020-07-08  6.000000
#13  2020-07-09  6.000000

# Attach each date's mean to every original row with that date
result = pd.merge(df, mean_df, on="Date")
#          Date  DailyRiskScore     means
#0   2020-06-26             6.0  4.333333
#1   2020-06-26             6.0  4.333333
#2   2020-06-26             1.0  4.333333
#3   2020-06-27             6.0  4.333333
#4   2020-06-27             6.0  4.333333
#5   2020-06-27             1.0  4.333333
#6   2020-06-28             6.0  6.000000
#7   2020-06-28             6.0  6.000000
#8   2020-06-29             6.0  6.000000
#9   2020-06-29             6.0  6.000000
#10  2020-06-30             6.0  6.000000
#11  2020-06-30             6.0  6.000000
#12  2020-07-01             6.0  6.000000
#13  2020-07-01             6.0  6.000000
#14  2020-07-02             6.0  6.000000
#15  2020-07-02             6.0  6.000000
#16  2020-07-03             6.0  6.000000
#17  2020-07-03             6.0  6.000000
#18  2020-07-04             6.0  6.000000
#19  2020-07-04             6.0  6.000000
#20  2020-07-05             6.0  6.000000
#21  2020-07-05             6.0  6.000000
#22  2020-07-06             6.0  6.000000
#23  2020-07-06             6.0  6.000000
#24  2020-07-07             6.0  6.000000
#25  2020-07-07             6.0  6.000000
#26  2020-07-08             6.0  6.000000
#27  2020-07-08             6.0  6.000000
#28  2020-07-09             6.0  6.000000
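
As for why the loop in the question stops after the first day: `df2 = df2[df2['Date'] == i]` overwrites `df2` with only the rows for the first date, so every later iteration filters a frame that no longer contains any other dates. In addition, `dt.timedelta()` with no arguments is a zero-length interval, so `pd.date_range(today, today + dt.timedelta())` contains just one date. If you only need the extra column and not a separate table of means, a shorter route is `groupby(...).transform("mean")`. The following is a minimal sketch on a small made-up sample (the six rows below are hypothetical, chosen to mirror the layout in the question), not a drop-in replacement for your real data:

```python
import pandas as pd

# Hypothetical sample mirroring the question's layout: three rows for each of two dates.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-06-26", "2020-06-27", "2020-06-26",
                            "2020-06-27", "2020-06-26", "2020-06-27"]),
    "Daily Risk Score": [6.0, 6.0, 6.0, 6.0, 1.0, 1.0],
})

# transform("mean") returns one value per original row, aligned on the index,
# so it can be assigned directly as a new column without a merge.
df["means"] = df.groupby("Date")["Daily Risk Score"].transform("mean")

# One row per date, in case only the per-day averages are needed.
daily_means = df.groupby("Date", as_index=False)["Daily Risk Score"].mean()
```

Unlike the merge, this keeps the rows in their original order and never reassigns the frame being grouped.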