I have a dataset that looks like this:
Date-Time   Diff  Load  Load_number
10/22/2019  -386        0
10/23/2019  -380        0
10/24/2019  -370        0
10/25/2019  5000  Yes   1
10/26/2019  -490        1
10/27/2019  -480        1
10/28/2019  -470        1
10/22/2019  5000  Yes   2
10/23/2019  -380        2
10/24/2019  -370        2
10/25/2019  5000  Yes   3
10/26/2019  -490        3
10/27/2019  5800  Yes   4
10/28/2019  -550        4
10/29/2019  -500        4
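Can someone help me compute, for each row, the absolute average of Diff over the previous Load_number?
The output should look like this: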
Date-Time   Diff  Load  Load_number  Average
10/22/2019  -386        0            0
10/23/2019  -380        0            0
10/24/2019  -370        0            0
10/25/2019  5000  Yes   1            378.66
10/26/2019  -490        1            378.66
10/27/2019  -480        1            378.66
10/28/2019  -470        1            378.66
10/22/2019  5000  Yes   2            480
10/23/2019  -380        2            480
10/24/2019  -370        2            480
10/25/2019  5000  Yes   3            375
10/26/2019  -490        3            375
10/27/2019  5800  Yes   4            490
10/28/2019  -550        4            490
10/29/2019  -500        4            490
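Here, when Load_number = 1, the Average column holds the mean of the absolute Diff values from the previous Load_number (0 in this case). Positive values (e.g. Diff 5000) are not considered when computing the average.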
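To make the rule concrete, here is a small check in plain Python. It is only a sketch: the dictionary `diffs_by_load` and the helper `previous_abs_mean` are hypothetical names, and the values are the sample rows above retyped by hand.

```python
# Diff values from the sample, grouped by Load_number (retyped by hand).
diffs_by_load = {
    0: [-386, -380, -370],
    1: [5000, -490, -480, -470],
    2: [5000, -380, -370],
    3: [5000, -490],
    4: [5800, -550, -500],
}


def previous_abs_mean(load_number):
    """Mean of |Diff| over the non-positive Diff values of the previous load."""
    previous = diffs_by_load.get(load_number - 1)
    if previous is None:
        # Load 0 has no previous load, so the expected output uses 0.
        return 0.0
    magnitudes = [abs(d) for d in previous if d <= 0]
    return sum(magnitudes) / len(magnitudes) if magnitudes else 0.0


expected = {n: previous_abs_mean(n) for n in diffs_by_load}
print(expected)
```

For Load_number = 1 this works out to (386 + 380 + 370) / 3, which is the 378.66 shown in the desired output.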
Answer 0 (score: -1)
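This should work:
# Mean of the non-positive Diff values per load, taken as a magnitude
averages = df[df['Diff'] <= 0].groupby('Load_number')['Diff'].mean().abs()
# Load 0 has no previous load, so its lookup key (-1) maps to 0
averages.loc[-1] = 0
df['Average'] = df.apply(lambda row: averages.loc[row.Load_number - 1], axis=1)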
Answer 1 (score: -1)
load_averages = df.groupby('Load_number')['Diff'].agg(lambda s: s[s <= 0].mean()).shift(1).fillna(0).abs().to_dict()
df['Average'] = df['Load_number'].map(load_averages)
>>> df
Date-Time Diff Load_number Load Average
0 10/22/2019 -386.0 0.0 0.0
1 10/23/2019 -380.0 0.0 0.0
2 10/24/2019 -370.0 0.0 0.0
3 10/25/2019 5000.0 1.0 Yes 378.6666666666667
4 10/26/2019 -490.0 1.0 378.6666666666667
5 10/27/2019 -480.0 1.0 378.6666666666667
6 10/28/2019 -470.0 1.0 378.6666666666667
7 10/22/2019 5000.0 2.0 Yes 480.0
8 10/23/2019 -380.0 2.0 480.0
9 10/24/2019 -370.0 2.0 480.0
10 10/25/2019 5000.0 3.0 Yes 375.0
11 10/26/2019 -490.0 3.0 375.0
12 10/27/2019 5800.0 4.0 Yes 490.0
13 10/28/2019 -550.0 4.0 490.0
14 10/29/2019 -500.0 4.0 490.0
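For anyone who wants to reproduce this end to end, below is a self-contained sketch that rebuilds the sample frame and fills the Average column. It is a variant, not either answer verbatim: instead of `shift(1)`, it re-keys each per-load mean to `Load_number + 1`, which I am assuming is the intended meaning of "previous load" if the load numbers ever have gaps. The Load column is left out because it is not needed for the calculation.

```python
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "Date-Time": [
        "10/22/2019", "10/23/2019", "10/24/2019", "10/25/2019", "10/26/2019",
        "10/27/2019", "10/28/2019", "10/22/2019", "10/23/2019", "10/24/2019",
        "10/25/2019", "10/26/2019", "10/27/2019", "10/28/2019", "10/29/2019",
    ],
    "Diff": [-386, -380, -370, 5000, -490, -480, -470, 5000,
             -380, -370, 5000, -490, 5800, -550, -500],
    "Load_number": [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
})

# Mean of the non-positive Diff values per load, as a magnitude.
per_load = df.loc[df["Diff"] <= 0].groupby("Load_number")["Diff"].mean().abs()

# Re-key each mean to the load that follows it, so a row can look up
# the value of its previous load directly by its own Load_number.
per_load.index = per_load.index + 1

# Loads without a previous load (here: load 0) get 0.
df["Average"] = df["Load_number"].map(per_load).fillna(0)
print(df)
```

On the sample data this gives the same Average values as the output above. The two approaches would differ only if a load number were skipped or a load had no non-positive Diff values.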