Is there a way to compute an average based on the previous group in a column?

Time: 2019-11-22 21:36:28

Tags: python pandas pandas-groupby mean

I have a dataset that looks like this:

Date-Time      Diff        Load    Load_number
10/22/2019     -386                  0                
10/23/2019     -380                  0                 
10/24/2019     -370                  0                
10/25/2019     5000        Yes       1              
10/26/2019     -490                  1             
10/27/2019     -480                  1             
10/28/2019     -470                  1             
10/22/2019     5000        Yes       2              
10/23/2019     -380                  2              
10/24/2019     -370                  2              
10/25/2019     5000        Yes       3              
10/26/2019     -490                  3              
10/27/2019     5800        Yes       4                 
10/28/2019     -550                  4                 
10/29/2019     -500                  4           

Can someone help me find the absolute average of Diff for the previous load number?

The output should look like this:

Date-Time      Diff        Load    Load_number     Average
10/22/2019     -386                  0                0
10/23/2019     -380                  0                0    
10/24/2019     -370                  0                0
10/25/2019     5000        Yes       1              378.66
10/26/2019     -490                  1              378.66
10/27/2019     -480                  1              378.66
10/28/2019     -470                  1              378.66
10/22/2019     5000        Yes       2               480
10/23/2019     -380                  2               480
10/24/2019     -370                  2               480
10/25/2019     5000        Yes       3               375
10/26/2019     -490                  3               375
10/27/2019     5800        Yes       4               490   
10/28/2019     -550                  4               490   
10/29/2019     -500                  4               490

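Here, when Load_number = 1, the Average column holds the absolute value of the mean Diff of the previous Load_number (0). Positive values (e.g. Diff 5000) are excluded when computing the mean. Load_number 0 has no previous load, so its Average is 0.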
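To make the rule above concrete, here is a minimal sketch that rebuilds the sample data from the question (only the Diff and Load_number columns) and computes the absolute mean of the negative Diff values for each load. The names `negatives` and `per_load` are illustrative and not part of the original post.

```python
import pandas as pd

# Sample data copied from the question (Diff and Load_number only).
df = pd.DataFrame({
    "Diff": [-386, -380, -370, 5000, -490, -480, -470,
             5000, -380, -370, 5000, -490, 5800, -550, -500],
    "Load_number": [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
})

# Keep only the negative Diff rows, then take the mean per load
# and convert it to an absolute value.
negatives = df[df["Diff"] < 0]
per_load = negatives.groupby("Load_number")["Diff"].mean().abs()

print(per_load.round(2))
```

Each row's Average is then the `per_load` value of the load before it: load 1 gets the value computed for load 0, load 2 gets the value for load 1, and so on.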

2 answers:

Answer 0: (score: -1)

This should work:

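# Mean of the non-positive Diff values per load, converted to an absolute value
averages = df[df['Diff'] <= 0].groupby('Load_number')['Diff'].mean().abs()
# Load 0 has no previous load, so its lookup key (-1) maps to 0
averages.loc[-1] = 0
df['Average'] = df.apply(lambda row: averages.loc[row.Load_number - 1], axis=1)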

Answer 1: (score: -1)

load_averages = df.groupby('Load_number')['Diff'].agg(lambda s: s[s <= 0].mean()).shift(1).fillna(0).abs().to_dict()
df['Average'] = df['Load_number'].map(load_averages)

>>> df

    Date-Time   Diff    Load_number Load    Average
0   10/22/2019  -386.0      0.0             0.0
1   10/23/2019  -380.0      0.0             0.0
2   10/24/2019  -370.0      0.0             0.0
3   10/25/2019  5000.0      1.0     Yes     378.6666666666667
4   10/26/2019  -490.0      1.0             378.6666666666667
5   10/27/2019  -480.0      1.0             378.6666666666667
6   10/28/2019  -470.0      1.0             378.6666666666667
7   10/22/2019  5000.0      2.0     Yes     480.0
8   10/23/2019  -380.0      2.0             480.0
9   10/24/2019  -370.0      2.0             480.0
10  10/25/2019  5000.0      3.0     Yes     375.0
11  10/26/2019  -490.0      3.0             375.0
12  10/27/2019  5800.0      4.0     Yes     490.0
13  10/28/2019  -550.0      4.0             490.0
14  10/29/2019  -500.0      4.0             490.0
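For reference, the same groupby, shift, and map idea can be wrapped in a small function and checked against the expected output from the question. This is a sketch under stated assumptions, not the answerer's code: the function name `previous_load_average` is hypothetical, and "previous load" is taken to mean the preceding Load_number in sorted order.

```python
import pandas as pd

def previous_load_average(df: pd.DataFrame) -> pd.Series:
    """Absolute mean of the negative Diff values of the preceding load."""
    per_load = (
        df.loc[df["Diff"] < 0]
        .groupby("Load_number")["Diff"]
        .mean()
        .abs()
    )
    # Reindex over every load so that a load with no negative rows
    # still occupies a slot (as NaN) before shifting.
    loads = sorted(df["Load_number"].unique())
    # shift(1) moves each load's value to the next load; the first load,
    # and any load whose predecessor had no negative rows, becomes 0.
    previous = per_load.reindex(loads).shift(1).fillna(0)
    return df["Load_number"].map(previous)

sample = pd.DataFrame({
    "Diff": [-386, -380, -370, 5000, -490, -480, -470,
             5000, -380, -370, 5000, -490, 5800, -550, -500],
    "Load_number": [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
})
sample["Average"] = previous_load_average(sample)
print(sample)
```

Because `shift(1)` works on position rather than on the label `n - 1`, this variant does not require the load numbers to be consecutive integers.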