Question

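I want a column that holds the groupby mean of variable X, computed only from the rows that pass a cut on variable Y.

I have a workaround, so I am wondering whether there is a more direct way to achieve this.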

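import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Falcon', 'Parrot', 'Parrot','Parrot'],
                   'Max_Speed': [380., 370., 90, 24., 100., 101]})

# Per-group mean over the rows that pass the cut; rows that fail it get NaN on assignment
df['meanWithCut'] = df.query('Max_Speed>98').groupby('Animal')['Max_Speed'].transform('mean')

# Fill the NaN rows with their group's value (transform keeps the original row index)
df['meanWithCut'] = df.groupby('Animal')['meanWithCut'].transform(lambda x: x.fillna(x.max()))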

df

    Animal  Max_Speed   meanWithCut
0   Falcon  380.0       375.0
1   Falcon  370.0       375.0
2   Falcon  90.0        375.0
3   Parrot  24.0        100.5
4   Parrot  100.0       100.5
5   Parrot  101.0       100.5

Answer 1

If you need to filter before assigning, I would first run agg on the filtered (sparse) df and then map the result back onto the original rows:

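# One mean per Animal, computed only from the rows that pass the cut
s = df.query('Max_Speed>98').groupby('Animal')['Max_Speed'].agg('mean')
# Look up each row's Animal in s and assign the values positionally
df['meanWithCut'] = s.reindex(df.Animal).values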
df
Out[130]: 
   Animal  Max_Speed  meanWithCut
0  Falcon      380.0        375.0
1  Falcon      370.0        375.0
2  Falcon       90.0        375.0
3  Parrot       24.0        100.5
4  Parrot      100.0        100.5
5  Parrot      101.0        100.5
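
The aggregate-then-map-back steps above can be wrapped into a small helper. This is only a sketch: the name mean_with_cut and its parameters are made up for illustration and are not part of pandas, and it assumes the cut is a simple "greater than" threshold on the same column that is averaged, as in the question.

```python
import pandas as pd


def mean_with_cut(df, group_col, value_col, threshold):
    """Per-group mean of value_col over rows above threshold, aligned to df's rows.

    Hypothetical helper for illustration; not a pandas API.
    """
    # Aggregate only the rows that pass the cut: one mean per group.
    means = df.loc[df[value_col] > threshold].groupby(group_col)[value_col].mean()
    # Reindex by each row's group label; groups with no passing rows become NaN.
    values = means.reindex(df[group_col]).to_numpy()
    return pd.Series(values, index=df.index, name='meanWithCut')


df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Falcon', 'Parrot', 'Parrot', 'Parrot'],
                   'Max_Speed': [380., 370., 90, 24., 100., 101]})
df['meanWithCut'] = mean_with_cut(df, 'Animal', 'Max_Speed', 98)
print(df)
```

Note that a group with no rows above the threshold would end up with NaN here rather than a mean, which may or may not be what you want.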

Answer 2

You can use map:

df['meanWithCut'] = (df['Animal'].map(df.query('Max_Speed>98')
                    .groupby('Animal')['Max_Speed'].mean()))
print(df)

   Animal  Max_Speed  meanWithCut
0  Falcon      380.0        375.0
1  Falcon      370.0        375.0
2  Falcon       90.0        375.0
3  Parrot       24.0        100.5
4  Parrot      100.0        100.5
5  Parrot      101.0        100.5
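
A possible one-step variant, not taken from either answer: mask the failing rows with where and let transform broadcast the group mean. This is a minimal sketch that relies on groupby's mean skipping NaN values; the intermediate name passing is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Falcon', 'Parrot', 'Parrot', 'Parrot'],
                   'Max_Speed': [380., 370., 90, 24., 100., 101]})

# Rows that fail the cut become NaN; the original index is preserved.
passing = df['Max_Speed'].where(df['Max_Speed'] > 98)

# Group the masked values by Animal. 'mean' ignores NaN, and transform
# writes each group's mean back to every row of that group.
df['meanWithCut'] = passing.groupby(df['Animal']).transform('mean')
print(df)
```

Because the masked Series keeps the original index, no reindex or map step is needed to line the result up with df.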
