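I would like a column holding the groupby mean of variable X, computed only over the rows that pass a cut on variable Y, with that mean filled in on every row of the group (including the rows that fail the cut).
I have a workaround, but I'd like to know whether there is a more direct way to achieve this.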
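import pandas as pd
df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Falcon', 'Parrot', 'Parrot', 'Parrot'],
                   'Max_Speed': [380., 370., 90, 24., 100., 101]})
# Per-Animal mean over the rows passing the cut; rows failing the cut get NaN after alignment
df['meanWithCut'] = df.query('Max_Speed>98').groupby('Animal')['Max_Speed'].transform('mean')
# Fill the NaN rows with their group's (constant) cut mean; transform keeps the original index
df['meanWithCut'] = df.groupby('Animal')['meanWithCut'].transform(lambda x: x.fillna(x.max()))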
df
Animal Max_Speed meanWithCut
0 Falcon 380.0 375.0
1 Falcon 370.0 375.0
2 Falcon 90.0 375.0
3 Parrot 24.0 100.5
4 Parrot 100.0 100.5
5 Parrot 101.0 100.5
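A one-step variant of the same idea, offered as a sketch rather than the asker's method: mask the X values that fail the cut on Y to NaN with Series.where, then let groupby(...).transform('mean') skip those NaNs and broadcast each group's mean back to every row. Here X and Y are both Max_Speed, as in the question; for different columns the mask would be built from Y instead.

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Falcon', 'Parrot', 'Parrot', 'Parrot'],
                   'Max_Speed': [380., 370., 90, 24., 100., 101]})

# Keep X (Max_Speed) only where the cut on Y (also Max_Speed here) passes; other rows become NaN
x_after_cut = df['Max_Speed'].where(df['Max_Speed'] > 98)

# transform('mean') ignores NaN and writes each group's mean onto every row of that group
df['meanWithCut'] = x_after_cut.groupby(df['Animal']).transform('mean')
print(df)
```

Because the NaN rows stay inside their groups, no separate fill step is needed.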
Answer 0 (score: 2)
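If you need to filter before assigning, I would first aggregate the filtered (sparse) df with agg, then map the result back onto the original rows with reindex: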
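# Aggregate the filtered (sparse) frame once per Animal
s = df.query('Max_Speed>98').groupby('Animal')['Max_Speed'].agg('mean')
# Look up each row's Animal in s; .values drops the index so the assignment is positional
df['meanWithCut'] = s.reindex(df.Animal).values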
df
Out[130]:
Animal Max_Speed meanWithCut
0 Falcon 380.0 375.0
1 Falcon 370.0 375.0
2 Falcon 90.0 375.0
3 Parrot 24.0 100.5
4 Parrot 100.0 100.5
5 Parrot 101.0 100.5
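One behaviour worth checking with this approach: a group with no rows above the cut is absent from s, so reindex returns NaN for it. The sketch below adds a hypothetical 'Penguin' group with made-up illustrative speeds to show this.

```python
import pandas as pd

# Illustrative data; 'Penguin' is a hypothetical group with no row passing the cut
df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot', 'Penguin'],
                   'Max_Speed': [380., 90., 24., 101., 8.]})

s = df.query('Max_Speed>98').groupby('Animal')['Max_Speed'].agg('mean')
# reindex yields NaN for labels missing from s (here 'Penguin')
df['meanWithCut'] = s.reindex(df['Animal']).values
print(df)
```

If a default is preferred over NaN, reindex also accepts a fill_value argument.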
Answer 1 (score: 2)
You can use map:
# Per-Animal mean over the filtered rows, looked up for every row by its Animal
df['meanWithCut'] = (df['Animal'].map(df.query('Max_Speed>98')
                                        .groupby('Animal')['Max_Speed'].mean()))
print(df)
Animal Max_Speed meanWithCut
0 Falcon 380.0 375.0
1 Falcon 370.0 375.0
2 Falcon 90.0 375.0
3 Parrot 24.0 100.5
4 Parrot 100.0 100.5
5 Parrot 101.0 100.5
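The map approach generalises to the question's separate X and Y columns. Below is a sketch wrapped in a hypothetical helper, mean_with_cut (not a pandas function), that takes the grouping column, the column to average, and a boolean mask expressing the cut.

```python
import pandas as pd

def mean_with_cut(df, group_col, x_col, mask):
    """Hypothetical helper: per-group mean of x_col over rows where mask is True,
    mapped back onto every row of df (NaN for groups with no passing rows)."""
    group_means = df.loc[mask].groupby(group_col)[x_col].mean()
    return df[group_col].map(group_means)

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Falcon', 'Parrot', 'Parrot', 'Parrot'],
                   'Max_Speed': [380., 370., 90, 24., 100., 101]})
df['meanWithCut'] = mean_with_cut(df, 'Animal', 'Max_Speed', df['Max_Speed'] > 98)
print(df)
```

With distinct columns, the call would look like mean_with_cut(df, 'Animal', 'X', df['Y'] > cut), where X, Y and cut are placeholders.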