How do I get the mean of multiple columns?
Gender Age Salary Yr_exp cup_coffee_daily
Male 28 45000.0 6.0 2.0
Female 40 70000.0 15.0 10.0
Female 23 40000.0 1.0 0.0
Male 35 55000.0 12.0 6.0
I have df.groupby('Gender', as_index=False)['Age', 'Salary', 'Yr_exp'].mean(), but it still only returns the mean of the first column, Age. How do I get the means of just these specific columns? Expected output:
Gender Age Salary Yr_exp
Male 31.5 50000.0 9.0
Female 31.5 55000.0 8.0
Thanks.
Answer 0 (score: 6)
Given this DataFrame:
import pandas as pd

df = pd.DataFrame({
"Gender": ["Male", "Female", "Female", "Male"],
"Age": [28, 40, 23, 35],
"Salary": [45000, 70000, 40000, 55000],
"Yr_exp": [6, 15, 1, 12]
})
df
Age Gender Salary Yr_exp
0 28 Male 45000 6
1 40 Female 70000 15
2 23 Female 40000 1
3 35 Male 55000 12
Group by Gender and call mean():
df.groupby("Gender").mean()
Age Salary Yr_exp
Gender
Female 31.5 55000.0 8.0
Male 31.5 50000.0 9.0
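One detail worth checking against the question's own data: it also has a cup_coffee_daily column that the expected output leaves out. A minimal sketch, assuming the question's five columns, shows that a bare .mean() averages every non-key column, so the wanted columns have to be selected (as a list) before aggregating.

```python
import pandas as pd

# The question's data, including the extra cup_coffee_daily column
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [28, 40, 23, 35],
    "Salary": [45000.0, 70000.0, 40000.0, 55000.0],
    "Yr_exp": [6.0, 15.0, 1.0, 12.0],
    "cup_coffee_daily": [2.0, 10.0, 0.0, 6.0],
})

# A bare .mean() averages every non-key column, cup_coffee_daily included
all_means = df.groupby("Gender").mean()
print(all_means.columns.tolist())

# Selecting a list of columns first limits the result to those columns
some_means = df.groupby("Gender")[["Age", "Salary", "Yr_exp"]].mean()
print(some_means)
```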
Edit: you may need to change how you index after groupby(). On a plain DataFrame, df['Age', 'Salary'] raises a KeyError, but df[['Age', 'Salary']] returns the expected result:
Age Salary
0 28 45000
1 40 70000
2 23 40000
3 35 55000
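The difference is that df['Age', 'Salary'] passes the single tuple ('Age', 'Salary') as one column label, while df[['Age', 'Salary']] passes a list of two labels. A small sketch using the answer's DataFrame makes both behaviours checkable:

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [28, 40, 23, 35],
    "Salary": [45000, 70000, 40000, 55000],
    "Yr_exp": [6, 15, 1, 12],
})

# The tuple ('Age', 'Salary') is looked up as ONE label; no column has that
# label, so pandas raises KeyError
try:
    df["Age", "Salary"]
    tuple_raised = False
except KeyError:
    tuple_raised = True

# A list of labels selects two columns and returns a DataFrame
subset = df[["Age", "Salary"]]
print(tuple_raised)   # True
print(subset.shape)   # (4, 2)
```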
Try changing
df.groupby("Gender", as_index=True)['Age', 'Salary', 'Yr_exp'].mean()
to (note the double brackets, which pass a list of columns)
df.groupby("Gender", as_index=True)[['Age', 'Salary', 'Yr_exp']].mean()
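To reproduce the question's expected output exactly, a sketch along these lines should work, assuming the question's data. Two details beyond the list fix: as_index=False keeps Gender as an ordinary column, and sort=False keeps groups in order of first appearance (Male, then Female), whereas the default sort=True puts Female first. Recent pandas versions also reject the question's tuple form outright and ask for a list instead.

```python
import pandas as pd

# The question's data, including cup_coffee_daily
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [28, 40, 23, 35],
    "Salary": [45000.0, 70000.0, 40000.0, 55000.0],
    "Yr_exp": [6.0, 15.0, 1.0, 12.0],
    "cup_coffee_daily": [2.0, 10.0, 0.0, 6.0],
})

# Double brackets: a list of columns selected from the groupby object.
# as_index=False -> Gender stays a column; sort=False -> Male row first.
result = df.groupby("Gender", as_index=False, sort=False)[["Age", "Salary", "Yr_exp"]].mean()
print(result)
```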
Answer 1 (score: 0)
You can also use .agg() on the grouped object:
df.groupby("Gender").agg({'Age' : 'mean', 'Salary' : 'mean', 'Yr_exp': 'mean'})
The result will be:
Age Salary Yr_exp
Gender
Female 31.5 55000.0 8.0
Male 31.5 50000.0 9.0
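When every column gets the same function, the dict is equivalent to selecting a list of columns and passing one function name. A quick check, assuming the answer's integer DataFrame, also shows that the means come back as float64 even though the inputs are integers:

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [28, 40, 23, 35],
    "Salary": [45000, 70000, 40000, 55000],
    "Yr_exp": [6, 15, 1, 12],
})

# One function per column, spelled out in a dict
by_dict = df.groupby("Gender").agg({"Age": "mean", "Salary": "mean", "Yr_exp": "mean"})

# The same function for a list of columns
by_name = df.groupby("Gender")[["Age", "Salary", "Yr_exp"]].agg("mean")

# Raises AssertionError if the two frames differ in values, dtypes, or labels
pd.testing.assert_frame_equal(by_dict, by_name)
print(by_dict.dtypes.tolist())
```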
Using .agg() also lets you apply different functions to different columns of the grouped object, for example:
df.groupby("Gender").agg({'Age' : 'mean', 'Salary' : ['min', 'max'], 'Yr_exp': 'sum'})
Output:
Age Salary Yr_exp
mean min max sum
Gender
Female 31.5 40000 70000 16
Male 31.5 45000 55000 18
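The dict form above produces two-level column headers (Salary/min, Salary/max). If flat column names are preferred, pandas also supports named aggregation (pandas 0.25 and later), an alternative not used in the answer. In the sketch below, the output names Age_mean, Salary_min, and so on are illustrative choices:

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [28, 40, 23, 35],
    "Salary": [45000, 70000, 40000, 55000],
    "Yr_exp": [6, 15, 1, 12],
})

# Named aggregation: each keyword becomes an output column, and each value is
# a (source column, function) pair, so the result has single-level columns
summary = df.groupby("Gender").agg(
    Age_mean=("Age", "mean"),
    Salary_min=("Salary", "min"),
    Salary_max=("Salary", "max"),
    Yr_exp_sum=("Yr_exp", "sum"),
)
print(summary)
```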