Get the mean of a variable based on another column in a pandas DataFrame

Time: 2021-07-05 02:25:25

Tags: python pandas

Basically what the title says. I have a CSV file that I loaded into a pandas DataFrame, and it looks like this:

"ID","Name","Sex","Age","Height","Weight","Team","NOC","Games","Year","Season","City","Sport","Event","Medal"
"1","A Dijiang","M",24,180,80,"China","CHN","1992 Summer",1992,"Summer","Barcelona","Basketball","Basketball Men's Basketball",NA
"2","A Lamusi","M",23,170,60,"China","CHN","2012 Summer",2012,"Summer","London","Judo","Judo Men's Extra-Lightweight",NA
"3","Gunnar Nielsen Aaby","M",24,NA,NA,"Denmark","DEN","1920 Summer",1920,"Summer","Antwerpen","Football","Football Men's Football",NA
"4","Edgar Lindenau Aabye","M",34,NA,NA,"Denmark/Sweden","DEN","1900 Summer",1900,"Summer","Paris","Tug-Of-War","Tug-Of-War Men's Tug-Of-War","Gold"
"5","Christine Jacoba Aaftink","F",21,185,82,"Netherlands","NED","1988 Winter",1988,"Winter","Calgary","Speed Skating","Speed Skating Women's 500 metres",NA

I want to get the mean of "Height" for each year, for example the mean for 1992, and so on.

2 Answers:

Answer 0: (score: 2)

This is what you want:

df.groupby('Year')['Height'].mean()

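How does it work? First, we group the rows by a specific column (you can also group by a list of columns). Then we select the column we want to aggregate, in our case "Height", and finally we apply the aggregation, here the mean. Missing heights (NaN) are skipped when the mean is computed.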
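The steps above can be sketched as a small self-contained example. It is only an illustration: the inline CSV keeps four of the question's columns for its five sample rows, and the names `csv_text`, `mean_by_year` and `mean_by_year_sex` are made up here.

```python
import io
import pandas as pd

# Five rows modeled on the question's sample (only the columns needed here).
csv_text = """Name,Sex,Height,Year
A Dijiang,M,180,1992
A Lamusi,M,170,2012
Gunnar Nielsen Aaby,M,NA,1920
Edgar Lindenau Aabye,M,NA,1900
Christine Jacoba Aaftink,F,185,1988
"""
df = pd.read_csv(io.StringIO(csv_text))  # "NA" is parsed as NaN by default

# Split the rows by Year, pick the Height column, then average each group.
mean_by_year = df.groupby("Year")["Height"].mean()

# Grouping by a list of columns gives one mean per (Year, Sex) pair.
mean_by_year_sex = df.groupby(["Year", "Sex"])["Height"].mean()

print(mean_by_year)
print(mean_by_year_sex)
```

A year whose heights are all missing (1900 and 1920 in this sample) still appears in the result, with NaN as its mean.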

An alternative syntax for the same aggregation:

import numpy as np  # np.mean is passed to agg as the aggregation function
df.groupby('Year')['Height'].agg(np.mean)  # agg('mean') gives the same result without NumPy
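As a rough sketch of how `agg` can be used here, the example below passes aggregation names as strings, which avoids the NumPy import, and then asks for two aggregations at once. The toy DataFrame is hypothetical and not taken from the question's file.

```python
import pandas as pd

# Hypothetical toy data, not taken from the question's file.
df = pd.DataFrame({
    "Year": [1992, 1992, 2012, 2012],
    "Height": [180.0, 190.0, 170.0, None],
})

# A string name selects the built-in aggregation; no NumPy import is needed.
mean_by_year = df.groupby("Year")["Height"].agg("mean")

# A list of names returns one column per aggregation.
summary = df.groupby("Year")["Height"].agg(["mean", "count"])

print(mean_by_year)
print(summary)
```

The `count` column shows how many non-missing heights went into each mean, which helps when many values are NA.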

Answer 1: (score: 0)

Try this:

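import pandas as pd
df = pd.read_csv("filename.csv")
# Column labels are case-sensitive: the CSV header is "Height", not "height"
mean_height = df.loc[df["Year"] == 1992, "Height"].mean()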
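The filtering approach can be checked on a small in-memory frame. This is a minimal sketch with made-up data, assuming the same column names as the question's CSV. It shows that the boolean mask returns the same number as looking up 1992 in the grouped result.

```python
import pandas as pd

# Hypothetical toy data; the real file would come from pd.read_csv.
df = pd.DataFrame({
    "Year": [1992, 1992, 2012],
    "Height": [180.0, None, 170.0],
})

# Boolean mask: keep only the 1992 rows, then average their Height.
mean_1992 = df.loc[df["Year"] == 1992, "Height"].mean()

# The grouped result holds the same number under the 1992 label.
grouped_1992 = df.groupby("Year")["Height"].mean().loc[1992]

print(mean_1992, grouped_1992)
```

Filtering is convenient when only one year is needed. The grouped form computes every year in a single pass.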