I want to apply a custom function when aggregating a DataFrame. For example, given this DataFrame:
   index City  Age
0      1    A   50
1      2    A   24
2      3    B   65
3      4    A   40
4      5    B   68
5      6    B   48
The function to apply:
def count_people_above_60(age):
    # ... I don't know whether age is passed as a Series or a list,
    # so I'm not sure which operations I can perform on it here
    return count_people_above_60
I'd like to do something like this:
df.groupby(['City']).agg({"Age": ["mean", count_people_above_60]})
Expected output:
City  Mean   People_Above_60
A     38     0
B     60.33  2
Answer 0 (score: 2)
If performance matters, create a new column filled with the comparison results converted to integers, so that the count can be computed with sum:
# Assign to a new name so the original df stays available
out = (df.assign(new=df['Age'].gt(60).astype(int))
         .groupby(['City'])
         .agg(Mean=("Age", "mean"), People_Above_60=('new', "sum")))
print(out)
           Mean  People_Above_60
City
A     38.000000                0
B     60.333333                2
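For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame construction is my own reconstruction from the table in the question, and the name result is just illustrative. It assumes a pandas version that supports named aggregation (0.25 or later).

```python
import pandas as pd

# Rebuild the example data from the question (reconstructed, not from the original post)
df = pd.DataFrame({
    "index": [1, 2, 3, 4, 5, 6],
    "City": ["A", "A", "B", "A", "B", "B"],
    "Age": [50, 24, 65, 40, 68, 48],
})

# Helper column "new": 1 where Age > 60, otherwise 0.
# Summing it per group gives the number of people above 60.
result = (
    df.assign(new=df["Age"].gt(60).astype(int))
      .groupby("City")
      .agg(Mean=("Age", "mean"), People_Above_60=("new", "sum"))
)
print(result)
```

The comparison runs once over the whole column, and the per-group work is then handled by the built-in "sum" and "mean" aggregations rather than a Python function.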
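Your own solution works if you change the function to compare the values and call sum, but it will be slow when there are many groups or the DataFrame is large: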
def count_people_above_60(age):
    # age is the Series of 'Age' values for one group
    return (age > 60).sum()

out = (df.groupby(['City'])
         .agg(Mean=("Age", "mean"),
              People_Above_60=('Age', count_people_above_60)))
print(out)
           Mean  People_Above_60
City
A     38.000000                0
B     60.333333                2
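On the question of what the function actually receives: as far as I know, agg calls a custom function with one pandas Series per group, which is why (age > 60).sum() works. The rough sketch below records the argument type to check this and confirms that the custom-function route and the vectorized route agree. The seen_types list and the names slow and fast are hypothetical helpers added only for this illustration, and the data is reconstructed from the question.

```python
import pandas as pd

# Example data reconstructed from the question
df = pd.DataFrame({
    "City": ["A", "A", "B", "A", "B", "B"],
    "Age": [50, 24, 65, 40, 68, 48],
})

seen_types = []  # hypothetical helper: records the type of each argument


def count_people_above_60(age):
    # Record what pandas passed in, then count values above 60
    seen_types.append(type(age).__name__)
    return (age > 60).sum()


# Custom function: called from Python once (or more) per group
slow = df.groupby("City").agg(People_Above_60=("Age", count_people_above_60))

# Vectorized alternative: compare once, then use the built-in sum
fast = df["Age"].gt(60).astype(int).groupby(df["City"]).sum()

print(set(seen_types))
print(slow["People_Above_60"].tolist(), fast.tolist())
```

How many times the function is called can vary between pandas versions, so the sketch only looks at the set of types it saw, not the call count.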