Is there a way to use a custom function in the pandas aggregate function?

Time: 2020-03-24 09:54:44

Tags: python python-3.x pandas aggregate pandas-groupby

I want to apply a custom function when aggregating a DataFrame. For example, given this DataFrame:

   index City  Age
0      1    A   50
1      2    A   24
2      3    B   65
3      4    A   40
4      5    B   68
5      6    B   48

The function to apply:

def count_people_above_60(age):
    ...  # I don't know whether age is passed as a Series or a list, so I'm not sure which operations I can perform on it here
    return count_people_above_60

I would like to do something like this:

df.groupby(['City']).agg({"Age": ["mean", count_people_above_60]})

Expected output:

City  Mean   People_Above_60
A     38     0
B     60.33  2

1 Answer:

Answer 0: (score: 2)

If performance matters, create a new column holding the comparison result cast to integers, so that the built-in sum can do the counting:

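# Flag ages above 60 as 1/0 in a helper column, then sum the flags per city.
# The result goes into a new name so the original df is not overwritten.
out = (df.assign(new=df['Age'].gt(60).astype(int))
         .groupby(['City'])
         .agg(Mean=("Age", "mean"), People_Above_60=('new', "sum")))
print(out)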
           Mean  People_Above_60
City                            
A     38.000000                0
B     60.333333                2
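
For anyone who wants to run the approach above from scratch, here is a minimal self-contained sketch. The DataFrame is rebuilt from the question's sample data; the names `flagged`, `above_60`, and `result` are illustrative choices of mine, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "index": [1, 2, 3, 4, 5, 6],
    "City": ["A", "A", "B", "A", "B", "B"],
    "Age": [50, 24, 65, 40, 68, 48],
})

# Step 1: flag every row with one vectorized comparison (1 if Age > 60, else 0)
flagged = df.assign(above_60=df["Age"].gt(60).astype(int))

# Step 2: aggregate with built-in reducers only; summing the flags counts the matches
result = flagged.groupby("City").agg(
    Mean=("Age", "mean"),
    People_Above_60=("above_60", "sum"),
)
print(result)
```

Because the comparison runs once over the whole column and the per-group work is done by the built-in `"sum"`, no Python function has to be called for each group.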

Your own solution can work if the function compares the values and sums the result, but it is slow when there are many groups or the DataFrame is large:

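def count_people_above_60(age):
    # age holds the Age values of one group; count those greater than 60
    return (age > 60).sum()

# The result goes into a new name so the original df is not overwritten
out = (df.groupby(['City']).agg(Mean=("Age", "mean"),
                                People_Above_60=('Age', count_people_above_60)))
print(out)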
           Mean  People_Above_60
City                            
A     38.000000                0
B     60.333333                2
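
On the question of whether `age` arrives as a Series or a list: as far as I know, `groupby(...).agg` hands each group's values to a custom function as a pandas Series, which is why `(age > 60).sum()` works. The sketch below records the type it actually receives so you can confirm this in your own pandas version, and checks that the custom function and the precomputed-flag approach give the same counts. The `seen_types` list is a hypothetical helper used only for inspection.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "City": ["A", "A", "B", "A", "B", "B"],
    "Age": [50, 24, 65, 40, 68, 48],
})

seen_types = []  # hypothetical helper: records what type each call receives


def count_people_above_60(age):
    # Record the argument's type name, then count the values above 60
    seen_types.append(type(age).__name__)
    return (age > 60).sum()


# Custom-function route: the Python function is called once per group
custom = df.groupby("City")["Age"].agg(count_people_above_60)

# Precomputed-flag route: one vectorized comparison, then the built-in sum
builtin = df["Age"].gt(60).astype(int).groupby(df["City"]).sum()

print(set(seen_types))
print(custom.tolist(), builtin.tolist())
```

The per-group Python call is what makes the custom-function route slower as the number of groups grows; the flag route keeps all of the work inside pandas' built-in reducers.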