I have a pandas DataFrame:
import pandas as pd

data = {'year': [1990, 1990, 1990,
                 1990, 1990, 1990,
                 1990, 1990, 1990],
        'zip': ['22204', '22204', '22204',
                '20194', '20194', '20194',
                '24060', '24060', '24060'],
        'education': [0, 0, 1,
                      1, 0, 1,
                      0, 1, 0]}
df = pd.DataFrame(data = data)
I want to use groupby to compute the percentage of each outcome in the education variable:
df = df.groupby(['zip', 'year'])['education'].value_counts(normalize = True, dropna = False).unstack().fillna(0)
However, I want to call that line from inside a custom aggregation function. When I run the code below, I get this error message: AttributeError: 'Float64Index' object has no attribute 'remove_unused_levels'.
def percent_by_category(group):
    return group.value_counts(normalize = True, dropna = False).unstack().fillna(0)

df = df.groupby(['zip', 'year']).agg({'education':percent_by_category})
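Is it possible to write a custom aggregation function that computes the percentage of each outcome within a groupby group? Ideally, I would like to call it alongside several other built-in and custom aggregation functions. For example: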
df = df.groupby(['zip', 'year']).agg({'education':percent_by_category,
                                      'education':sum,
                                      'education':another_custom_function,
                                      another_variable:another_custom_function})
Answer 0 (score: 1)
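No, not if you must use agg, because a function passed to agg has to reduce each group to a scalar.
If you check how .value_counts() behaves inside the function, you can see that it returns a Series for each group. That Series has a single-level index, so unstack is not possible.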
def percent_by_category(group):
    print (group.value_counts(normalize = True, dropna = False))

df = df.groupby(['zip', 'year']).agg({'education':percent_by_category})
print (df)
1 0.666667
0 0.333333
Name: education, dtype: float64
0 0.666667
1 0.333333
Name: education, dtype: float64
0 0.666667
1 0.333333
Name: education, dtype: float64
           education
zip   year
20194 1990      None
22204 1990      None
24060 1990      None
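For contrast, a custom function that does reduce each group to a single number is accepted by agg. The sketch below is only an illustration: share_of_ones is a hypothetical helper, not part of the question, and it covers just one category (education == 1) rather than the full breakdown.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [1990] * 9,
    "zip": ["22204"] * 3 + ["20194"] * 3 + ["24060"] * 3,
    "education": [0, 0, 1, 1, 0, 1, 0, 1, 0],
})


def share_of_ones(group):
    # Hypothetical helper: `group` is the 'education' Series of one (zip, year) pair.
    # The mean of a boolean mask is a single float, so agg can use it as a reduction.
    return (group == 1).mean()


out = df.groupby(["zip", "year"]).agg({"education": share_of_ones})
print(out)
```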
So if you try to return a non-scalar output, an error is raised:
def percent_by_category(group):
    return group.value_counts(normalize = True, dropna = False)

df = df.groupby(['zip', 'year']).agg({'education':percent_by_category})
print (df)
ValueError: Function does not reduce
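One possible workaround, sketched here under assumptions rather than as the only approach, is to keep the scalar aggregations in agg and compute the per-category percentages separately, then join the two results on the group index. The names summary, shares, and the education_share_ prefix are illustrative, and the keyword form of agg (named aggregation) assumes pandas 0.25 or later.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [1990] * 9,
    "zip": ["22204"] * 3 + ["20194"] * 3 + ["24060"] * 3,
    "education": [0, 0, 1, 1, 0, 1, 0, 1, 0],
})

grouped = df.groupby(["zip", "year"])

# Scalar reductions: one output column per (source column, function) pair.
summary = grouped.agg(
    education_sum=("education", "sum"),
    education_mean=("education", "mean"),
)

# Non-scalar part: value_counts on the grouped column yields a Series indexed
# by (zip, year, education); unstack moves the category level into columns.
shares = (
    grouped["education"]
    .value_counts(normalize=True, dropna=False)
    .unstack(fill_value=0)
    .add_prefix("education_share_")
)

# Both frames are indexed by (zip, year), so a plain join lines them up.
result = summary.join(shares)
print(result)
```

Named aggregation also sidesteps a problem with the dictionary in the question: a Python dict cannot hold the key 'education' more than once, so only the last entry for that key would survive.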