I have a DataFrame that looks like this:
In [55]: df
Out[55]:
      WEIGHT  TR_YTD  TR_3Y_Ann  Categ1  Categ2  Number
 0  0.131214   -0.28       8.49       1       a       0
 1  0.052092    1.69      13.70       3       b       1
 2  0.045993    2.50        NaN       1       a       2
 3  0.041450   -4.57       5.07       1       c       3
 4  0.040769    7.64      17.49       2       a       4
 5  0.039791    0.07       0.21       1       a       5
 6  0.039271   -6.14       8.88       3       a       6
 7  0.038340   -8.13        NaN       1       c       7
 8  0.038227    9.26      13.78       2       a       8
 9  0.033878    0.02      11.45       1       a       9
10  0.029455    5.91      24.86       3       b      10
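I have weights, performance figures, and categories (in reality I have many more columns, but this is a minimal working example). I want to group by Categ1, by Categ2, and by both together, and for each group get the row count, the sum of the weights, and the weighted average of TR_YTD and TR_3Y_Ann, with the result sorted by the outer group levels and then by weight in descending order.
Code: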
import numpy as np
import pandas as pd

def get_wavg(df, field):
    # weighted average of `field`, using the WEIGHT column as weights
    return np.average(df[field], weights=df['WEIGHT'])

# each entry is a list of grouping Series, so that .name can be read below
groups = [[df['Categ1']], [df['Categ2']], [df['Categ1'], df['Categ2']]]

funcdict = {'Number': 'count',
            'WEIGHT': 'sum',
            # the two user-defined entries below are the part that fails
            'TR_YTD': lambda x: get_wavg(x, 'TR_YTD'),
            'TR_3Y_Ann': lambda x: get_wavg(x, 'TR_3Y_Ann')}

for group in groups:
    # preparing list to sort dynamically:
    # sort by the first layers of groups (excluding the last), then by weight
    groupnames = [x.name for x in group]
    sortinglist = groupnames[:-1]
    sortinglist.append('WEIGHT')
    ascendinglist = [True] * (len(groupnames) - 1) + [False]
    # apply agg functions
    grouped = df.groupby(group)
    grouped = grouped.agg(funcdict)
    # sorting
    grouped = pd.DataFrame(grouped)
    grouped.reset_index(inplace=True)
    grouped.sort_values(by=sortinglist, ascending=ascendinglist, inplace=True)
    grouped.set_index(groupnames, inplace=True)
    print(grouped)
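The problem is that I can't find the right syntax to make this work with my user-defined function. If I use np.nanmean instead, the code runs, but it gives me an unweighted mean, which is not the result I want.
What is the correct way to do this?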