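I'm new to Python, so please forgive any mistakes. I am writing a script that groups a pandas DataFrame with groupby.agg. I get an error when I try to call a function whose input is the output of a lambda function.
Here is a sample of the merged DataFrame: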
cprdf.iloc[5:10,5:20]
Out[237]:
Loan Nbr Servicer Loan Nbr Recon Action Code Loan Count_x \
5 21522594 25701889 Y 0.00 1
6 21522594 25701889 Y 0.00 1
7 21522594 25701889 Y 0.00 1
8 21522594 25701889 Y 0.00 1
9 21522594 25701889 Y 0.00 1
Days Delinquent_x Sale Date_x UPB Beginning UPB Purchase UPB Sch Prin \
5 0.00 NaN 142,936.57 0.00 162.16
6 0.00 NaN 143,097.92 0.00 161.35
7 0.00 NaN 143,258.47 0.00 160.55
8 0.00 NaN 143,418.22 0.00 159.75
9 0.00 NaN 143,735.33 0.00 317.11
UPB Curtailment UPB Liq UPB Adjustment UPB Non Cash UPB Ending
5 0.00 0.00 0.00 0.00 142,774.41
6 0.00 0.00 0.00 0.00 142,936.57
7 0.00 0.00 0.00 0.00 143,097.92
8 0.00 0.00 0.00 0.00 143,258.47
9 0.00 0.00 0.00 0.00 143,418.22
What I want to do is implement the following formulas for various groupby operations:
SMM = (UPB Curtailment + UPB Liq + UPB Adjustment) / (UPB Beginning)
CPR (%) = 100 * (1 - (1 - SMM)^12)
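As a quick sanity check of the two formulas, here is a minimal sketch with made-up numbers. The function names and the input values are illustrative only and do not come from my script; SMM is the monthly prepayment rate and CPR is its annualized form.

```python
def smm(curtailment, liquidation, adjustment, upb_beginning):
    """Unscheduled principal divided by the beginning balance."""
    if upb_beginning == 0:
        # Mirror the np.where guard used for zero balances.
        return 0.0
    return (curtailment + liquidation + adjustment) / upb_beginning


def cpr(smm_value):
    """Annualize a monthly SMM into a CPR percentage."""
    return 100 * (1 - (1 - smm_value) ** 12)


# Hypothetical loan: 1,000 of unscheduled principal on a 100,000 balance.
example_smm = smm(1000.0, 0.0, 0.0, 100000.0)
example_cpr = cpr(example_smm)
print(example_smm, example_cpr)
```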
Here is the relevant code:
cprdf['NonSchP'] = cprdf['UPB Curtailment'] + cprdf['UPB Liq'] + \
    cprdf['UPB Adjustment']
cprdf['SMM'] = np.where(cprdf['UPB Beginning'] == 0, 0,
                        cprdf['NonSchP']/cprdf['UPB Beginning'])
def wtavg(x):
    return lambda x: np.average(x, weights=cprdf.loc[x.index, 'UPB Beginning'])
def cpr(y):
    z = 100 * (1 - np.power((1 - y), 12))
    return z
# dictionary for new columns
n = {'UPB_sum' : pd.NamedAgg('UPB Beginning', 'sum'),
     'UPB_count': pd.NamedAgg('UPB Beginning', 'count'),
     'PIF_sum': pd.NamedAgg('UPB Liq', 'sum'),
     'PIF_count' : pd.NamedAgg('UPB Liq', np.count_nonzero),
     'SMMAgg' : pd.NamedAgg('SMM', wtavg(cprdf['SMM'])),
     'Rate': pd.NamedAgg('Current Loan Rate',wtavg(cprdf['Current Loan Rate'])),
     'CPR':pd.NamedAgg('SMM',cpr(wtavg(cprdf['SMM'])))}
cprgroup = cprdf.groupby(['month_year'],as_index=True).agg(**n)
cprgroup.reset_index(drop=False,inplace=True)
I expect the output to be:
cprgroup
Out[240]:
month_year UPB_sum UPB_count PIF_sum PIF_count SMM Rate \
0 2019-04 11,237,040.94 22 718,172.19 1.00 0.06 5.95
1 2019-05 16,684,325.75 31 0.00 0.00 0.00 5.99
2 2019-06 106,783,721.43 221 2,242,731.83 3.00 0.02 5.77
3 2019-07 104,181,644.18 218 1,035,861.72 3.00 0.01 5.77
4 2019-08 102,853,211.42 215 3,188,568.04 2.00 0.03 5.77
CPR
0 54.75
1 0.03
2 24.07
3 13.24
4 31.70
Instead, when I run the program, I get the following error:
runfile('C:/Users/spyder-py3/untitled3.py', wdir='C:/Users/.spyder-py3')
Traceback (most recent call last):
  File "<ipython-input-241-c3f795a9d003>", line 1, in <module>
    runfile('C:/.spyder-py3/untitled3.py', wdir='C:/Users/.spyder-py3')
  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)
  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/.spyder-py3/untitled3.py", line 51, in <module>
    'CPR':pd.NamedAgg('SMM',cpr(wtavg(cprdf['SMM'])))}
  File "C:/Users/.spyder-py3/untitled3.py", line 39, in cpr
    z = 100 * (1 - np.power((1 - y), 12))
TypeError: unsupported operand type(s) for -: 'int' and 'function'
Am I passing the lambda function to the cpr function incorrectly?
When I change the dictionary 'n' to use 'SMMAgg' as the input to the function:
'CPR':pd.NamedAgg('SMMAgg',cpr(SMMAgg))
I get:
NameError: name 'SMMAgg' is not defined
When I change the expression to:
'CPR':pd.NamedAgg('SMMAgg',cpr('SMMAgg'))
I get:
  File "C:/Users/.spyder-py3/untitled3.py", line 39, in cpr
    z = 100 * (1 - np.power((1 - y), 12))
TypeError: unsupported operand type(s) for -: 'int' and 'str'
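Any help would be appreciated.
I worked around the error by applying the CPR function after aggregation and adding the result as a new column on the grouped DataFrame, which gave me the desired output. But there is something I don't understand about calling this function inside the dictionary.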
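The workaround just described can be sketched roughly as follows. The toy data is made up for illustration, and only the columns needed for the weighted SMM are included; this is a sketch of the idea, not my full script.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, not taken from the real frame).
cprdf = pd.DataFrame({
    "month_year": ["2019-04", "2019-04", "2019-05", "2019-05"],
    "UPB Beginning": [100.0, 300.0, 200.0, 200.0],
    "SMM": [0.02, 0.04, 0.00, 0.01],
})


def wtavg(x):
    # x is the Series for one group; its index selects the matching weights.
    return np.average(x, weights=cprdf.loc[x.index, "UPB Beginning"])


def cpr(smm):
    # Works on scalars and on whole Series alike.
    return 100 * (1 - np.power(1 - smm, 12))


cprgroup = cprdf.groupby("month_year").agg(
    UPB_sum=pd.NamedAgg("UPB Beginning", "sum"),
    SMMAgg=pd.NamedAgg("SMM", wtavg),
)
# Apply cpr to the already aggregated column instead of inside agg.
cprgroup["CPR"] = cpr(cprgroup["SMMAgg"])
cprgroup = cprgroup.reset_index()
print(cprgroup)
```

Here cpr receives numbers (the aggregated SMM values) rather than a function, so the subtraction inside it is well defined.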
Thanks.
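Answer 0 (score: 1)
After some research, I found a solution. One issue I noticed (I'm not 100% sure) is that NamedAgg does not accept the same column for multiple custom aggregation functions, so I created a dummy SMM column. I changed the cpr function to return a lambda directly, instead of assigning the result to a new variable and returning that. I also call the wtavg function inside the cpr function and pass it the group's array as input. So: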
cprdf['SMM1']=cprdf['SMM']
def wtavg():
    return lambda x: np.average(x, weights=cprdf.loc[x.index, 'UPB Beginning'])
def cpr():
    return lambda y: 100 * (1 - np.power((1 - wtavg()(y)), 12))
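To see why the original version failed: wtavg(...) returned a function, and cpr(...) was called immediately while the dictionary was being built, so it tried to evaluate 1 minus a function. Returning a lambda from cpr defers the arithmetic until agg passes in each group. Below is a minimal sketch of that idea in plain Python; the helper names weighted_mean, cpr_value and compose are hypothetical and only serve the illustration.

```python
def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def cpr_value(smm):
    return 100 * (1 - (1 - smm) ** 12)


def compose(outer, inner):
    """Return a new function that calls inner first, then outer on its result."""
    return lambda *args, **kwargs: outer(inner(*args, **kwargs))


# Passing a function object where a number is expected reproduces the error.
try:
    cpr_value(weighted_mean)
except TypeError as exc:
    error_message = str(exc)

# Composing the two functions delays the calculation until data arrives.
cpr_of_weighted_mean = compose(cpr_value, weighted_mean)
result = cpr_of_weighted_mean([0.02, 0.04], weights=[100, 300])
print(error_message)
print(result)
```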
My kwargs dictionary then looks like this:
n = {'UPB_sum' : pd.NamedAgg('UPB Beginning', 'sum'),
     'UPB_count': pd.NamedAgg('UPB Beginning', 'count'),
     'PIF_sum': pd.NamedAgg('UPB Liq', 'sum'),
     'PIF_count' : pd.NamedAgg('UPB Liq', np.count_nonzero),
     'SMMAgg' : pd.NamedAgg('SMM', wtavg()),
     'Rate': pd.NamedAgg('Current Loan Rate',wtavg()),
     'CPRAgg':pd.NamedAgg('SMM1',cpr())}
cprgroup=cprdf.groupby(['month_year'],as_index=True).agg(**n)
Output:
cprgroup
Out[51]:
month_year UPB_sum UPB_count PIF_sum PIF_count SMMAgg \
0 2019-04 1.123704e+07 22 718172.19 1.0 0.063944
1 2019-05 1.668433e+07 31 0.00 0.0 0.000025
2 2019-06 1.067837e+08 221 2242731.83 3.0 0.022690
3 2019-07 1.041816e+08 218 1035861.72 3.0 0.011770
4 2019-08 1.028532e+08 215 3188568.04 2.0 0.031268
Rate CPRAgg
0 5.946053 54.749920
1 5.987882 0.030278
2 5.774863 24.074820
3 5.772602 13.244130
4 5.771342 31.696039
Voilà!
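As a possible variation, and assuming a reasonably recent pandas release, the dummy SMM1 column may not be needed. My guess, which I have not verified against the pandas version used above, is that the problem came from two anonymous lambdas on the same column sharing the name '<lambda>'. The sketch below uses two distinctly named functions on the same 'SMM' column; the toy data and the names smm_wtavg and cpr_wtavg are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, not taken from the real frame).
cprdf = pd.DataFrame({
    "month_year": ["2019-04", "2019-04", "2019-05", "2019-05"],
    "UPB Beginning": [100.0, 300.0, 200.0, 200.0],
    "SMM": [0.02, 0.04, 0.00, 0.01],
})


def smm_wtavg(x):
    # Balance-weighted average of one group's SMM values.
    return np.average(x, weights=cprdf.loc[x.index, "UPB Beginning"])


def cpr_wtavg(x):
    # Annualize the weighted SMM of the same group.
    return 100 * (1 - np.power(1 - smm_wtavg(x), 12))


# Both aggregations read the same 'SMM' column under different output names.
cprgroup = cprdf.groupby("month_year").agg(
    SMMAgg=pd.NamedAgg("SMM", smm_wtavg),
    CPRAgg=pd.NamedAgg("SMM", cpr_wtavg),
)
print(cprgroup)
```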