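I have a table of GLM coefficients and other values. I want to write a function that rebases the factors so that a level of my choosing becomes the unity (reference) level. For example, if the reference/unity value for the deductible is 0 and I want it to be 500, the function should divide every factor of that variable by the factor at 500.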
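Here is a minimal sketch of the arithmetic. It assumes that each factor is exp(level × coefficient), which is what the expected-output tables below suggest. Under that assumption, dividing by the factor at the unity level u is the same as shifting the level: exp(l·b) / exp(u·b) = exp((l − u)·b). The helper `rebased_factor` is hypothetical, and the coefficient is the CLded_model value from the sample data.

```python
import math

# CLded_model coefficient from the sample table; one slope is shared by all levels
coef = 0.00174356

def rebased_factor(level, unity_level, coefficient):
    """Factor at `level` after rebasing so the factor at `unity_level` is 1.

    Assumes factor = exp(level * coefficient), so
    exp(level*b) / exp(unity_level*b) == exp((level - unity_level) * b).
    """
    return math.exp((level - unity_level) * coefficient)

for level in [0, 100, 200, 250, 500, 750, 1000]:
    print(level, round(rebased_factor(level, 500, coef), 3))
```

With the unity level at 500, this gives about 0.418, 0.498, 0.593, 0.647, 1, 1.546 and 2.391 for levels 0 through 1000.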
Here is some sample code:
import numpy as np
import pandas as pd

df3 = {'variable': ['intercept', 'CLded_model','CLded_model','CLded_model','CLded_model','CLded_model','CLded_model','CLded_model'
,'married_age','married_age','married_age', 'class_cc', 'class_cc', 'class_cc', 'class_cc', 'class_v_age'
,'class_v_age','class_v_age', 'class_v_age'],
'level': [None,0,100,200,250,500,750,1000, 60, 61, 62, 100, 1200, 1500, 100
,10, 20, 15, 10],
'value': [None, 460955.7793,955735.0532,586308.4028,12216916.67,48401773.87,1477842.472,14587994.92,10493740.36,36388470.44
,31805316.37, 123.4, 4546.50, 439854.23, 2134.4, 2304.5, 2032.30, 159.80, 22],
'coefficient': [-2.36E-14, 0.00174356, 0.00174356, 0.00174356, 0.00174356, 0.00174356 ,0.00174356 , 0.00174356
,-1.004648e-02, -1.004648e-02,-1.071730e-02,-1.812330e-04,-1.812330e-04,8.727980e-04,1.402564e-03
,-1.681685e-01, -8.442040e-02, -1.812330e-04, -1.465950e-01]}
results = pd.DataFrame(df3)
# the factor is exp(level * coefficient); this is what produces the 'factor'
# values in the expected output below (level * coefficient alone does not)
results['factor'] = np.exp(results['level'] * results['coefficient'])
results
def rebase(df, variable1, unity_value):
    """
    rebase the factors according to where the modeler wants the unity to be
    """
    df['factor_rebased'] = ""
    base_factor = df[(df['variable'] == variable1) & (df['level'] == unity_value)]['coefficient']
    if df['variable'].any() == variable1:
        df['factor_rebased'] = df['coefficient']/base_factor
    return df['factor_rebased']
rebase(results, 'CLded_model', 500)
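The result is an empty series (every entry is just ""). What I want is a new column called factor_rebased. I also want to be able to rerun this function in a loop for each unique variable without wiping out the values rebased in earlier runs.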
Ideally, the first round would look like this: running rebase(results, 'CLded_model', 500) divides every factor of the CLded_model variable by 2.391 (the factor at 500):
df3 = {'variable': ['intercept', 'CLded_model','CLded_model','CLded_model','CLded_model','CLded_model','CLded_model','CLded_model'
,'married_age','married_age','married_age', 'class_cc', 'class_cc', 'class_cc', 'class_cc', 'class_v_age'
,'class_v_age','class_v_age', 'class_v_age'],
'level': [None,0,100,200,250,500,750,1000, 60, 61, 62, 100, 1200, 1500, 100
,10, 20, 15, 10],
'value': [None, 460955.7793,955735.0532,586308.4028,12216916.67,48401773.87,1477842.472,14587994.92,10493740.36,36388470.44
,31805316.37, 123.4, 4546.50, 439854.23, 2134.4, 2304.5, 2032.30, 159.80, 22],
'coefficient': [-2.36E-14, 0.00174356, 0.00174356, 0.00174356, 0.00174356, 0.00174356 ,0.00174356 , 0.00174356
,-1.004648e-02, -1.004648e-02,-1.071730e-02,-1.812330e-04,-1.812330e-04,8.727980e-04,1.402564e-03
,-1.681685e-01, -8.442040e-02, -1.812330e-04, -1.465950e-01],
'factor': [ None, 1. , 1.1904793 , 1.41724097, 1.54633869,
2.39116334, 3.69754838, 5.71766211, 0.54728324, 0.5418125 ,
0.51454483, 0.98203994, 0.80454402, 3.70319885, 1.15056877,
0.1860602 , 0.18481351, 0.9972852 , 0.23085857],
'factor_rebased': [None, .418, .498, .593, .647, 1, 1.546, 2.391, None, None, None, None, None, None, None, None, None, None, None]}
results = pd.DataFrame(df3)
results
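The second round (i.e. the next pass of the loop) would look like this: all married_age factors are divided by .5418, the married_age factor at level 61: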
df3 = {'variable': ['intercept', 'CLded_model','CLded_model','CLded_model','CLded_model','CLded_model','CLded_model','CLded_model'
,'married_age','married_age','married_age', 'class_cc', 'class_cc', 'class_cc', 'class_cc', 'class_v_age'
,'class_v_age','class_v_age', 'class_v_age'],
'level': [None,0,100,200,250,500,750,1000, 60, 61, 62, 100, 1200, 1500, 100
,10, 20, 15, 10],
'value': [None, 460955.7793,955735.0532,586308.4028,12216916.67,48401773.87,1477842.472,14587994.92,10493740.36,36388470.44
,31805316.37, 123.4, 4546.50, 439854.23, 2134.4, 2304.5, 2032.30, 159.80, 22],
'coefficient': [-2.36E-14, 0.00174356, 0.00174356, 0.00174356, 0.00174356, 0.00174356 ,0.00174356 , 0.00174356
,-1.004648e-02, -1.004648e-02,-1.071730e-02,-1.812330e-04,-1.812330e-04,8.727980e-04,1.402564e-03
,-1.681685e-01, -8.442040e-02, -1.812330e-04, -1.465950e-01],
'factor': [ None, 1. , 1.1904793 , 1.41724097, 1.54633869,
2.39116334, 3.69754838, 5.71766211, 0.54728324, 0.5418125 ,
0.51454483, 0.98203994, 0.80454402, 3.70319885, 1.15056877,
0.1860602 , 0.18481351, 0.9972852 , 0.23085857],
'factor_rebased': [None, .418, .498, .593, .647, 1, 1.546, 2.391, 1.010, 1, .950, None, None, None, None, None, None, None, None]}
results = pd.DataFrame(df3)
#results['factor'] = np.exp(results['level']*results['coefficient'])
results
So I'm not sure why I'm getting an empty series. Thanks for any help from the community.
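You can check the two problem lines in isolation. The sketch below uses a handful of rows copied from the sample table. The small frame `df` is only for illustration.

```python
import pandas as pd

# A few rows copied from the sample table (intercept first, as in the real frame)
df = pd.DataFrame({
    'variable': ['intercept', 'CLded_model', 'CLded_model', 'CLded_model', 'married_age'],
    'level': [None, 0, 500, 1000, 60],
    'coefficient': [-2.36E-14, 0.00174356, 0.00174356, 0.00174356, -1.004648e-02],
})

# 1) .any() collapses the whole column to a single truth value (never the string
#    'CLded_model'), so the comparison is False and the if-branch in rebase() never runs
print(df['variable'].any())
print(df['variable'].any() == 'CLded_model')

# 2) The boolean selection keeps its index label (2). Dividing a Series by another
#    Series aligns on index labels, so every row except label 2 becomes NaN
base = df[(df['variable'] == 'CLded_model') & (df['level'] == 500)]['coefficient']
print(df['coefficient'] / base)

# Pulling out the scalar first (here with .iloc[0]) avoids the alignment
print(df['coefficient'] / base.iloc[0])
```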
Answer
There are several things you could (and should) change. See the comments for details:
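import numpy as np

def rebase(df, variable1, unity_value):
    """
    rebase the factors according to where the modeler wants the unity to be
    """
    # you would erase all your previous runs with this line
    # df['factor_rebased'] = ""
    # instead, create the column only once; NaN keeps it numeric and matches
    # the None entries in your expected output ("" or 0 would not)
    if 'factor_rebased' not in df.columns:
        df['factor_rebased'] = np.nan
    # your goal is to divide the *factors*; dividing 'coefficient' gives 1 for
    # every CLded_model row, because all of its coefficients are identical
    base_factor = df.loc[(df['variable'] == variable1) & (df['level'] == unity_value), 'factor'].values
    # what if there is no row with df['level'] == unity_value (or more than one)?
    if len(base_factor) != 1:
        raise ValueError(f"expected one {variable1!r} row at level {unity_value!r}, found {len(base_factor)}")
    # if df['variable'].any() == variable1:
    # ...
    # .any() returns a single truth value, never the string variable1, so that
    # branch never ran. I believe what you mean is
    filters = df['variable'].eq(variable1)
    if filters.any():
        # base_factor[0] is a plain scalar, so no index alignment happens here
        df.loc[filters, 'factor_rebased'] = df.loc[filters, 'factor'] / base_factor[0]
    # why return? df['factor_rebased'] is already updated in place
    # return df['factor_rebased']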
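The question also asks to loop over the variables. The self-contained sketch below repeats the corrected function in compact form and calls it once per variable. The `unity_levels` mapping (500 for CLded_model, 61 for married_age) is a hypothetical choice. Only the CLded_model and married_age rows of the sample table are included.

```python
import numpy as np
import pandas as pd

def rebase(df, variable1, unity_value):
    """Divide each factor of `variable1` by its factor at `unity_value` (in place)."""
    if 'factor_rebased' not in df.columns:
        df['factor_rebased'] = np.nan
    filters = df['variable'].eq(variable1)
    base = df.loc[filters & df['level'].eq(unity_value), 'factor']
    if len(base) != 1:
        raise ValueError(f"expected one {variable1!r} row at level {unity_value!r}, found {len(base)}")
    # only the rows of variable1 are written, so other variables keep their values
    df.loc[filters, 'factor_rebased'] = df.loc[filters, 'factor'] / base.iloc[0]

# CLded_model and married_age rows from the question's table
results = pd.DataFrame({
    'variable': ['CLded_model'] * 7 + ['married_age'] * 3,
    'level': [0, 100, 200, 250, 500, 750, 1000, 60, 61, 62],
    'coefficient': [0.00174356] * 7 + [-1.004648e-02] * 2 + [-1.071730e-02],
})
results['factor'] = np.exp(results['level'] * results['coefficient'])

# Hypothetical unity level per variable
unity_levels = {'CLded_model': 500, 'married_age': 61}
for variable, unity in unity_levels.items():
    rebase(results, variable, unity)

print(results.round(3))
```

Each call writes only the rows selected by `filters`. As a result, the CLded_model values from the first pass are still there after the married_age pass.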