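Hi, I'm getting a memory error when running a Tweedie GLM with statsmodels. I've looked at "Python statsmodels: memory error", but that post has no answers.
The machine I'm running this on has 64 GB of RAM and eight processors. The data has shape (722214, 47).
Here is my code: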
import patsy
import statsmodels.api as sm
import statsmodels.formula.api as smf

# `train` is the (722214, 47) DataFrame described above.
formula = 'pp_log ~ C(atfault_model) + C(channel_model) + C(CLded_model) + C(credit_model_52778) + \
C(credit_model_c6) + C(package_model) + C(ds_fp_paid_in_full) + C(ds_pn_prior_insurance) + \
C(ds_ip_advanced_purchase) + C(credit_model_c5) + C(ds_ad_affinity) + C(ds_ak_alliance) + \
C(ds_ly_loyalty_discount) + C(ds_mo_multipolicy) + C(ds_pf_performance) + C(majorvio_model) + \
C(driver_age_model):C(marital_status_model) + C(minorvio_model) + C(multi_unit_model) + \
C(unit_drv_exp_model) + C(Vintiles) + C(safety_course_model) + C(instructor_course_model) + \
C(RATING_CLASS_CODE_MODEL) + C(class_model):C(v_age_model) + C(class_model):C(cc_model)'
# Expand the formula into the response and the dummy-coded design matrix.
y, x = patsy.dmatrices(formula, train, return_type='dataframe')
weights = train['coll_eu']
# Note: `x-1` subtracts 1 from every entry of x; it does not drop the intercept column.
lost_cost_model = smf.GLM(y, x-1, family=sm.families.Tweedie(link=sm.families.links.log, var_power=1.5), weights=weights)
lost_cost_results = lost_cost_model.fit()
Additional information:
The memory error is thrown at this line:
lost_cost_results = lost_cost_model.fit()
Here is the traceback:
MemoryError                               Traceback (most recent call last)
----> 1 lost_cost_results = lost_cost_model.fit()

C:\ProgramData\Anaconda3\lib\site-packages\statsmodels\genmod\generalized_linear_model.py in fit(self, start_params, maxiter, method, tol, scale, cov_type, cov_kwds, use_t, full_output, disp, max_start_irls, **kwargs)
   1010             return self._fit_irls(start_params=start_params, maxiter=maxiter,
   1011                                   tol=tol, scale=scale, cov_type=cov_type,
-> 1012                                   cov_kwds=cov_kwds, use_t=use_t, **kwargs)
   1013         else:
   1014             self._optim_hessian = kwargs.get('optim_hessian')

C:\ProgramData\Anaconda3\lib\site-packages\statsmodels\genmod\generalized_linear_model.py in _fit_irls(self, start_params, maxiter, tol, scale, cov_type, cov_kwds, use_t, **kwargs)
   1131                                              wlsendog,
   1132                                              wlsexog,
-> 1133                                              self.weights).fit(method=wls_method)
   1134             lin_pred = np.dot(self.exog, wls_results.params)
   1135             lin_pred += self._offset_exposure

C:\ProgramData\Anaconda3\lib\site-packages\statsmodels\regression\_tools.py in __init__(self, endog, exog, weights)
     47             self.wexog = w_half * exog
     48         else:
---> 49             self.wexog = w_half[:, None] * exog
     50
     51     def fit(self, method='pinv'):

MemoryError:
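The line that fails, `self.wexog = w_half[:, None] * exog`, builds a second dense array with the same shape as the design matrix, so the fit needs room for at least two full copies of it. The sketch below illustrates that on small stand-in arrays and then extrapolates to the row count from the post. The helper `dense_bytes` and the width of 5000 columns are assumptions made for illustration; the post does not say how many dummy columns the formula actually expands to.

```python
import numpy as np


def dense_bytes(n_rows, n_cols, itemsize=8):
    """Bytes needed for one dense n_rows x n_cols array (float64 by default)."""
    return n_rows * n_cols * itemsize


# Small stand-ins for the square-rooted weights and the design matrix.
rng = np.random.default_rng(0)
exog = rng.random((1000, 20))
w_half = np.sqrt(rng.random(1000))

# Same expression as in the traceback: broadcasting returns a NEW array
# with the full shape of exog, not a view onto it.
wexog = w_half[:, None] * exog
print(wexog.shape, wexog.nbytes == dense_bytes(*exog.shape))

# Extrapolate to the data in the post.
n_rows = 722214  # from the post
n_cols = 5000    # assumption, not from the post
gib_per_copy = dense_bytes(n_rows, n_cols) / 2**30
print(f"one dense copy: {gib_per_copy:.1f} GiB")
```

Under that assumed width, a single copy is already close to 27 GiB, so two copies plus the original DataFrame would not be far from the 64 GB available. Checking `x.shape` right after `patsy.dmatrices` gives the real column count to plug in.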
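Addendum 2:
In the insurance industry, standard practice is to bin everything into categorical variables. Then, once you have decided how to smooth each relativity, you change whatever needs changing to a numeric type and fit polynomials, splines, and so on.
Once I treated the variables I could as numeric, it ran... in only 3 minutes. Case closed.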
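The reason the numeric version is so much lighter can be seen from the width of the design matrix: a `C(...)` term adds one column per level (minus a reference level), while a numeric term adds one column, or a few for a polynomial. A minimal sketch with patsy follows; the `age` and `age_band` columns are hypothetical toy data, not variables from the post.

```python
import pandas as pd
import patsy

# Hypothetical toy data: driver age as a raw number and as a 5-year band.
df = pd.DataFrame({"age": list(range(18, 78))})
df["age_band"] = (df["age"] - 18) // 5  # 12 bands, labelled 0..11

# Categorical coding: intercept plus one dummy per non-reference band.
as_category = patsy.dmatrix("C(age_band)", df, return_type="dataframe")

# Numeric coding: intercept plus a single slope column.
as_linear = patsy.dmatrix("age", df, return_type="dataframe")

# Quadratic polynomial: intercept, age, and age squared.
as_quadratic = patsy.dmatrix("age + I(age**2)", df, return_type="dataframe")

for name, matrix in [("categorical", as_category),
                     ("linear", as_linear),
                     ("quadratic", as_quadratic)]:
    print(name, matrix.shape[1], "columns")
```

With many banded variables and interactions between them, the categorical column counts multiply, which is what makes the dense design matrix so large in the first place.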