Question

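Hi, I am getting a memory error when running a Tweedie GLM with statsmodels. I have looked at "Python statsmodels: memory error", but that post has no answers.

The machine I am running this on has 64 GB of RAM and eight processors. The data has shape (722214, 47).

Here is my code: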

formula = 'pp_log ~ C(atfault_model) + C(channel_model) + C(CLded_model) + C(credit_model_52778) + \
        C(credit_model_c6) + C(package_model) + C(ds_fp_paid_in_full) + C(ds_pn_prior_insurance) + \
        C(ds_ip_advanced_purchase) + C(credit_model_c5) + C(ds_ad_affinity) + C(ds_ak_alliance) + \
        C(ds_ly_loyalty_discount) + C(ds_mo_multipolicy) + C(ds_pf_performance) + C(majorvio_model) + \
        C(driver_age_model):C(marital_status_model) + C(minorvio_model) + C(multi_unit_model) + \
        C(unit_drv_exp_model) +  C(Vintiles) + C(safety_course_model) + C(instructor_course_model) + \
        C(RATING_CLASS_CODE_MODEL) + C(class_model):C(v_age_model) + C(class_model):C(cc_model)'

y, x = patsy.dmatrices(formula, train, return_type = 'dataframe')

weights = train['coll_eu']

lost_cost_model = smf.GLM(y, x-1, family = sm.families.Tweedie(link = sm.families.links.log, var_power = 1.5), weights = weights)

lost_cost_results = lost_cost_model.fit()

Additional information:

The memory error is thrown on the following line:

lost_cost_results = lost_cost_model.fit()

Here is the traceback:

MemoryError                               Traceback (most recent call last)
----> 1 lost_cost_results = lost_cost_model.fit()

C:\ProgramData\Anaconda3\lib\site-packages\statsmodels\genmod\generalized_linear_model.py in fit(self, start_params, maxiter, method, tol, scale, cov_type, cov_kwds, use_t, full_output, disp, max_start_irls, **kwargs)
   1010             return self._fit_irls(start_params=start_params, maxiter=maxiter,
   1011                                   tol=tol, scale=scale, cov_type=cov_type,
-> 1012                                   cov_kwds=cov_kwds, use_t=use_t, **kwargs)
   1013         else:
   1014             self._optim_hessian = kwargs.get('optim_hessian')

C:\ProgramData\Anaconda3\lib\site-packages\statsmodels\genmod\generalized_linear_model.py in _fit_irls(self, start_params, maxiter, tol, scale, cov_type, cov_kwds, use_t, **kwargs)
   1131                                       wlsendog,
   1132                                       wlsexog,
-> 1133                                       self.weights).fit(method=wls_method)
   1134             lin_pred = np.dot(self.exog, wls_results.params)
   1135             lin_pred += self._offset_exposure

C:\ProgramData\Anaconda3\lib\site-packages\statsmodels\regression\_tools.py in __init__(self, endog, exog, weights)
     47             self.wexog = w_half * exog
     48         else:
---> 49             self.wexog = w_half[:, None] * exog
     50
     51     def fit(self, method='pinv'):

MemoryError:
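
For context on the line flagged in the traceback: `w_half[:, None] * exog` is a NumPy broadcast multiplication, which allocates a new array with the same shape as `exog`. Peak memory is therefore at least the design matrix plus one full copy of it. The sketch below is a rough estimate of what one such dense float64 copy costs at the row count reported above. The column counts are hypothetical, since the post does not say how many columns the dummy-coded design matrix has; the real number would be `x.shape[1]` after `patsy.dmatrices`.

```python
def dense_matrix_bytes(n_rows, n_cols, itemsize=8):
    """Bytes needed for one dense array of shape (n_rows, n_cols).

    itemsize=8 corresponds to float64, NumPy's default float type.
    """
    return n_rows * n_cols * itemsize


N_ROWS = 722214  # row count reported in the question

# Hypothetical column counts for illustration only; the actual value
# is x.shape[1] once the formula has been expanded into dummy columns.
for n_cols in (47, 500, 2000, 10000):
    gib = dense_matrix_bytes(N_ROWS, n_cols) / 2**30
    print(f"{n_cols:>6} columns -> {gib:8.2f} GiB per dense copy")
```

Each additional column costs about 5.5 MiB at this row count, so a design matrix with a few thousand dummy columns reaches tens of GiB once a second copy is made.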

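Addendum 2:

In the insurance industry, it is standard practice to start by treating everything as a categorical variable. Then, once you have decided how to smooth each relativity, you change the variables that need it to a numeric type and fit polynomials, splines, and so on.

Once I treated the variables that could be numeric as numeric, it ran... in only 3 minutes. Case closed.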
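
A rough sketch of why switching variables from categorical to numeric shrinks the problem: under reference-level (treatment) coding with an intercept, a categorical main effect with k levels expands to k - 1 dummy columns, while a numeric variable contributes a single column. An interaction of two categorical factors can add up to the product of their level counts. The level counts below are hypothetical, since the post does not list them, and the interaction figure is only an upper bound, not patsy's exact column count.

```python
def dummy_columns(n_levels):
    """Columns for one categorical main effect under treatment coding
    (one reference level dropped, intercept present)."""
    return max(n_levels - 1, 0)


def interaction_columns_upper_bound(levels_a, levels_b):
    """Upper bound on columns for an interaction of two categorical
    factors; the exact count depends on the other terms in the formula."""
    return levels_a * levels_b


# Hypothetical level counts, for illustration only.
levels = {"driver_age_model": 60, "marital_status_model": 4}

as_categorical = dummy_columns(levels["driver_age_model"])
as_numeric = 1  # a numeric variable enters the design matrix as one column
interaction_bound = interaction_columns_upper_bound(
    levels["driver_age_model"], levels["marital_status_model"]
)

print("driver_age_model as categorical:", as_categorical, "columns")
print("driver_age_model as numeric:", as_numeric, "column")
print("C(driver_age_model):C(marital_status_model): up to",
      interaction_bound, "columns")
```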

Why am I getting a memory error with Python statsmodels?

0 answers: