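I am trying to fit a logistic regression model, but I get an error when I try to print the results. I have searched for a fix and tried to work it out myself, but I have not been able to resolve it.
Here is the code and the output: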
import statsmodels.api as sm

# Columns
columns = new_df[['DIABETES_NO', 'DIABETES_INSULIN', 'DIABETES_NON-INSULIN',
                  'bmi_cat_0', 'bmi_cat_gte40', 'bmi_cat_lt40',
                  'albumin_cat_0', 'albumin_cat_gt3.5', 'albumin_cat_lt3.5',
                  'SMOKE_No', 'SMOKE_Yes',
                  'age_cat_0', 'age_cat_gte65', 'age_cat_lt65',
                  'SEX_male', 'SEX_female']]

# Model 1 Target Variable (Mortality)
X = columns
y = new_df['Mortality']
logit_model = sm.Logit(y, X)
result = logit_model.fit()
print(result.summary2())
Warning: Maximum number of iterations has been exceeded.
Current function value: 0.014645
Iterations: 35
---------------------------------------------------------------------------
LinAlgError Traceback (most recent call last)
<ipython-input-35-0a3dafc9126f> in <module>
5
6 logit_model=sm.Logit (y,X)
----> 7 result=logit_model.fit()
8 print(result.summary2())
E:\Users\davidwool\Anaconda3\lib\site-packages\statsmodels\discrete\discrete_model.py in fit(self, start_params, method, maxiter, full_output, disp, callback, **kwargs)
1832 bnryfit = super(Logit, self).fit(start_params=start_params,
1833 method=method, maxiter=maxiter, full_output=full_output,
-> 1834 disp=disp, callback=callback, **kwargs)
1835
1836 discretefit = LogitResults(self, bnryfit)
E:\Users\davidwool\Anaconda3\lib\site-packages\statsmodels\discrete\discrete_model.py in fit(self, start_params, method, maxiter, full_output, disp, callback, **kwargs)
218 mlefit = super(DiscreteModel, self).fit(start_params=start_params,
219 method=method, maxiter=maxiter, full_output=full_output,
--> 220 disp=disp, callback=callback, **kwargs)
221
222 return mlefit # up to subclasses to wrap results
E:\Users\davidwool\Anaconda3\lib\site-packages\statsmodels\base\model.py in fit(self, start_params, method, maxiter, full_output, disp, fargs, callback, retall, skip_hessian, **kwargs)
471 Hinv = cov_params_func(self, xopt, retvals)
472 elif method == 'newton' and full_output:
--> 473 Hinv = np.linalg.inv(-retvals['Hessian']) / nobs
474 elif not skip_hessian:
475 H = -1 * self.hessian(xopt)
E:\Users\davidwool\Anaconda3\lib\site-packages\numpy\linalg\linalg.py in inv(a)
530 signature = 'D->D' if isComplexType(t) else 'd->d'
531 extobj = get_linalg_error_extobj(_raise_linalgerror_singular)
--> 532 ainv = _umath_linalg.inv(a, signature=signature, extobj=extobj)
533 return wrap(ainv.astype(result_t, copy=False))
534
E:\Users\davidwool\Anaconda3\lib\site-packages\numpy\linalg\linalg.py in _raise_linalgerror_singular(err, flag)
87
88 def _raise_linalgerror_singular(err, flag):
---> 89 raise LinAlgError("Singular matrix")
90
91 def _raise_linalgerror_nonposdef(err, flag):
LinAlgError: Singular matrix
I tried setting method='bfgs', but then every column except Coef. shows NaN.
Here is what that looks like:
# Model 1 Target Variable (Mortality)
X = columns
y = new_df['Mortality']
logit_model = sm.Logit(y, X)
result = logit_model.fit(method='bfgs')
print(result.summary2())
Warning: Maximum number of iterations has been exceeded.
Current function value: 0.014671
Iterations: 35
Function evaluations: 36
Gradient evaluations: 36
                         Results: Logit
=================================================================
Model:              Logit            Pseudo R-squared: 0.090
Dependent Variable: Mortality        AIC:              329.5189
Date:               2020-10-15 19:32 BIC:              402.1568
No. Observations:   10549            Log-Likelihood:   -154.76
Df Model:           9                LL-Null:          -170.03
Df Residuals:       10539            LLR p-value:      0.00035468
Converged:          0.0000           Scale:            1.0000
-----------------------------------------------------------------
                       Coef.  Std.Err.   z   P>|z|  [0.025  0.975]
-----------------------------------------------------------------
DIABETES_NO           -1.3211    nan    nan    nan     nan     nan
DIABETES_INSULIN      -0.1911    nan    nan    nan     nan     nan
DIABETES_NON-INSULIN  -0.2797    nan    nan    nan     nan     nan
bmi_cat_0             -0.0321    nan    nan    nan     nan     nan
bmi_cat_gte40         -1.0971    nan    nan    nan     nan     nan
bmi_cat_lt40          -0.6626    nan    nan    nan     nan     nan
albumin_cat_0         -1.7288    nan    nan    nan     nan     nan
albumin_cat_gt3.5     -0.7371    nan    nan    nan     nan     nan
albumin_cat_lt3.5      0.6740    nan    nan    nan     nan     nan
SMOKE_No              -1.0509    nan    nan    nan     nan     nan
SMOKE_Yes             -0.7410    nan    nan    nan     nan     nan
age_cat_0             -0.0321    nan    nan    nan     nan     nan
age_cat_gte65         -0.0337    nan    nan    nan     nan     nan
age_cat_lt65          -1.7261    nan    nan    nan     nan     nan
SEX_male              -1.2519    nan    nan    nan     nan     nan
SEX_female            -0.5400    nan    nan    nan     nan     nan
=================================================================
Any help or suggestions would be greatly appreciated. Thank you!
Answer 0 (score: 0)
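Clearly, the categorical variables are not being handled correctly. Including the full set of one-hot encoded dummy variables for each categorical variable (for example, both SEX_male and SEX_female together) essentially introduces multiple constants into the regression: within each group the dummies sum to 1 in every row, so the groups are perfectly collinear with one another. The design matrix is therefore rank-deficient, the Hessian cannot be inverted, and that gives rise to the singular matrix error (and to the NaN standard errors with method='bfgs').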
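
A minimal sketch of the problem and one common fix, assuming the dummies were built with something like pandas get_dummies. The toy SEX and SMOKE columns below are hypothetical stand-ins, not the asker's data. The sketch compares the column count of the design matrix with its rank, first with the full set of dummies and then with one reference level dropped per variable plus a single explicit constant.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the raw categorical columns.
raw = pd.DataFrame({
    "SEX": ["male", "female", "female", "male", "female", "male"],
    "SMOKE": ["Yes", "No", "No", "Yes", "Yes", "No"],
})

# Full one-hot encoding: every level of every variable gets its own column.
# SEX_female + SEX_male == 1 and SMOKE_No + SMOKE_Yes == 1 in every row,
# so the columns are linearly dependent.
full = pd.get_dummies(raw, dtype=float)

# Reference coding: drop the first level of each variable,
# then add exactly one constant column.
reduced = pd.get_dummies(raw, drop_first=True, dtype=float)
reduced.insert(0, "const", 1.0)

full_rank = np.linalg.matrix_rank(full.to_numpy())
reduced_rank = np.linalg.matrix_rank(reduced.to_numpy())

print("full:   ", full.shape[1], "columns, rank", full_rank)
print("reduced:", reduced.shape[1], "columns, rank", reduced_rank)
```

When the rank is lower than the number of columns, the matrix cannot support a unique set of coefficients. With dummies that already exist, as in the question, the equivalent step would presumably be to drop one column from each group (one of the DIABETES_*, bmi_cat_*, albumin_cat_*, SMOKE_*, age_cat_* and SEX_* columns) and add a constant, for example with sm.add_constant, before passing X to sm.Logit. Which level to use as the reference is a modelling choice.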