Question

我想知道如何从python statsmodels中的拟合逻辑回归模型中获得优势比。

>>> import statsmodels.api as sm
>>> import numpy as np
>>> X = np.random.normal(0, 1, (100, 3))
>>> y = np.random.choice([0, 1], 100)
>>> res = sm.Logit(y, X).fit()
Optimization terminated successfully.
         Current function value: 0.683158
         Iterations 4
>>> res.summary()
<class 'statsmodels.iolib.summary.Summary'>
"""
                           Logit Regression Results                           
==============================================================================
Dep. Variable:                      y   No. Observations:                  100
Model:                          Logit   Df Residuals:                       97
Method:                           MLE   Df Model:                            2
Date:                Sun, 05 Jun 2016   Pseudo R-squ.:                0.009835
Time:                        23:25:06   Log-Likelihood:                -68.316
converged:                       True   LL-Null:                       -68.994
                                        LLR p-value:                    0.5073
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1            -0.0033      0.181     -0.018      0.985        -0.359     0.352
x2             0.0565      0.213      0.265      0.791        -0.362     0.475
x3             0.2985      0.216      1.380      0.168        -0.125     0.723
==============================================================================
"""
>>>

Answer 1

您可以通过以下方式获得赔率：

np.exp(res.params)

还要获得置信区间（source）：

params = res.params
conf = res.conf_int()
conf['OR'] = params
conf.columns = ['2.5%', '97.5%', 'OR']
print(np.exp(conf))

免责声明：我刚刚将评论整理到您的问题中。

Answer 2

不确定 statsmodels，在 sklearn 中做：

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1)

logisticRegr = LogisticRegression()
logisticRegr.fit(x_train, y_train)

df=pd.DataFrame({'odds_ratio':(np.exp(logisticRegr.coef_).T).tolist(),'variable':x.columns.tolist()})
df['odds_ratio'] = df['odds_ratio'].str.get(0)

df=df.sort_values('odds_ratio', ascending=False)
df

statsmodels逻辑回归比值比

2 个答案: