Question

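I am trying to plot a linear regression with a 95% confidence interval on the slope, and I find that two methods give different results.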

import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import summary_table

x = np.linspace(0,10)
y = 3*np.random.randn(50) + x
res = sm.OLS(y, x).fit()

st, data, ss2 = summary_table(res, alpha=0.05)
fittedvalues = data[:,2]
predict_mean_se  = data[:,3]
predict_mean_ci_low, predict_mean_ci_upp = data[:,4:6].T
predict_ci_low, predict_ci_upp = data[:,6:8].T

plt.figure(1)
plt.subplot(211)
plt.plot(x, y, 'o', label="data")
plt.plot(x, fittedvalues, 'r-', label='OLS')
plt.plot(x, predict_ci_low, 'b--')
plt.plot(x, predict_ci_upp, 'b--')
plt.plot(x, predict_mean_ci_low, 'g--')
plt.plot(x, predict_mean_ci_upp, 'g--')
plt.legend()

plt.subplot(212)
sns.regplot(x=x, y=y,label="sns")
plt.legend()
plt.show()

By the way, what is the difference between predict_mean_ci_low and predict_ci_low? I cannot find an explanation in the manual. The statsmodels part was copied from this question.

Edit:

According to Josef, Warren Weckesser, and this post, I need to add a constant for the OLS version.

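By default, OLS in statsmodels does not include a constant term (i.e. the intercept) in the linear equation. (The constant term corresponds to a column of ones in the design matrix.)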
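To see what the column of ones changes, here is a small NumPy-only sketch that fits the same data with and without it. The data (true intercept of 5, fixed seed) are hypothetical and chosen so the effect is easy to see; np.linalg.lstsq stands in for the OLS fit:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10)
y = 5.0 + x + rng.standard_normal(50)   # hypothetical data with a true intercept of 5

# Without a constant: the design matrix is just x, so the line is forced through the origin
X_no_const = x[:, None]
beta_no_const, *_ = np.linalg.lstsq(X_no_const, y, rcond=None)

# With a constant: prepend a column of ones, mirroring sm.add_constant for a 1-D x
X_const = np.column_stack([np.ones_like(x), x])
beta_const, *_ = np.linalg.lstsq(X_const, y, rcond=None)

print("slope only:", beta_no_const)
print("intercept, slope:", beta_const)
```

Without the constant, the slope has to absorb the missing intercept, so the fitted line and its intervals differ from a fit that includes one.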

import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import summary_table

x = np.linspace(0,10)
y = 3*np.random.randn(50) + x
X = sm.add_constant(x)
res = sm.OLS(y, X).fit()

st, data, ss2 = summary_table(res, alpha=0.05)
fittedvalues = data[:,2]
predict_mean_se  = data[:,3]
predict_mean_ci_low, predict_mean_ci_upp = data[:,4:6].T
predict_ci_low, predict_ci_upp = data[:,6:8].T

plt.figure(1)

plt.subplot(211)
plt.plot(x, y, 'o', label="data")
plt.plot(X, fittedvalues, 'r-', label='OLS')
plt.plot(X, predict_ci_low, 'b--')
plt.plot(X, predict_ci_upp, 'b--')
plt.plot(X, predict_mean_ci_low, 'g--')
plt.plot(X, predict_mean_ci_upp, 'g--')
plt.legend()

plt.subplot(212)
sns.regplot(x=x, y=y,label="sns")
plt.legend()
plt.show()

However, the plot now looks strange. There are some odd lines and extra legend entries.

Edit 2:

For the difference between a prediction interval and a confidence interval, see this web page.
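For reference, these are the standard textbook forms of the two intervals at a point x_0 for simple linear regression with an intercept. The notation is introduced here and is not from the linked page: s^2 = SSE/(n-2) is the residual variance estimate, S_xx = \sum_i (x_i - \bar{x})^2, and t is the Student t quantile with n-2 degrees of freedom.

```latex
% Confidence interval for the mean response at x_0
\hat{y}_0 \pm t_{1-\alpha/2,\,n-2}\; s \sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}}

% Prediction interval for a new observation at x_0
\hat{y}_0 \pm t_{1-\alpha/2,\,n-2}\; s \sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}}
```

The only difference is the extra 1 under the square root, which accounts for the noise of the new observation itself, so the prediction interval is always the wider of the two.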

Edit 3:

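exog contains all the variables I want to include in the model, including a constant (a column of ones). So we need to plot against X[:,1], leaving out the column of ones.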

ax.plot(X[:,1], fittedvalues, 'r-', label='OLS')
ax.plot(X[:,1], predict_ci_low, 'b--')
ax.plot(X[:,1], predict_ci_upp, 'b--')
ax.plot(X[:,1], predict_mean_ci_low, 'g--')
ax.plot(X[:,1], predict_mean_ci_upp, 'g--')
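A quick check of why the two-column X produced stray lines and repeated legend entries: when x is 2-D, matplotlib draws one line per column, and the same label is attached to each of them. The sketch below uses hypothetical fitted values and the Agg backend so that it runs without a display:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")                        # headless backend (assumption, no display needed)
from matplotlib import pyplot as plt

x = np.linspace(0, 10)
X = np.column_stack([np.ones_like(x), x])    # same layout as sm.add_constant(x)
y_line = 1.0 + 2.0 * x                       # hypothetical fitted values

fig, ax = plt.subplots()
lines_2d = ax.plot(X, y_line, 'r-', label='OLS')        # one line per column of X
lines_1d = ax.plot(X[:, 1], y_line, 'b-', label='OLS')  # a single line

print(len(lines_2d), len(lines_1d))
# The first of the 2-D lines is drawn at x = 1 for every point, i.e. a vertical line
print(np.allclose(lines_2d[0].get_xdata(), 1.0))
```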

Different results from seaborn and statsmodels
