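I am trying to compute coefficients with multiple linear regression, using the statsmodels library. The problem is that with this code I get the error ValueError: endog and exog matrices are different sizes. I get this error because in this example the y set has 4 elements, while the X set is a list of 7 ndarrays, each with 5 elements.

What I don't understand is that the x set (not X) is a list containing 4 lists (y has 4 elements), where each list consists of 7 variables. To me, x and y have the same number of elements.

How can I fix this error?
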
import numpy as np
import statsmodels.api as sm


def test_linear_regression():
    x = [[0.0, 1102249463.0, 44055788.0, 9.0, 2.0, 32000.0, 49222464.0],
         [0.0, 1102259506.0, 44049537.0, 9.0, 2.0, 32000.0, 49222464.0],
         [0.0, 1102249463.0, 44055788.0, 9.0, 2.0, 32000.0, 49222464.0],
         [0.0, 1102259506.0, 44049537.0, 10.0, 2.0, 32000.0, 49222464.0]]
    y = [71.7554421425, 37.5205008984, 44.9945571423, 53.5441429615]
    reg_m(y, x)


def reg_m(y, x):
    ones = np.ones(len(x[0]))
    X = sm.add_constant(np.column_stack((x[0], ones)))
    y.append(1)
    for ele in x[1:]:
        X = sm.add_constant(np.column_stack((ele, X)))
    results = sm.OLS(y, X).fit()
    return results


if __name__ == "__main__":
    test_linear_regression()

Answer 0 (score: 1)

Assuming each list in x corresponds to one value of y:

import numpy as np
import statsmodels.api as sm

x = [[0.0, 1102249463.0, 44055788.0, 9.0, 2.0, 32000.0, 49222464.0],
     [0.0, 1102259506.0, 44049537.0, 9.0, 2.0, 32000.0, 49222464.0],
     [0.0, 1102249463.0, 44055788.0, 9.0, 2.0, 32000.0, 49222464.0],
     [0.0, 1102259506.0, 44049537.0, 10.0, 2.0, 32000.0, 49222464.0]]

y = [71.7554421425, 37.5205008984, 44.9945571423, 53.5441429615]


def reg_m(x, y):
    # One row per observation, one column per predictor: shape (4, 7)
    x = np.array(x)
    y = np.array(y)

    # Prepend a column of ones for the y-intercept
    X = np.insert(x, 0, np.ones((1,)), axis=1)

    # Or, if you REALLY want to use add_constant to add the ones, use this:
    # X = sm.add_constant(x, has_constant='add')

    return sm.OLS(y, X).fit()


model = reg_m(x, y)

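As a rough check of why the original reg_m failed: sm.OLS expects the design matrix to have one row per observation, so its first dimension must match the length of y. The question's loop passed each inner list to np.column_stack, which turns every list into a column and leaves a matrix with 7 rows, and y.append(1) also changed the length of y. The sketch below uses only NumPy and the data from the question; the variable names rows_as_observations and rows_as_columns are illustrative, not part of the original code.

```python
import numpy as np

# Data from the question: 4 observations, 7 predictors each.
x = [[0.0, 1102249463.0, 44055788.0, 9.0, 2.0, 32000.0, 49222464.0],
     [0.0, 1102259506.0, 44049537.0, 9.0, 2.0, 32000.0, 49222464.0],
     [0.0, 1102249463.0, 44055788.0, 9.0, 2.0, 32000.0, 49222464.0],
     [0.0, 1102259506.0, 44049537.0, 10.0, 2.0, 32000.0, 49222464.0]]
y = [71.7554421425, 37.5205008984, 44.9945571423, 53.5441429615]

# Layout OLS expects: each inner list is one row (one observation).
rows_as_observations = np.asarray(x)

# Layout the question built: each inner list becomes one column.
rows_as_columns = np.column_stack(x)

print(rows_as_observations.shape)  # (4, 7): first dimension matches len(y)
print(rows_as_columns.shape)       # (7, 4): first dimension does not match len(y)
print(len(y))                      # 4
```

Only the first layout has as many rows as y has values, which is the condition the ValueError is complaining about.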

To see a summary printout of the model, just call model.summary():

"""
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.450
Model:                            OLS   Adj. R-squared:                 -0.649
Method:                 Least Squares   F-statistic:                    0.4096
Date:                Thu, 07 Jul 2016   Prob (F-statistic):              0.741
Time:                        21:50:12   Log-Likelihood:                -14.665
No. Observations:                   4   AIC:                             35.33
Df Residuals:                       1   BIC:                             33.49
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const      -1.306e-07   2.18e-07     -0.599      0.657      -2.9e-06  2.64e-06
x1         -3.086e-11   5.15e-11     -0.599      0.657     -6.86e-10  6.24e-10
x2            -0.0001      0.000     -0.900      0.534        -0.002     0.002
x3             0.0031      0.003      0.900      0.534        -0.041     0.047
x4            16.0236     26.761      0.599      0.657      -324.006   356.053
x5          8.321e-12   9.25e-12      0.900      0.534     -1.09e-10  1.26e-10
x6          1.331e-07   1.48e-07      0.900      0.534     -1.75e-06  2.01e-06
x7             0.0002      0.000      0.900      0.534        -0.003     0.003
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   1.500
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.167
Skew:                          -0.000   Prob(JB):                        0.920
Kurtosis:                       2.000   Cond. No.                          inf
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The input rank is higher than the number of observations.
[3] The smallest eigenvalue is 0. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
"""
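
Warnings [2] and [3] are worth a second look: the model has 8 columns (the intercept plus 7 predictors) but only 4 observations, and two of those observations are identical, so the coefficients cannot all be determined uniquely. A minimal sketch of how one might confirm this with NumPy alone is shown below; it rebuilds the same design matrix as reg_m above, and the names n_obs, n_cols, distinct_rows and rank are illustrative.

```python
import numpy as np

# Predictor data from the question: 4 observations, 7 predictors each.
x = np.array([
    [0.0, 1102249463.0, 44055788.0, 9.0, 2.0, 32000.0, 49222464.0],
    [0.0, 1102259506.0, 44049537.0, 9.0, 2.0, 32000.0, 49222464.0],
    [0.0, 1102249463.0, 44055788.0, 9.0, 2.0, 32000.0, 49222464.0],
    [0.0, 1102259506.0, 44049537.0, 10.0, 2.0, 32000.0, 49222464.0],
])

# Prepend the intercept column, mirroring what reg_m does.
X = np.insert(x, 0, 1.0, axis=1)

n_obs, n_cols = X.shape

# Rows 1 and 3 are exact duplicates, so there are fewer distinct observations.
distinct_rows = np.unique(x, axis=0).shape[0]

# The rank can never exceed the number of rows, so it is below n_cols here.
rank = np.linalg.matrix_rank(X)

print(n_obs, n_cols, distinct_rows)  # 4 8 3
print(rank < n_cols)                 # True
```

With fewer independent observations than columns, the fit is underdetermined, so the individual coefficients and their standard errors in the summary should be read with caution; more distinct observations or fewer predictors would be needed for a well-posed regression.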