I'm trying to create a regression using categorical variables.
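First I create all the dummy variables, then drop everything I don't need from the x values.
import numpy as np
import pandas as pd
import statsmodels.api as sm
d1 = pd.get_dummies(df2015["CBSA Office"])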
df2015_new = pd.concat([df2015, d1], axis=1)
d2 = pd.get_dummies(df2016["CBSA Office"])
df2016_new = pd.concat([df2016, d2], axis=1)
# stack both years into one training set and drop rows with any missing value
trainset = pd.concat([df2015_new, df2016_new], axis=0)
trainset = trainset.dropna()
# features: every column except the ones dropped here; target: Travellers Flow
x_train = trainset.drop(['CBSA Office', 'Location', 'Updated', 'Commercial Flow', 'Travellers Flow'], axis="columns")
y_train = trainset["Travellers Flow"]
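Because `pd.get_dummies` is called on each year separately, the two dummy frames only line up if both years contain exactly the same offices. An office that appears in only one year becomes a NaN column for the other year's rows after the `axis=0` concat, and `dropna()` then throws those rows away. A minimal sketch of the alternative of concatenating first and encoding once, using hypothetical toy frames `df_a` and `df_b` with made-up `office` and `flow` columns; `drop_first=True` is an added assumption that keeps one office as the baseline:

```python
import pandas as pd

# Hypothetical toy frames standing in for df2015 / df2016.
df_a = pd.DataFrame({"office": ["A", "B", "A"], "flow": [10.0, 20.0, 12.0]})
df_b = pd.DataFrame({"office": ["A", "C"], "flow": [11.0, 30.0]})

# Separate encoding: each frame only gets columns for the offices it contains,
# so stacking them leaves NaN wherever an office is missing from one year.
sep = pd.concat(
    [pd.get_dummies(df_a["office"]), pd.get_dummies(df_b["office"])],
    axis=0,
)
print(sep.isna().any(axis=1).sum())  # rows that dropna() would discard

# Combined encoding: stack the years first, then encode once.
both = pd.concat([df_a, df_b], axis=0, ignore_index=True)
dummies = pd.get_dummies(both["office"], drop_first=True, dtype=float)
combined = pd.concat([both, dummies], axis=1)
print(list(combined.columns))
```

With the combined encoding every row carries the same set of dummy columns, so no rows are lost to `dropna()` just because of how the dummies were built.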
Now I run the regression with the OLS function:
x_train = x_train.iloc[:100].values.reshape(-1,1)
y_train = y_train.iloc[:100].values.reshape(-1,1)
modelx = sm.OLS(y_train.astype(float), x_train.astype(float)).fit()
modelx.summary()
Then I get this error message:
endog and exog matrices are different sizes
But I thought I had already made them the same size.
If I don't reshape them, I get this result instead:
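A quick NumPy check of what the two `reshape(-1, 1)` calls do, assuming a hypothetical 100 × 26 feature matrix (26 being the number of coefficients in the summary in this question) and a 100-element target: `reshape(-1, 1)` stacks every value of the matrix into a single column, so exog ends up with 2600 rows while endog still has 100.

```python
import numpy as np

# Hypothetical stand-ins: 100 rows, 26 dummy columns, one target value per row.
X = np.zeros((100, 26))
y = np.zeros(100)

X_col = X.reshape(-1, 1)  # every value stacked into a single column
y_col = y.reshape(-1, 1)  # still one value per row

print(X_col.shape, y_col.shape)  # (2600, 1) (100, 1): the row counts differ
```

`sm.OLS` accepts a 1-D endog and a 2-D exog directly, so `x_train.iloc[:100]` and `y_train.iloc[:100]` should not need reshaping at all.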
C:\Users\CiCi\Anaconda3-1\lib\site-packages\statsmodels\regression\linear_model.py:1554: RuntimeWarning: invalid value encountered in double_scalars
return self.ess/self.df_model
C:\Users\CiCi\Anaconda3-1\lib\site-packages\scipy\stats\_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in greater
return (self.a < x) & (x < self.b)
C:\Users\CiCi\Anaconda3-1\lib\site-packages\scipy\stats\_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in less
return (self.a < x) & (x < self.b)
C:\Users\CiCi\Anaconda3-1\lib\site-packages\scipy\stats\_distn_infrastructure.py:1821: RuntimeWarning: invalid value encountered in less_equal
cond2 = cond0 & (x <= self.a)
C:\Users\CiCi\Anaconda3-1\lib\site-packages\statsmodels\base\model.py:1100: RuntimeWarning: invalid value encountered in true_divide
return self.params / self.bse
OLS Regression Results
Dep. Variable: Travellers Flow R-squared: 0.000
Model: OLS Adj. R-squared: 0.000
Method: Least Squares F-statistic: nan
Date: Sun, 09 Dec 2018 Prob (F-statistic): nan
Time: 00:34:01 Log-Likelihood: -429.08
No. Observations: 100 AIC: 860.2
Df Residuals: 99 BIC: 862.8
Df Model: 0
Covariance Type: nonrobust
                                  coef   std err         t     P>|t|    [0.025    0.975]
Abbotsford-Huntingdon           8.5000     1.776     4.786     0.000     4.976    12.024
Aldergrove                           0         0       nan       nan         0         0
Ambassador Bridge                    0         0       nan       nan         0         0
Blue Water Bridge                    0         0       nan       nan         0         0
Boundary Bay                         0         0       nan       nan         0         0
Cornwall                             0         0       nan       nan         0         0
Coutts                               0         0       nan       nan         0         0
Douglas (Peace Arch)                 0         0       nan       nan         0         0
Edmundston                           0         0       nan       nan         0         0
Emerson                              0         0       nan       nan         0         0
Fort Frances Bridge                  0         0       nan       nan         0         0
North Portal                         0         0       nan       nan         0         0
Pacific Highway                      0         0       nan       nan         0         0
Peace Bridge                         0         0       nan       nan         0         0
Prescott                             0         0       nan       nan         0         0
Queenston-Lewiston Bridge            0         0       nan       nan         0         0
Rainbow Bridge                       0         0       nan       nan         0         0
Sault Ste. Marie                     0         0       nan       nan         0         0
St-Armand/Philipsburg                0         0       nan       nan         0         0
St-Bernard-de-Lacolle                0         0       nan       nan         0         0
St. Stephen                          0         0       nan       nan         0         0
St. Stephen 3rd Bridge               0         0       nan       nan         0         0
Stanstead                            0         0       nan       nan         0         0
Thousand Islands Bridge              0         0       nan       nan         0         0
Windsor and Detroit Tunnel           0         0       nan       nan         0         0
Woodstock Road                       0         0       nan       nan         0         0
Omnibus: 81.245 Durbin-Watson: 0.324
Prob(Omnibus): 0.000 Jarque-Bera (JB): 453.220
Skew: 2.832 Prob(JB): 3.84e-99
Kurtosis: 11.757 Cond. No. 1.00e+16
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 9.98e-31. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
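This is the format I want, with every dummy variable listed, but there are lots of warnings, R-squared is 0, and I certainly can't base any predictions on it.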
What I want is a summary that includes every dummy variable.
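For reference, a common pattern for getting one coefficient per category is the statsmodels formula API with `C(...)`, which builds the dummies itself and leaves out one baseline level so they do not collide with the intercept. A minimal sketch, assuming a hypothetical frame with columns `office` and `flow` (short stand-ins for "CBSA Office" and "Travellers Flow") and made-up numbers; this is not the code from the question:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Made-up data: three offices, four observations each.
df = pd.DataFrame({
    "office": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "flow": [10.0, 12.0, 11.0, 13.0,
             20.0, 22.0, 21.0, 23.0,
             30.0, 34.0, 31.0, 33.0],
})

# C(office) expands the column into dummies; level "A" becomes the baseline
# absorbed by the intercept, so B and C are estimated as offsets from A.
res = smf.ols("flow ~ C(office)", data=df).fit()
print(res.params)
print(res.summary())
```

Here the intercept is the mean flow of office A, and each `C(office)[T.…]` row in the summary is that office's difference from A.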
So I tried this:
x_train = np.array(x_train).reshape(1,-1)
y_train = np.array(y_train).reshape(1,-1)
modelx = sm.OLS(y_train.astype(float), x_train.astype(float)).fit()
modelx.summary()
and I get:
MemoryError Traceback (most recent call last)
<ipython-input-668-312de7f7e808> in <module>()
1 x_train = np.array(x_train).reshape(1,-1)
2 y_train = np.array(y_train).reshape(1,-1)
----> 3 modelx = sm.OLS(y_train.astype(float), x_train.astype(float)).fit()
4 modelx.summary()
~\Anaconda3-1\lib\site-packages\statsmodels\regression\linear_model.py in fit(self, method, cov_type, cov_kwds, use_t, **kwargs)
273 self.pinv_wexog, singular_values = pinv_extended(self.wexog)
274 self.normalized_cov_params = np.dot(
--> 275 self.pinv_wexog, np.transpose(self.pinv_wexog))
276
277 # Cache these singular values for use later.
MemoryError:
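The traceback points at `np.dot(self.pinv_wexog, np.transpose(self.pinv_wexog))`, which builds a square matrix with one row and one column per regressor. With `reshape(1, -1)`, every value becomes its own regressor in a single observation, so that matrix grows with the square of the number of values. A back-of-the-envelope sketch, assuming hypothetical sizes (the real row count of `trainset` is not shown in the question), estimates the memory without allocating the big matrix:

```python
import numpy as np

# Hypothetical sizes: 10,000 rows and 26 dummy columns.
n_rows, n_cols = 10_000, 26
X = np.zeros((n_rows, n_cols))

X_row = X.reshape(1, -1)  # one observation, n_rows * n_cols regressors
n_regressors = X_row.shape[1]

# normalized_cov_params would hold n_regressors x n_regressors float64 values.
bytes_needed = n_regressors ** 2 * np.dtype(np.float64).itemsize
print(X_row.shape)  # (1, 260000)
print(bytes_needed / 1e9, "GB")
```

Even with enough memory, a single observation with hundreds of thousands of regressors could not produce a meaningful fit.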
I'm new to Python and need a lot of help. Thanks!