I'm trying to run a regression on some data in a DataFrame, but I keep getting this strange shape error. Any idea what's wrong?
import pandas as pd
import io
import requests
import statsmodels.api as sm
# Read in a dataset
url="https://raw.githubusercontent.com/jldbc/coffee-quality-database/master/data/arabica_data_cleaned.csv"
s=requests.get(url).content
df=pd.read_csv(io.StringIO(s.decode('utf-8')))
# Select feature columns
X = df[['Body', 'Clean.Cup']]
# Select dv column
y = df['Cupper.Points']
# make model
mod = sm.OLS(X, y).fit()
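I get this error: shapes (1311,2) and (1311,2) not aligned: 2 (dim 1) != 1311 (dim 0)
Answer 0: (score: 0)
You have the X and y arguments in the wrong order in the sm.OLS call. sm.OLS takes the dependent variable (endog) first and the feature matrix (exog) second: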
import pandas as pd
import io
import requests
import statsmodels.api as sm
# Read in a dataset
url="https://raw.githubusercontent.com/jldbc/coffee-quality-database/master/data/arabica_data_cleaned.csv"
s=requests.get(url).content
df=pd.read_csv(io.StringIO(s.decode('utf-8')))
# Select feature columns
X = df[['Body', 'Clean.Cup']]
# Select dv column
y = df['Cupper.Points']
# make model
mod = sm.OLS(y, X).fit()
mod.summary()
This runs and returns:
<class 'statsmodels.iolib.summary.Summary'>
"""
                            OLS Regression Results
==============================================================================
Dep. Variable:          Cupper.Points   R-squared:                       0.998
Model:                            OLS   Adj. R-squared:                  0.998
Method:                 Least Squares   F-statistic:                 3.145e+05
Date:                Sat, 06 Jul 2019   Prob (F-statistic):               0.00
Time:                        19:42:59   Log-Likelihood:                -454.94
No. Observations:                1311   AIC:                             913.9
Df Residuals:                    1309   BIC:                             924.2
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Body           0.8464      0.016     53.188      0.000       0.815       0.878
Clean.Cup      0.1154      0.012      9.502      0.000       0.092       0.139
==============================================================================
Omnibus:                      537.879   Durbin-Watson:                   1.710
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            30220.027
Skew:                           1.094   Prob(JB):                         0.00
Kurtosis:                      26.419   Cond. No.                         26.2
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
"""
Answer 1: (score: 0)
y and X are in the wrong order. The call should be:
sm.OLS(y,X)