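Hello, I'm new to Python and I have a multicollinearity problem in a multiple regression model.
I have two conveyor belts, and for each of them I have hourly data such as "load", "speed", and "energy". I want to understand the energy performance. First I tried an ordinary least squares (OLS) regression to get the coefficients, but I saw that the coefficients differ between the belts. The point is that one of the belts is a few meters shorter and has to lift the load by a few meters. I calculated a slope of 0.09 for it. Now I want to get information about its effect, so I added a separate slope column for each belt and appended the two data sets. I ran this as a Ridge regression, knowing that with alpha equal to zero I get an OLS regression again. But the coefficients I get now are surprising. As before, Load has a large effect, and even the new Slope has the expected effect, but the belt speed now has a negative effect on Energy. That would be nice, but it is impossible: the energy would decrease as the motor speed increases...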
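The stacked data set described above can be pictured with the following minimal sketch. The data are synthetic, and the column order, the true coefficients, and the slope of 0.0 for the second belt are assumptions made only for illustration. Only the slope value 0.09 comes from the description.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # hourly rows per belt (synthetic)

def make_belt(slope):
    """Return (X, energy) for one belt; columns are load, speed, slope."""
    load = rng.uniform(0.0, 10.0, n)
    speed = rng.uniform(1.0, 3.0, n)
    # Hypothetical ground truth: energy rises with load, speed and slope
    energy = 1.0 * load + 0.4 * speed + 3.0 * slope + rng.normal(0.0, 0.05, n)
    X = np.column_stack([load, speed, np.full(n, slope)])
    return X, energy

X_a, y_a = make_belt(0.0)    # assumed flat belt
X_b, y_b = make_belt(0.09)   # belt with the calculated slope

# Append the two belts into one data set
X = np.vstack([X_a, X_b])
y = np.concatenate([y_a, y_b])

# OLS with an explicit intercept column
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
for name, c in zip(["intercept", "load", "speed", "slope"], coef):
    print(f"{name:10s} {c: .3f}")
```

Because the slope column takes only two distinct values, it behaves like a scaled belt indicator, so its coefficient also absorbs any other systematic difference between the two belts.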
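I thought this might be a result of multicollinearity, so I computed a correlation matrix, but there is no correlation between Slope and Speed. I then tried a partial least squares (PLS) regression, but the coefficients I get are close to zero. On the other hand, the PLS model gives me X and Y loading values that look like the coefficients I expected.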
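A pairwise correlation matrix can miss collinearity that involves three or more predictors at once. A common complementary check is the variance inflation factor (VIF), which can be read off the diagonal of the inverse correlation matrix. The sketch below uses synthetic data with hypothetical column relationships, not the conveyor data.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (rows are samples)."""
    corr = np.corrcoef(X, rowvar=False)
    # The diagonal of the inverse correlation matrix equals 1 / (1 - R_j^2)
    return np.diag(np.linalg.inv(corr))

rng = np.random.default_rng(1)
n = 1000
load = rng.normal(size=n)
speed = rng.normal(size=n)
# Hypothetical: tension is almost a linear mix of load and speed
tension = 0.7 * load + 0.7 * speed + rng.normal(scale=0.1, size=n)
temperature = rng.normal(size=n)

X = np.column_stack([temperature, load, tension, speed])
names = ["temperature", "load", "tension", "speed"]

print(np.round(np.corrcoef(X, rowvar=False), 2))  # pairwise view
for name, v in zip(names, vif(X)):
    print(f"{name:12s} VIF = {v:.1f}")           # joint view
```

A frequently quoted rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic collinearity, although the threshold is a convention rather than a strict test.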
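I know that PLS estimates coef via y = x*coef + ERR.
Is it possible to get the ERR values? Could the ERR values be so large that "good" coefficients cannot be obtained? Is it possible for PLS to give much lower coefficients than OLS? What are the y_loadings values in the PLS model? Are there other models I could use to examine the energy performance? Thanks for your help.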
########## Partial Least Squares Regression ##########
# X_train, X_test, Y_train, Y_test come from an earlier train/test split (not shown)
from math import sqrt
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

PLSRegr = PLSRegression(n_components=2)
pls = PLSRegr.fit(X_train, Y_train)
pls_pred = pls.predict(X_test)
pls_meanSquaredError = mean_squared_error(Y_test, pls_pred)
print("PLS MSE:", pls_meanSquaredError)
pls_rootMeanSquaredError = sqrt(pls_meanSquaredError)
print("PLS RMSE:", pls_rootMeanSquaredError)
pls_mean = mean_absolute_error(Y_test, pls_pred)
print("PLS Mean_absolute Error:", pls_mean)
pls_r2 = r2_score(Y_test, pls_pred)
print("PLS R²", pls_r2)
print('PLS Coefficients: \n', PLSRegr.coef_)
print('PLS y_loadings: \n', PLSRegr.y_loadings_)
print('PLS x_loadings: \n', PLSRegr.x_loadings_)
##### Ridge Regression
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import mean_squared_error, r2_score

n_alphas = 10
alphas = np.logspace(-1.5, 2.5, n_alphas)
coefs = []
errors = []         # training MSE for each alpha
error_pred = []     # test MSE for each alpha
Rsquared = []
Rsquared_pred = []
scores = []
p = 6      # Number of Predictors
N = 14266  # Total sample Size
for a in alphas:
    ridge = KernelRidge(alpha=a, kernel='linear', coef0=0)
    ridge.fit(X_train, Y_train)
    KRR_pred = ridge.predict(X_train)  # Prediction Train
    rgr_pred = ridge.predict(X_test)   # Prediction Test
    print(KRR_pred)
    print(ridge.dual_coef_)
    # With a linear kernel, X_train.T @ dual_coef_ gives the primal coefficients
    print(np.dot(X_train.transpose(), ridge.dual_coef_))
    coefs.append(np.dot(X_train.transpose(), ridge.dual_coef_))
    Rsquared.append(ridge.score(X_train, Y_train))
    print("R² of Trainset:", Rsquared)
    Rsquared_pred.append(r2_score(Y_test, rgr_pred))
    print("R² of Prediction:", Rsquared_pred)
    Rsquaredadj = 1 - (((1 - r2_score(Y_test, rgr_pred)) * (N - 1)) / (N - p - 1))
    print("Adj R²", Rsquaredadj)
    # MSE compares targets with predictions (the original compared dual_coef_
    # with predictions and appended to an undefined list, errors2)
    errors.append(mean_squared_error(Y_train, KRR_pred))
    print('MSE of Trainset:', errors)
    error_pred.append(mean_squared_error(Y_test, rgr_pred))
    print("RGR MSE:", error_pred)
    mse = np.mean((rgr_pred - Y_test) ** 2)
    print("MSE check", mse)
coefs = np.array(coefs)
coefs = coefs.reshape(n_alphas, p)
print('Coefficients: \n', coefs)
print('Alphas: \n', alphas)
print(KRR_pred)
print(ridge.dual_coef_)
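The expression `np.dot(X_train.transpose(), ridge.dual_coef_)` used above recovers the primal weights because, for a linear kernel, the weight vector equals X^T times the dual coefficients. The sketch below checks this on synthetic data against the closed-form ridge solution without an intercept; the data and the `alpha` value are arbitrary assumptions.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 4))
y = X @ np.array([1.0, -0.5, 0.3, 0.0]) + rng.normal(scale=0.1, size=300)
alpha = 0.5  # arbitrary regularization strength

# Dual route: fit KernelRidge, then map dual coefficients to primal weights
krr = KernelRidge(alpha=alpha, kernel="linear").fit(X, y)
w_dual = X.T @ krr.dual_coef_

# Primal route: closed-form ridge solution (X^T X + alpha I)^-1 X^T y
w_primal = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

print("dual  :", np.round(w_dual, 4))
print("primal:", np.round(w_primal, 4))
print("match :", np.allclose(w_dual, w_primal, atol=1e-6))
```

`KernelRidge` fits no intercept term, so on uncentered data its weights can differ from those of an OLS fit that includes an intercept, even when alpha is very small.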
Model                      Temperature   Load   Tension   Speed   Slope
PLS                         0.00         0.11   -0.01      0.02   0.04
OLS / Ridge (alpha = 0)    -0.038        1.37   -0.067    -0.11   0.33
OLS without slope          -0.011        1.11   -0.33      0.40   -
I expected values like the ones without slope, but with a smaller speed coefficient, in Ridge and PLS.