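I am trying to perform partial least squares regression in C#. MATLAB's plsregress uses the SIMPLS algorithm, which returns beta (the matrix of regression coefficients).
I don't understand why the coefficient matrices differ between the two. Am I passing the inputs to the C# version incorrectly?
The inputs are the same in both cases and are taken from the paper referenced here.
Minimal working example:
MATLAB: follows the small example from Hervé Abdi (Hervé Abdi, Partial Least Square Regression). Reference: PDF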
clear all;
clc;
inputs = [7, 7, 13, 7; 4, 3, 14, 7; 10, 5, 12, 5; 16, 7, 11, 3; 13, 3, 10, 3];
outputs = [14, 7, 8; 10, 7, 6; 8, 5, 5; 2, 4,7; 6, 2, 4];
[XL,yl,XS,YS,beta,PCTVAR] = plsregress(inputs,outputs, 1);
disp 'beta'
beta
disp 'beta size'
size(beta)
yfit = [ones(size(inputs,1),1) inputs]*beta;
residuals = outputs - yfit;
% stem(residuals)
% xlabel('Observation');
% ylabel('Residual');
beta =
1.0484e+01 6.1899e+00 6.2841e+00
-6.3488e-01 -3.0405e-01 -7.2608e-02
2.1949e-02 1.0512e-02 2.5102e-03
1.9226e-01 9.2078e-02 2.1988e-02
2.8948e-01 1.3864e-01 3.3107e-02
Accord.NET:
double[][] inputs = new double[][]
{
// Wine | Price | Sugar | Alcohol | Acidity
new double[] { 7, 7, 13, 7 },
new double[] { 4, 3, 14, 7 },
new double[] { 10, 5, 12, 5 },
new double[] { 16, 7, 11, 3 },
new double[] { 13, 3, 10, 3 },
};
double[][] outputs = new double[][]
{
// Wine | Hedonic | Goes with meat | Goes with dessert
new double[] { 14, 7, 8 },
new double[] { 10, 7, 6 },
new double[] { 8, 5, 5 },
new double[] { 2, 4, 7 },
new double[] { 6, 2, 4 },
};
var pls = new PartialLeastSquaresAnalysis()
{
Method = AnalysisMethod.Center,
Algorithm = PartialLeastSquaresAlgorithm.NIPALS
};
var regression = pls.Learn(inputs, outputs);
double[][] coeffs = regression.Weights;
>>
-1.69811320754717 -0.0566037735849056 0.0707547169811322
1.27358490566038 0.29245283018868 0.571933962264151
-4 1 0.5
1.17924528301887 0.122641509433962 0.159198113207547
Answer 0 (score: 1)
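I think there are at least three inconsistencies between the way you are calling the MATLAB and the Accord.NET versions of PLS.
As you noted, MATLAB is using SIMPLS. However, Accord.NET is being told to use NIPALS.
The MATLAB version is called as plsregress(inputs, outputs, 1), which means the regression is computed with only 1 latent component, but you are not instructing Accord.NET to do the same.
Accord.NET returns a MultivariateLinearRegression object that holds the weight matrix and the intercept vector separately, whereas MATLAB returns the intercepts as the first row of the coefficient matrix.
Once all of these are taken into account, it is possible to produce exactly the same results as the MATLAB version: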
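// Assumes: using Accord.Math; (for Transpose, InsertColumn, ToOctave)
// and using Accord.Statistics.Analysis; (for PartialLeastSquaresAnalysis)
double[][] inputs = new double[][]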
{
// Wine | Price | Sugar | Alcohol | Acidity
new double[] { 7, 7, 13, 7 },
new double[] { 4, 3, 14, 7 },
new double[] { 10, 5, 12, 5 },
new double[] { 16, 7, 11, 3 },
new double[] { 13, 3, 10, 3 },
};
double[][] outputs = new double[][]
{
// Wine | Hedonic | Goes with meat | Goes with dessert
new double[] { 14, 7, 8 },
new double[] { 10, 7, 6 },
new double[] { 8, 5, 5 },
new double[] { 2, 4, 7 },
new double[] { 6, 2, 4 },
};
// Create the Partial Least Squares Analysis
var pls = new PartialLeastSquaresAnalysis()
{
Method = AnalysisMethod.Center,
Algorithm = PartialLeastSquaresAlgorithm.SIMPLS, // First change: use SIMPLS
};
// Learn the analysis
pls.Learn(inputs, outputs);
// Second change: Use just 1 latent factor/component
var regression = pls.CreateRegression(factors: 1);
// Third change: present results as in MATLAB
double[][] w = regression.Weights.Transpose();
double[] b = regression.Intercepts;
// Add the intercepts as the first column of the matrix of
// weights and transpose it as in the way MATLAB presents it
double[][] coeffs = (w.InsertColumn(b, index: 0)).Transpose();
// Show results in MATLAB format
string str = coeffs.ToOctave();
With these changes, the coeffs matrix above should become:
[ 10.4844779770616 6.18986077674717 6.28413863347486 ;
-0.634878923091644 -0.304054829845448 -0.0726082626993539 ;
0.0219492754418065 0.0105118991463605 0.00251024045589416 ;
0.192261724966225 0.0920775662006966 0.0219881135215502 ;
0.289484835410222 0.13863944631343 0.033107085796122 ]
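To mirror the `yfit` and `residuals` lines from the MATLAB script on the C# side, here is a minimal sketch that multiplies `[1 X]` by a coefficient matrix in the MATLAB layout (intercepts in the first row). It uses plain arrays and does not depend on Accord.NET. The class name `PlsFitCheck` and the helper `Fit` are hypothetical names introduced only for this illustration, and the `beta` values are simply copied from the matrix shown above.

```csharp
using System;
using System.Globalization;

public static class PlsFitCheck
{
    // Computes [1 X] * beta, where row 0 of beta holds the intercepts and
    // rows 1..p hold the slopes (the layout MATLAB's plsregress returns).
    // Hypothetical helper for illustration; not part of Accord.NET.
    public static double[][] Fit(double[][] x, double[][] beta)
    {
        int outputCount = beta[0].Length;
        var fit = new double[x.Length][];
        for (int i = 0; i < x.Length; i++)
        {
            fit[i] = new double[outputCount];
            for (int k = 0; k < outputCount; k++)
            {
                double sum = beta[0][k];                 // intercept term
                for (int j = 0; j < x[i].Length; j++)
                    sum += x[i][j] * beta[j + 1][k];     // slope terms
                fit[i][k] = sum;
            }
        }
        return fit;
    }

    public static void Main()
    {
        double[][] inputs =
        {
            new double[] {  7, 7, 13, 7 },
            new double[] {  4, 3, 14, 7 },
            new double[] { 10, 5, 12, 5 },
            new double[] { 16, 7, 11, 3 },
            new double[] { 13, 3, 10, 3 },
        };
        double[][] outputs =
        {
            new double[] { 14, 7, 8 },
            new double[] { 10, 7, 6 },
            new double[] {  8, 5, 5 },
            new double[] {  2, 4, 7 },
            new double[] {  6, 2, 4 },
        };
        // Coefficients copied from the result above (1 latent component).
        double[][] beta =
        {
            new double[] { 10.4844779770616, 6.18986077674717, 6.28413863347486 },
            new double[] { -0.634878923091644, -0.304054829845448, -0.0726082626993539 },
            new double[] { 0.0219492754418065, 0.0105118991463605, 0.00251024045589416 },
            new double[] { 0.192261724966225, 0.0920775662006966, 0.0219881135215502 },
            new double[] { 0.289484835410222, 0.13863944631343, 0.033107085796122 },
        };

        double[][] yfit = Fit(inputs, beta);

        // Print residuals = outputs - yfit, one observation per line.
        for (int i = 0; i < yfit.Length; i++)
        {
            for (int k = 0; k < yfit[i].Length; k++)
            {
                double residual = outputs[i][k] - yfit[i][k];
                Console.Write(residual.ToString("F4", CultureInfo.InvariantCulture) + "\t");
            }
            Console.WriteLine();
        }
    }
}
```

A quick sanity check on the layout: the third wine's inputs (10, 5, 12, 5) are exactly the column means of the inputs, so for a model fitted on centered data its fitted values should come out as the column means of the outputs (8, 5, 6), up to rounding. If they do not, the intercept row is probably in the wrong position.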