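For some time I've been using Apache Commons Math to perform multiple linear regression with OLSMultipleLinearRegression. Now I need to extend my solution to include a weighting factor for each data point.
I am trying to replicate the MATLAB function fitlm.
I have a MATLAB call such as: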
table_data = table(points_scored, height, weight, age);
model = fitlm(table_data, 'points_scored ~ -1 + height + weight + age', 'Weights', data_weights)
From the model I get the regression coefficients for height, weight, and age.
In Java, the code I have now is roughly:
double[][] variables = new double[points_scored.length][3];
// Fill in variables for height, weight, age,
...
OLSMultipleLinearRegression regression = new OLSMultipleLinearRegression();
regression.setNoIntercept(true);
regression.newSampleData(points_scored, variables);
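There doesn't seem to be a way to add weights to OLSMultipleLinearRegression. There is a way to add weights to LeastSquaresBuilder, but I'm having trouble figuring out exactly how to use it. My biggest problem (I think) is creating the Jacobian that it expects.
Here is most of what I have tried: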
double[] points_scored = //fill in points scored
double[] height = //fill in
double[] weight = //fill in
double[] age = // fill in
MultivariateJacobianFunction distToResidual = coeffs -> {
    RealVector value = new ArrayRealVector(points_scored.length);
    RealMatrix jacobian = new Array2DRowRealMatrix(points_scored.length, 3);
    for (int i = 0; i < points_scored.length; ++i) {
        double residual = points_scored[i];
        residual -= coeffs.getEntry(0) * height[i];
        residual -= coeffs.getEntry(1) * weight[i];
        residual -= coeffs.getEntry(2) * age[i];
        value.setEntry(i, residual);
        //No idea how to set up the jacobian here
    }
    return new Pair<RealVector, RealMatrix>(value, jacobian);
};
double[] prescribedDistancesToLine = new double[points_scored.length];
Arrays.fill(prescribedDistancesToLine, 0);
double[] starts = new double[] {1, 1, 1};
LeastSquaresProblem problem = new LeastSquaresBuilder().
    start(starts).
    model(distToResidual).
    target(prescribedDistancesToLine).
    lazyEvaluation(false).
    maxEvaluations(1000).
    maxIterations(1000).
    build();
LeastSquaresOptimizer.Optimum optimum = new LevenbergMarquardtOptimizer().optimize(problem);
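Since I don't know how to set up the Jacobian values, I've just been stabbing in the dark, and the resulting coefficients are nowhere near the MATLAB answer. Once I get this part working, I know that adding the weights to the LeastSquaresBuilder should be a fairly simple extra line of code.
Thanks for any help!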
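Answer 0 (score: 0)
You can use the class GLSMultipleLinearRegression from Apache Commons Math. For example, let's find the linear regression for three points in the plane, (0,0), (1,2), (2,0), with weights 1, 2, 1: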
import org.apache.commons.math3.stat.regression.GLSMultipleLinearRegression;

public class Main {
    public static void main(String[] args) {
        GLSMultipleLinearRegression regr = new GLSMultipleLinearRegression();
        regr.setNoIntercept(false);
        double[] y = new double[]{0.0, 2.0, 0.0};
        double[][] x = new double[3][];
        x[0] = new double[]{0.0};
        x[1] = new double[]{1.0};
        x[2] = new double[]{2.0};
        double[][] omega = new double[3][];
        omega[0] = new double[]{1.0, 0.0, 0.0};
        omega[1] = new double[]{0.0, 0.5, 0.0};
        omega[2] = new double[]{0.0, 0.0, 1.0};
        regr.newSampleData(y, x, omega);
        double[] params = regr.estimateRegressionParameters();
        System.out.println("Slope: " + params[1] + ", intercept: " + params[0]);
    }
}
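As a sanity check on what the example above should report, the weighted fit for a single predictor can be worked out in closed form without any library. The following is only a hand-rolled sketch: the class `WeightedLineFit` and its `fit` method are hypothetical names, not part of Apache Commons Math, and it assumes the weights 1, 2, 1 multiply the squared residuals directly.

```java
/** Hypothetical helper (not part of Apache Commons Math): closed-form weighted fit of y = intercept + slope * x. */
final class WeightedLineFit {

    /** Returns {intercept, slope} minimizing sum_i w[i] * (y[i] - intercept - slope * x[i])^2. */
    static double[] fit(double[] x, double[] y, double[] w) {
        double sumW = 0.0, sumWx = 0.0, sumWy = 0.0;
        for (int i = 0; i < x.length; i++) {
            sumW += w[i];
            sumWx += w[i] * x[i];
            sumWy += w[i] * y[i];
        }
        double meanX = sumWx / sumW; // weighted mean of x
        double meanY = sumWy / sumW; // weighted mean of y

        double sxy = 0.0, sxx = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            sxy += w[i] * dx * (y[i] - meanY); // weighted covariance term
            sxx += w[i] * dx * dx;             // weighted variance term
        }
        double slope = sxy / sxx;
        return new double[] {meanY - slope * meanX, slope};
    }

    public static void main(String[] args) {
        double[] p = fit(new double[] {0, 1, 2}, new double[] {0, 2, 0}, new double[] {1, 2, 1});
        System.out.println("Slope: " + p[1] + ", intercept: " + p[0]);
    }
}
```

For these three points the weighted means are both 1 and the data are symmetric about x = 1, so the slope comes out as 0 and the intercept as 1. The GLS example should agree with that up to rounding.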
Note that the omega matrix is diagonal, and its diagonal elements are the reciprocals of the weights.
See the documentation for the multivariate case.
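For more than a handful of points, writing omega out by hand gets tedious. A minimal sketch of building it from a weight array could look like this. `OmegaBuilder` and `fromWeights` are hypothetical names, not library API, and the positivity check is an assumption on my part, since a zero weight has no finite reciprocal.

```java
/** Hypothetical helper (not part of Apache Commons Math): builds the diagonal omega matrix from per-point weights. */
final class OmegaBuilder {

    /** Returns an n x n matrix with 1 / weights[i] on the diagonal and zeros elsewhere. */
    static double[][] fromWeights(double[] weights) {
        int n = weights.length;
        double[][] omega = new double[n][n]; // Java zero-initializes the off-diagonal entries
        for (int i = 0; i < n; i++) {
            if (!(weights[i] > 0.0)) {
                // Assumption: only strictly positive, non-NaN weights are meaningful here.
                throw new IllegalArgumentException("weight " + i + " must be positive: " + weights[i]);
            }
            omega[i][i] = 1.0 / weights[i];
        }
        return omega;
    }
}
```

With the names from the question, the call would then be something like `regr.newSampleData(points_scored, variables, OmegaBuilder.fromWeights(data_weights))`, after `regr.setNoIntercept(true)` to match the `-1` in the fitlm formula.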
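If you would rather stay with the LeastSquaresBuilder attempt from the question, the Jacobian is simple because the residual is linear in the coefficients: with r_i = y_i - sum_j coeffs_j * x_ij, the partial derivative of r_i with respect to coeffs_j is just -x_ij, whatever the current coefficients are. The sketch below shows that with plain arrays so it runs without the library; `LinearResidualModel` is a hypothetical name, and inside the question's lambda each value would be written with `jacobian.setEntry(i, j, ...)` instead.

```java
/** Hypothetical helper (not part of Apache Commons Math): residuals and Jacobian of a linear model without intercept. */
final class LinearResidualModel {

    /** residuals[i] = y[i] - sum_j coeffs[j] * x[i][j], the same quantity the question's lambda computes. */
    static double[] residuals(double[] y, double[][] x, double[] coeffs) {
        double[] r = new double[y.length];
        for (int i = 0; i < y.length; i++) {
            double residual = y[i];
            for (int j = 0; j < coeffs.length; j++) {
                residual -= coeffs[j] * x[i][j];
            }
            r[i] = residual;
        }
        return r;
    }

    /** jacobian[i][j] = d residuals[i] / d coeffs[j] = -x[i][j]; it does not depend on coeffs. */
    static double[][] jacobian(double[][] x) {
        double[][] jac = new double[x.length][];
        for (int i = 0; i < x.length; i++) {
            jac[i] = new double[x[i].length];
            for (int j = 0; j < x[i].length; j++) {
                jac[i][j] = -x[i][j];
            }
        }
        return jac;
    }
}
```

In the question's code that would mean setting columns 0, 1, 2 of row i to `-height[i]`, `-weight[i]`, `-age[i]`. As far as I can tell, the weights then go into the builder through `weight(...)` as a diagonal matrix of the weights themselves (for instance `new DiagonalMatrix(data_weights)`), not their reciprocals as with omega above.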