Weighted linear regression with Apache Commons Math

Time: 2019-01-03 17:07:38

Tags: java linear-regression apache-commons-math

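For a while now I have been using Apache Commons Math to do multiple linear regression with OLSMultipleLinearRegression. Now I need to extend the solution to include a weighting factor for each data point.

I am trying to replicate the MATLAB function fitlm.

I have a MATLAB call such as: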

table_data = table(points_scored, height, weight, age);
model = fitlm(table_data, 'points_scored ~ height + weight + age - 1', 'Weights', data_weights)

From model I get the regression coefficients for height, weight and age.
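For reference, the weighted no-intercept fit that the fitlm call above is presumably computing chooses the coefficients β that minimize the weighted sum of squared residuals. This is the standard weighted least squares formulation. In it, X is the n×3 matrix of height, weight and age, y is points_scored, and w_i is the i-th entry of data_weights:

```latex
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} w_i \left( y_i - \mathbf{x}_i^{\top} \beta \right)^2,
\qquad
\hat{\beta} = \left( X^{\top} W X \right)^{-1} X^{\top} W y,
\qquad
W = \operatorname{diag}(w_1, \dots, w_n)
```

With all w_i = 1 this reduces to the ordinary least squares solution that OLSMultipleLinearRegression computes.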

In Java, the code I have now is roughly:

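import org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression;

double[][] variables = new double[points_scored.length][3];
// Fill in variables for height, weight, age
...

OLSMultipleLinearRegression regression = new OLSMultipleLinearRegression();
regression.setNoIntercept(true); // no constant term, matching the no-intercept MATLAB formula
regression.newSampleData(points_scored, variables);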

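There does not seem to be a way to add weights to OLSMultipleLinearRegression. There is a way to add weights to LeastSquaresBuilder, but I am having trouble figuring out exactly how to use it. My biggest problem (I think) is building the Jacobian it expects.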
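One workaround keeps OLSMultipleLinearRegression. It relies on a standard identity that neither the question nor the answer uses. Without an intercept, weighted least squares with weights w_i gives the same coefficients as ordinary least squares on rows scaled by sqrt(w_i), because w_i (y_i - x_i·β)^2 = (sqrt(w_i) y_i - sqrt(w_i) x_i·β)^2. Below is a minimal sketch of that idea. It assumes all weights are strictly positive. The class name WeightedViaOls, the helper weightedFit and the data are hypothetical and only for illustration:

```java
import org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression;

public class WeightedViaOls {

    // Hypothetical helper: weighted least squares without an intercept, done by
    // scaling each row of y and x by sqrt(w_i) and running ordinary least squares.
    public static double[] weightedFit(double[] y, double[][] x, double[] w) {
        int n = y.length;
        double[] ys = new double[n];
        double[][] xs = new double[n][];
        for (int i = 0; i < n; i++) {
            double s = Math.sqrt(w[i]); // assumes w[i] > 0
            ys[i] = s * y[i];
            xs[i] = new double[x[i].length];
            for (int j = 0; j < x[i].length; j++) {
                xs[i][j] = s * x[i][j];
            }
        }
        OLSMultipleLinearRegression ols = new OLSMultipleLinearRegression();
        ols.setNoIntercept(true); // the scaled model is ys = xs * beta with no constant term
        ols.newSampleData(ys, xs);
        return ols.estimateRegressionParameters();
    }

    public static void main(String[] args) {
        // Illustrative data generated exactly from y = 2*x1 + 3*x2, so any positive weights recover 2 and 3
        double[] y = {8, 7, 18, 17, 13};
        double[][] x = {{1, 2}, {2, 1}, {3, 4}, {4, 3}, {5, 1}};
        double[] w = {1, 2, 1, 3, 0.5};
        double[] beta = weightedFit(y, x, w);
        System.out.println("beta = " + beta[0] + ", " + beta[1]);
    }
}
```

To keep an intercept with this trick, one option is to add an explicit column of ones before scaling, so it becomes a column of sqrt(w_i), and to leave setNoIntercept(true).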

Here is most of what I tried:

import java.util.Arrays;
import org.apache.commons.math3.fitting.leastsquares.*;
import org.apache.commons.math3.linear.*;
import org.apache.commons.math3.util.Pair;

double[] points_scored = // fill in points scored
double[] height = // fill in
double[] weight = // fill in
double[] age = // fill in

MultivariateJacobianFunction distToResidual = coeffs -> {
  RealVector value = new ArrayRealVector(points_scored.length);
  RealMatrix jacobian = new Array2DRowRealMatrix(points_scored.length, 3);

  for (int i = 0; i < points_scored.length; ++i) {
    double residual = points_scored[i];
    residual -= coeffs.getEntry(0) * height[i];
    residual -= coeffs.getEntry(1) * weight[i];
    residual -= coeffs.getEntry(2) * age[i];
    value.setEntry(i, residual);
    // No idea how to set up the jacobian here
  }

  return new Pair<RealVector, RealMatrix>(value, jacobian);
};

double[] prescribedDistancesToLine = new double[points_scored.length];
Arrays.fill(prescribedDistancesToLine, 0);
double[] starts = new double[] {1, 1, 1};

LeastSquaresProblem problem = new LeastSquaresBuilder().
            start(starts).
            model(distToResidual).
            target(prescribedDistancesToLine).
            lazyEvaluation(false).
            maxEvaluations(1000).
            maxIterations(1000).
            build();
LeastSquaresOptimizer.Optimum optimum = new LevenbergMarquardtOptimizer().optimize(problem);

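Since I don't know how to set up the Jacobian, I have been taking stabs in the dark, and the coefficients are nowhere near the MATLAB answer. Once I get this part working, I expect that adding the weights to LeastSquaresBuilder will be a fairly simple extra line of code.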
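For the LeastSquaresBuilder route, the model is linear, so its Jacobian is constant: the derivative of the prediction x_i·β with respect to β_j is simply x_ij. Below is a minimal sketch that assumes Apache Commons Math 3 (package org.apache.commons.math3.fitting.leastsquares). Unlike the code above, this model returns predictions rather than residuals, and the observed values are the target. The weights are passed through LeastSquaresBuilder.weight as a diagonal matrix. The class name WeightedLeastSquares and the helper fit are hypothetical. The example data reuses the three weighted points from the answer, with a column of ones standing in for the intercept. For the question, x would instead be the n×3 matrix of height, weight and age, and w would be data_weights.

```java
import org.apache.commons.math3.fitting.leastsquares.LeastSquaresBuilder;
import org.apache.commons.math3.fitting.leastsquares.LeastSquaresOptimizer;
import org.apache.commons.math3.fitting.leastsquares.LeastSquaresProblem;
import org.apache.commons.math3.fitting.leastsquares.LevenbergMarquardtOptimizer;
import org.apache.commons.math3.fitting.leastsquares.MultivariateJacobianFunction;
import org.apache.commons.math3.linear.Array2DRowRealMatrix;
import org.apache.commons.math3.linear.DiagonalMatrix;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.linear.RealVector;
import org.apache.commons.math3.util.Pair;

public class WeightedLeastSquares {

    // Hypothetical helper: fits y ~ x * beta (no implicit intercept) with per-point weights w.
    public static double[] fit(double[] y, double[][] x, double[] w) {
        final int k = x[0].length;
        // Linear model: d(prediction_i)/d(beta_j) = x[i][j], independent of beta.
        final RealMatrix jacobian = new Array2DRowRealMatrix(x);

        MultivariateJacobianFunction model = beta -> {
            RealVector predicted = jacobian.operate(beta); // predicted_i = sum_j x[i][j] * beta_j
            return new Pair<RealVector, RealMatrix>(predicted, jacobian);
        };

        LeastSquaresProblem problem = new LeastSquaresBuilder()
                .start(new double[k])          // zero start; a linear problem does not need a good guess
                .model(model)
                .target(y)                     // residual_i = y_i - predicted_i
                .weight(new DiagonalMatrix(w)) // cost = sum_i w_i * residual_i^2
                .lazyEvaluation(false)
                .maxEvaluations(1000)
                .maxIterations(1000)
                .build();

        LeastSquaresOptimizer.Optimum optimum = new LevenbergMarquardtOptimizer().optimize(problem);
        return optimum.getPoint().toArray();
    }

    public static void main(String[] args) {
        // Columns (1, x), so beta = (intercept, slope); weights 1, 2, 1
        double[] y = {0.0, 2.0, 0.0};
        double[][] x = {{1.0, 0.0}, {1.0, 1.0}, {1.0, 2.0}};
        double[] w = {1.0, 2.0, 1.0};
        double[] beta = fit(y, x, w);
        System.out.println("intercept = " + beta[0] + ", slope = " + beta[1]);
    }
}
```

The question's formulation can also be kept: the model returns residuals and the target is all zeros. In that case the derivative of each residual carries a minus sign, so inside the loop the Jacobian entries would be jacobian.setEntry(i, 0, -height[i]), jacobian.setEntry(i, 1, -weight[i]) and jacobian.setEntry(i, 2, -age[i]).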

Thanks for any help!

1 Answer

Answer 0 (score: 0)

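You can use the GLSMultipleLinearRegression class from Apache Commons Math. For example, let's find the linear regression of the three planar data points (0,0), (1,2), (2,0) with weights 1, 2, 1: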

[Figure: data points and regression line]

import org.apache.commons.math3.stat.regression.GLSMultipleLinearRegression;

public class Main {
    public static void main(String[] args) {
        GLSMultipleLinearRegression regr = new GLSMultipleLinearRegression();
        // false (the default) adds an intercept column, so params[0] is the intercept
        regr.setNoIntercept(false);
        double[] y = new double[]{0.0, 2.0, 0.0};
        double[][] x = new double[3][];
        x[0] = new double[]{0.0};
        x[1] = new double[]{1.0};
        x[2] = new double[]{2.0};
        // omega is the error covariance matrix: diagonal, with 1/w_i for weights w = {1, 2, 1}
        double[][] omega = new double[3][];
        omega[0] = new double[]{1.0, 0.0, 0.0};
        omega[1] = new double[]{0.0, 0.5, 0.0};
        omega[2] = new double[]{0.0, 0.0, 1.0};
        regr.newSampleData(y, x, omega);
        double[] params = regr.estimateRegressionParameters();
        System.out.println("Slope: " + params[1] + ", intercept: " + params[0]);
    }
}

Note that the omega matrix is diagonal, and its diagonal elements are the reciprocals of the weights.
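As a hand check of what the example should print, the standard weighted-regression formulas can be applied to the three points with weights 1, 2, 1. This is worked out here and is not output quoted from the answer:

```latex
\bar{x}_w = \frac{1\cdot 0 + 2\cdot 1 + 1\cdot 2}{1+2+1} = 1,
\qquad
\bar{y}_w = \frac{1\cdot 0 + 2\cdot 2 + 1\cdot 0}{1+2+1} = 1,
\\
\text{slope} = \frac{\sum_i w_i (x_i-\bar{x}_w)(y_i-\bar{y}_w)}{\sum_i w_i (x_i-\bar{x}_w)^2}
= \frac{1\cdot(-1)(-1) + 2\cdot 0\cdot 1 + 1\cdot 1\cdot(-1)}{1\cdot 1 + 2\cdot 0 + 1\cdot 1} = 0,
\qquad
\text{intercept} = \bar{y}_w - \text{slope}\cdot\bar{x}_w = 1
```

So the fitted line should be the horizontal line y = 1, up to floating-point rounding.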

See the documentation for the multivariate case.
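For the question's case of three predictors and no intercept, the same approach might look like the sketch below. It builds omega as a diagonal matrix with entries 1/w_i and passes an n×3 predictor matrix. The class name WeightedGls, the helpers omegaFromWeights and fit, and the numbers are hypothetical. The sketch assumes all weights are strictly positive.

```java
import org.apache.commons.math3.stat.regression.GLSMultipleLinearRegression;

public class WeightedGls {

    // Hypothetical helper: converts per-point weights into the GLS covariance matrix omega = diag(1 / w_i).
    public static double[][] omegaFromWeights(double[] w) {
        double[][] omega = new double[w.length][w.length];
        for (int i = 0; i < w.length; i++) {
            omega[i][i] = 1.0 / w[i]; // assumes w[i] > 0
        }
        return omega;
    }

    // Fits y ~ x * beta without an intercept, matching a no-intercept fitlm formula.
    public static double[] fit(double[] y, double[][] x, double[] w) {
        GLSMultipleLinearRegression regr = new GLSMultipleLinearRegression();
        regr.setNoIntercept(true);
        regr.newSampleData(y, x, omegaFromWeights(w));
        return regr.estimateRegressionParameters();
    }

    public static void main(String[] args) {
        // Illustrative 3-predictor data generated exactly from y = 1*a + 2*b + 3*c
        double[] y = {1, 2, 3, 6, 4};
        double[][] x = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}, {1, 1, 1}, {2, 1, 0}};
        double[] w = {1, 2, 3, 1, 2};
        double[] beta = fit(y, x, w);
        System.out.println(beta[0] + ", " + beta[1] + ", " + beta[2]);
    }
}
```

Because this illustrative data fits exactly, the weights do not change the answer. With real, noisy data they shift the coefficients toward the heavily weighted points.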