MySQL multivariable linear regression

Posted: 2010-01-17 22:17:15

Tags: mysql matrix regression linear-regression

I'm trying to run a multivariable (9-variable) linear regression on data in my MySQL 5.0 database (the outcome field has only two possible values, 1 and 0).

I did some searching and found that I can use:

SELECT
    @n     := COUNT(score)     AS N,
    @meanX := AVG(age)         AS "X mean",
    @sumX  := SUM(age)         AS "X sum",
    @sumXX := SUM(age*age)     AS "X sum of squares",
    @meanY := AVG(score)       AS "Y mean",
    @sumY  := SUM(score)       AS "Y sum",
    @sumYY := SUM(score*score) AS "Y sum of squares",
    @sumXY := SUM(age*score)   AS "X*Y sum"
FROM measurements;  -- table name assumed; the original query omitted the FROM clause

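to get many of the basic regression values, but I really don't want to type this out for every combination of the 9 variables. Every source I can find on doing multivariable regression requires matrix operations. Can I do matrix operations in MySQL, or is there another way to run a 9-variable linear regression?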
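For a single predictor the aggregates from that query are already enough: the least-squares slope is b = (n·ΣXY − ΣX·ΣY) / (n·ΣXX − (ΣX)²) and the intercept is a = (ΣY − b·ΣX) / n. A minimal sketch of that arithmetic (the function name `simple_regression` is hypothetical) shows why the one-variable case needs no matrices, while nine predictors need every pairwise ΣXᵢXⱼ at once:

```python
import math


def simple_regression(n, sum_x, sum_xx, sum_y, sum_yy, sum_xy):
    """Fit y = a + b*x from the aggregates returned by the query above.

    Returns (intercept a, slope b, Pearson correlation r).
    """
    sxx = n * sum_xx - sum_x ** 2     # n^2 times the (population) variance of x
    syy = n * sum_yy - sum_y ** 2     # n^2 times the variance of y
    sxy = n * sum_xy - sum_x * sum_y  # n^2 times the covariance of x and y
    if sxx == 0:
        raise ValueError("x is constant, so the slope is undefined")
    b = sxy / sxx
    a = (sum_y - b * sum_x) / n
    r = sxy / math.sqrt(sxx * syy) if syy else float("nan")
    return a, b, r


# x = [1, 2, 3, 4], y = [3, 5, 7, 9] lies exactly on y = 1 + 2x
print(simple_regression(4, 10, 30, 24, 164, 70))  # → (1.0, 2.0, 1.0)
```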

Should I export the data out of MySQL first? It's about 80,000 rows, so moving it is feasible; I'm just not sure what I should use.
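Whichever tool ends up doing the arithmetic, ordinary least squares with k predictors reduces to the normal equations (XᵀX)β = Xᵀy. With 9 predictors plus an intercept, XᵀX is only 10×10 no matter how many rows there are, and its entries are just COUNT(*), SUM(xᵢ) and SUM(xᵢ·xⱼ), the same kind of aggregates as in the query above. The only "matrix operation" left is solving a 10×10 system. A minimal pure-Python sketch, assuming the rows have been exported as tuples (function names are hypothetical); note that with a 0/1 outcome this yields a linear probability model, not a logistic one:

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear predictors?)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):  # eliminate entries below the pivot
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def ols(rows, ys):
    """Fit y = b0 + b1*x1 + ... + bk*xk via the normal equations X'X b = X'y."""
    k = len(rows[0]) + 1  # +1 for the intercept column
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for row, y in zip(rows, ys):
        x = [1.0] + list(row)  # prepend the constant 1 for the intercept
        for i in range(k):
            xty[i] += x[i] * y
            for j in range(k):
                xtx[i][j] += x[i] * x[j]
    return solve(xtx, xty)
```

The accumulation loop is exactly what SQL `SUM` aggregates can do server-side, so only the small `solve` step has to run outside MySQL.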

Thanks, Dan

2 answers:

Answer 0 (score: 1)

It's best to keep this data in MySQL, but you can process it from a language that has access to the database. Pseudocode:

variables = [ 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I' ];

for X in $variables do
    for Y in $variables do
        # use the column names held in X and Y, not the fixed age/score columns
        query = 'SELECT
            @n'+$X+$Y+' := COUNT(*) AS N,
            @mean'+$X+' := AVG('+$X+') AS "X mean",
            @sum'+$X+' := SUM('+$X+') AS "X sum",
            @sum'+$X+$X+' := SUM('+$X+'*'+$X+') AS "X sum of squares",
            @mean'+$Y+' := AVG('+$Y+') AS "Y mean",
            @sum'+$Y+' := SUM('+$Y+') AS "Y sum",
            @sum'+$Y+$Y+' := SUM('+$Y+'*'+$Y+') AS "Y sum of squares",
            @sum'+$X+$Y+' := SUM('+$X+'*'+$Y+') AS "X*Y sum"
            FROM measurements';
        db_execute(query);
    done
done
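Rather than issuing 81 separate queries, every sum the normal equations need can come from a single table scan: for 9 predictors, an intercept and the outcome that is 11 columns and 66 distinct SUM(a*b) terms. The sketch below builds that one SELECT and arranges the results into XᵀX and Xᵀy. It assumes a wide table `measurements` with one column per variable (names A to I as in the pseudocode) and the 0/1 outcome in `score`; Python's built-in `sqlite3` stands in for a MySQL driver, and the generated SQL (plain `SUM` over products, with the literal `1` playing the intercept column so `SUM(1*1)` is the row count) should run unchanged on MySQL. The function name is hypothetical.

```python
import sqlite3
from itertools import combinations_with_replacement


def normal_equations(conn, predictors, response, table="measurements"):
    """Return (X'X, X'y) for response ~ 1 + predictors using ONE aggregate query."""
    names = list(predictors) + [response, table]
    if not all(name.isidentifier() for name in names):
        # names are pasted into the SQL text, so accept plain identifiers only
        raise ValueError("column/table names must be plain identifiers")
    cols = ["1"] + list(predictors) + [response]  # literal 1 = intercept column
    pairs = list(combinations_with_replacement(range(len(cols)), 2))  # i <= j only
    terms = ", ".join(f"SUM({cols[i]}*{cols[j]})" for i, j in pairs)
    row = conn.execute(f"SELECT {terms} FROM {table}").fetchone()
    sums = dict(zip(pairs, row))

    def s(i, j):  # cross-product sums are symmetric: SUM(a*b) == SUM(b*a)
        return sums[(min(i, j), max(i, j))]

    k = len(cols) - 1  # intercept + predictors; index k is the response
    xtx = [[s(i, j) for j in range(k)] for i in range(k)]
    xty = [s(i, k) for i in range(k)]
    return xtx, xty


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (A REAL, B REAL, score INTEGER)")
conn.executemany("INSERT INTO measurements VALUES (?, ?, ?)",
                 [(1, 2, 1), (3, 4, 0), (5, 6, 1)])
xtx, xty = normal_equations(conn, ["A", "B"], "score")
```

The two small matrices can then be handed to any linear solver; only a few dozen numbers ever leave the database.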

But why not store the results in a table? That suits a database better.

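for X in $variables do
    for Y in $variables do
        # FROM moved after the select list, commas added, and the per-row
        # valX/valY columns dropped: they cannot sit alongside aggregates
        query = 'INSERT INTO regression
                     (X, Y, N, meanX, sumX, squareX, meanY, sumY, squareY, sumXY)
                 SELECT
                     "'+$X+'" AS X,
                     "'+$Y+'" AS Y,
                     COUNT(*) AS N,
                     AVG('+$X+') AS meanX,
                     SUM('+$X+') AS sumX,
                     SUM('+$X+'*'+$X+') AS squareX,
                     AVG('+$Y+') AS meanY,
                     SUM('+$Y+') AS sumY,
                     SUM('+$Y+'*'+$Y+') AS squareY,
                     SUM('+$X+'*'+$Y+') AS sumXY
                 FROM measurements';
        db_execute(query);
    done
done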

Put separate indexes on the X and Y columns.
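A runnable sketch of such a results table with the two separate indexes, again using Python's `sqlite3` as a stand-in for MySQL (`CREATE INDEX name ON table (column)` is valid in both). The columns mirror the names used in the pseudocode (N, meanX, sumX, squareX, ...); the function name `fill_regression_table` and the index names are made up for illustration, and the source table is assumed to be the wide `measurements` table with one column per variable.

```python
import sqlite3
from itertools import product

SCHEMA = """
CREATE TABLE regression (
    X TEXT, Y TEXT, N INTEGER,
    meanX REAL, sumX REAL, squareX REAL,
    meanY REAL, sumY REAL, squareY REAL, sumXY REAL
);
CREATE INDEX regression_x ON regression (X);  -- separate index on X
CREATE INDEX regression_y ON regression (Y);  -- separate index on Y
"""


def fill_regression_table(conn, variables, table="measurements"):
    """Store one row of aggregates for every (X, Y) pair of variables."""
    if not all(name.isidentifier() for name in list(variables) + [table]):
        # column/table names are interpolated into SQL; allow identifiers only
        raise ValueError("names must be plain identifiers")
    conn.executescript(SCHEMA)
    for x, y in product(variables, repeat=2):
        conn.execute(
            f"""INSERT INTO regression
                SELECT ?, ?, COUNT(*),
                       AVG({x}), SUM({x}), SUM({x}*{x}),
                       AVG({y}), SUM({y}), SUM({y}*{y}), SUM({x}*{y})
                FROM {table}""",
            (x, y),  # the variable names themselves are bound as values
        )
    conn.commit()
```

A lookup such as `SELECT sumXY FROM regression WHERE X = 'A' AND Y = 'B'` can then use either index instead of rescanning the 80,000 source rows.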

Answer 1 (score: 1)

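I'd suggest moving the data out of MySQL and into R. For 1/0 response data, logistic regression is more appropriate than the simple sums of squares you are implementing.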

http://en.wikipedia.org/wiki/Logistic_regression

This page seems to show well how to solve the logistic regression problem:

http://www.omidrouhani.com/research/logisticregression/html/logisticregression.htm#_Toc147483467
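For a 0/1 outcome the logistic model P(y=1) = 1 / (1 + e^−(b0 + b1·x1 + … + b9·x9)) has no closed-form solution, so it is fitted iteratively. A minimal Newton-Raphson (iteratively reweighted least squares) sketch in plain Python, assuming the rows have already been exported from MySQL and the two classes are not perfectly separable (otherwise the coefficients diverge). In R the usual route would be `glm(..., family = binomial)`; this sketch only shows the mechanics, and its function names are hypothetical.

```python
import math


def _solve_spd(A, b):
    """Gaussian elimination without pivoting; adequate for the symmetric
    positive-definite matrix X'WX of a non-degenerate logistic fit."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def logistic_regression(rows, ys, iterations=25, tol=1e-10):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x1 + ...))) by Newton-Raphson."""
    k = len(rows[0]) + 1  # +1 for the intercept
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # X'WX = negated Hessian
        for row, y in zip(rows, ys):
            x = [1.0] + list(row)
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (y - p) * x[i]  # gradient of the log-likelihood
                for j in range(k):
                    info[i][j] += w * x[i] * x[j]
        step = _solve_spd(info, grad)  # Newton step: (X'WX)^-1 X'(y - p)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

With a single binary predictor the maximum-likelihood answer is known in closed form (intercept = logit of the rate at x = 0, slope = difference of the two logits), which makes a handy check: for rates 1/4 and 3/4 the fit should approach (−ln 3, 2·ln 3).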