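Need a SQL Server query to solve third-order polynomial regression

Time: 2012-02-15 11:09:40

Tags: sql math regression

Can anyone help with some SQL query code that provides coefficient estimates for a third-order polynomial regression?

Please assume that I have a table of X and Y data values and want to estimate a, b and c: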

Y(X) = aX + bX^2 + cX^3 + E 

2 answers:

Answer 0 (score: 4)

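An approximate but fast solution is to sample 4 representative points from the data and solve the polynomial equation for those points.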

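  1. For sampling, you can split the data into equal sectors and compute the average of X and Y for each sector. The split can be done using quartiles of the X values, averages of the X values, min(x)+(max(x)-min(x))/4, or whatever you think is most appropriate.

    To illustrate sampling by quartiles (i.e. by row number):

    [Figure: illustration of solving 3rd order polynomial by sampling 4 points]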

  2. As for solving, I used numberempire.com to solve these* equations for the variables k, a, b, c:

    k + a*X1 + b*X1^2 + c*X1^3 - Y1 = 0,
    k + a*X2 + b*X2^2 + c*X2^3 - Y2 = 0,
    k + a*X3 + b*X3^2 + c*X3^3 - Y3 = 0,
    k + a*X4 + b*X4^2 + c*X4^3 - Y4 = 0
    

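    *Since Y(X) = 0 + ax + bx^2 + cx^3 + ϵ implicitly includes the point [0,0] as one of the sample points, it would produce bad approximations for data sets that do not contain [0,0]. I took the liberty of solving Y(X) = k + ax + bx^2 + cx^3 + ϵ instead.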

  3. The actual SQL would look like this:

    select
        -- returns 1 row with columns labeled K, A, B and C = coefficients in 3rd order polynomial equation for the 4 sample points
        -(X1*(X2p2*(X3p3*Y4-X4p3*Y3)+X2p3*(X4p2*Y3-X3p2*Y4)+(X3p2*X4p3-X3p3*X4p2)*Y2)+X1p2*(X2*(X4p3*Y3-X3p3*Y4)+X2p3*(X3*Y4-X4*Y3)+(X3p3*X4-X3*X4p3)*Y2)+X1p3*(X2*(X3p2*Y4-X4p2*Y3)+X2p2*(X4*Y3-X3*Y4)+(X3*X4p2-X3p2*X4)*Y2)+(X2*(X3p3*X4p2-X3p2*X4p3)+X2p2*(X3*X4p3-X3p3*X4)+X2p3*(X3p2*X4-X3*X4p2))*Y1)/(X1*(X2p2*(X4p3-X3p3)-X3p2*X4p3+X3p3*X4p2+X2p3*(X3p2-X4p2))+X2*(X3p2*X4p3-X3p3*X4p2)+X1p2*(X3*X4p3+X2*(X3p3-X4p3)+X2p3*(X4-X3)-X3p3*X4)+X2p2*(X3p3*X4-X3*X4p3)+X1p3*(X2*(X4p2-X3p2)-X3*X4p2+X3p2*X4+X2p2*(X3-X4))+X2p3*(X3*X4p2-X3p2*X4))  as k,
        (X1p2*(X2p3*(Y4-Y3)-X3p3*Y4+X4p3*Y3+(X3p3-X4p3)*Y2)+X2p2*(X3p3*Y4-X4p3*Y3)+X1p3*(X3p2*Y4+X2p2*(Y3-Y4)-X4p2*Y3+(X4p2-X3p2)*Y2)+X2p3*(X4p2*Y3-X3p2*Y4)+(X3p2*X4p3-X3p3*X4p2)*Y2+(X2p2*(X4p3-X3p3)-X3p2*X4p3+X3p3*X4p2+X2p3*(X3p2-X4p2))*Y1)/(X1*(X2p2*(X4p3-X3p3)-X3p2*X4p3+X3p3*X4p2+X2p3*(X3p2-X4p2))+X2*(X3p2*X4p3-X3p3*X4p2)+X1p2*(X3*X4p3+X2*(X3p3-X4p3)+X2p3*(X4-X3)-X3p3*X4)+X2p2*(X3p3*X4-X3*X4p3)+X1p3*(X2*(X4p2-X3p2)-X3*X4p2+X3p2*X4+X2p2*(X3-X4))+X2p3*(X3*X4p2-X3p2*X4))  as a,
        -(X1*(X2p3*(Y4-Y3)-X3p3*Y4+X4p3*Y3+(X3p3-X4p3)*Y2)+X2*(X3p3*Y4-X4p3*Y3)+X1p3*(X3*Y4+X2*(Y3-Y4)-X4*Y3+(X4-X3)*Y2)+X2p3*(X4*Y3-X3*Y4)+(X3*X4p3-X3p3*X4)*Y2+(X2*(X4p3-X3p3)-X3*X4p3+X3p3*X4+X2p3*(X3-X4))*Y1)/(X1*(X2p2*(X4p3-X3p3)-X3p2*X4p3+X3p3*X4p2+X2p3*(X3p2-X4p2))+X2*(X3p2*X4p3-X3p3*X4p2)+X1p2*(X3*X4p3+X2*(X3p3-X4p3)+X2p3*(X4-X3)-X3p3*X4)+X2p2*(X3p3*X4-X3*X4p3)+X1p3*(X2*(X4p2-X3p2)-X3*X4p2+X3p2*X4+X2p2*(X3-X4))+X2p3*(X3*X4p2-X3p2*X4))  as b,
        (X1*(X2p2*(Y4-Y3)-X3p2*Y4+X4p2*Y3+(X3p2-X4p2)*Y2)+X2*(X3p2*Y4-X4p2*Y3)+X1p2*(X3*Y4+X2*(Y3-Y4)-X4*Y3+(X4-X3)*Y2)+X2p2*(X4*Y3-X3*Y4)+(X3*X4p2-X3p2*X4)*Y2+(X2*(X4p2-X3p2)-X3*X4p2+X3p2*X4+X2p2*(X3-X4))*Y1)/(X1*(X2p2*(X4p3-X3p3)-X3p2*X4p3+X3p3*X4p2+X2p3*(X3p2-X4p2))+X2*(X3p2*X4p3-X3p3*X4p2)+X1p2*(X3*X4p3+X2*(X3p3-X4p3)+X2p3*(X4-X3)-X3p3*X4)+X2p2*(X3p3*X4-X3*X4p3)+X1p3*(X2*(X4p2-X3p2)-X3*X4p2+X3p2*X4+X2p2*(X3-X4))+X2p3*(X3*X4p2-X3p2*X4))  as c
      from (select
          samples.*,
          -- precomputing the powers should give better performance (at least i hope it would)
          power(X1,2) X1p2, power(X2,2) X2p2, power(X3,2) X3p2, power(X4,2) X4p2,
          -- the coefficient formulas above also reference the cubes of the X samples
          power(X1,3) X1p3, power(X2,3) X2p3, power(X3,3) X3p3, power(X4,3) X4p3,
          power(Y1,3) Y1p3, power(Y2,3) Y2p3, power(Y3,3) Y3p3, power(Y4,3) Y4p3
        from (select
            avg(case when sector = 1 then x end) X1,
            avg(case when sector = 2 then x end) X2,
            avg(case when sector = 3 then x end) X3,
            avg(case when sector = 4 then x end) X4,
            avg(case when sector = 1 then y end) Y1,
            avg(case when sector = 2 then y end) Y2,
            avg(case when sector = 3 then y end) Y3,
            avg(case when sector = 4 then y end) Y4
          from (select x, y,
              -- splitting to sectors 1 - 4 by row number (SQL Server version);
              -- 4.0 avoids integer division, count(*) OVER () counts all rows without a GROUP BY
              ceiling(row_number() OVER (ORDER BY x asc) * 4.0 / count(*) OVER ()) sector
            from original_data
          ) sectors  -- SQL Server requires an alias on every derived table
        ) samples
      ) powers
    

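    According to developer.mimer.com, the query relies on these optional features (listed by their SQL standard feature codes), which need to be available in SQL Server: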

    T611, "Elementary OLAP operations"
    F591, "Derived tables"
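
The three steps above can be cross-checked outside the database. What follows is only a minimal Python sketch under stated assumptions, not the answer's own code: the function names `sector_means`, `solve_linear` and `cubic_through_samples` are hypothetical, the sector number is taken as ceiling(row_number * 4 / count), and the four equations are solved numerically by Gaussian elimination instead of with the closed-form expressions from numberempire.com.

```python
def sector_means(points, sectors=4):
    """Sort by x, split into sectors by row number, average x and y per sector."""
    pts = sorted(points)
    n = len(pts)
    if n < sectors:
        raise ValueError("need at least one point per sector")
    groups = [[] for _ in range(sectors)]
    for rn, p in enumerate(pts, start=1):
        # integer form of ceiling(rn * sectors / n), giving sector numbers 1..sectors
        sector = (rn * sectors + n - 1) // n
        groups[sector - 1].append(p)
    return [
        (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
        for g in groups
    ]


def solve_linear(a, b):
    """Solve a small square system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("sample x values must be distinct")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def cubic_through_samples(points):
    """Return [k, a, b, c] for the cubic through the 4 sector averages."""
    samples = sector_means(points)
    a = [[1.0, x, x ** 2, x ** 3] for x, _ in samples]
    b = [y for _, y in samples]
    return solve_linear(a, b)
```

Calling `cubic_through_samples(rows)` on a list of `(x, y)` tuples gives values to compare against the k, a, b, c columns returned by the query. Because only the 4 sector averages are used, the result is an interpolation through those averages, not a least-squares fit of all rows.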
    

Answer 1 (score: 2)

SQL Server has a built-in ranking function, NTILE(n), that makes it easier to create your sectors. I replaced:

ceiling(row_number() OVER (ORDER BY x asc) / count(*) * 4) sector

with:

NTILE(4) OVER(ORDER BY x ASC) [sector]
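
The two sector expressions agree when the row count is a multiple of 4 but can assign rows differently otherwise. The Python sketch below is only an illustration: `ntile` and `ceiling_sectors` are hypothetical helpers, the first emulating how NTILE hands the leftover rows to the lowest-numbered groups, the second assuming the row_number() expression is evaluated as ceiling(row_number * 4 / count) with non-integer division.

```python
def ntile(row_count, buckets=4):
    """Bucket number (1-based) per row: the first row_count % buckets buckets get one extra row."""
    base, extra = divmod(row_count, buckets)
    result = []
    for bucket in range(1, buckets + 1):
        size = base + (1 if bucket <= extra else 0)
        result.extend([bucket] * size)
    return result


def ceiling_sectors(row_count, buckets=4):
    """Sector per row as ceiling(row_number * buckets / row_count), in integer arithmetic."""
    return [
        (rn * buckets + row_count - 1) // row_count
        for rn in range(1, row_count + 1)
    ]


print(ntile(10))            # [1, 1, 1, 2, 2, 2, 3, 3, 4, 4]
print(ceiling_sectors(10))  # [1, 1, 2, 2, 2, 3, 3, 4, 4, 4]
```

With 10 rows both variants still produce 4 non-empty sectors, so either can feed the averaging step; only the sector sizes, and therefore the sample points, shift slightly.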

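I also needed to add a few more of the "precomputed powers" to cover the full range of columns used in the select. The full list is as follows: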

POWER(samples.X1, 2) AS [X1p2], 
POWER(samples.X1, 3) AS [X1p3], 
POWER(samples.X2, 2) AS [X2p2], 
POWER(samples.X2, 3) AS [X2p3],
POWER(samples.X3, 2) AS [X3p2], 
POWER(samples.X3, 3) AS [X3p3], 
POWER(samples.X4, 2) AS [X4p2],
POWER(samples.X4, 3) AS [X4p3],
POWER(samples.Y1, 3) AS [Y1p3], 
POWER(samples.Y2, 3) AS [Y2p3], 
POWER(samples.Y3, 3) AS [Y3p3], 
POWER(samples.Y4, 3) AS [Y4p3]

Overall, @Aprillion's answer is great! It is well explained, and the h/t to numberempire.com was very helpful.