Simple algebra with recursive SQL

Time: 2017-09-14 20:27:31

Tags: sql recursion mariadb linear-algebra recursive-query

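The following schema is used to create simple algebraic formulas. variables is used to create formulas such as x=3+4y. variables_has_sub_variables is used to combine the previously mentioned formulas, and its sign column (+1 or -1 only) determines whether a formula is added to or subtracted from the combination.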


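For instance, the variables table might contain the following data. The Implied Formula column is not actually in the table; it is shown for illustrative purposes only.

variables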

+-----------+-----------+-------+------------------+
| variables | intercept | slope | Implied Formula  |
+-----------+-----------+-------+------------------+
|         1 |      2.86 | -0.82 | Y1=+2.86-0.82*X1 |
|         2 |      2.96 | -3.49 | Y2=+2.96-3.49*X2 |
|         3 |      2.56 |  2.81 | Y3=+2.56+2.81*X3 |
|         4 |      3.04 | -3.43 | Y4=+3.04-3.43*X4 |
|         5 |     -1.94 |  4.11 | Y5=-1.94+4.11*X5 |
|         6 |     -1.21 | -0.62 | Y6=-1.21-0.62*X6 |
|         7 |      0.88 | -0.61 | Y7=+0.88-0.61*X7 |
|         8 |     -2.77 | -0.34 | Y8=-2.77-0.34*X8 |
|         9 |      1.81 |  1.65 | Y9=+1.81+1.65*X9 |
+-----------+-----------+-------+------------------+

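Then, given the following variables_has_sub_variables data, the combined variables are X7=+Y1-Y2+Y3, X8=+Y4+Y5-Y7, and X9=+Y6-Y7+Y8. Y7, Y8, and Y9 can then be derived from the variables table, for example Y7=+0.88-0.61*X7. Note that the application will prevent an endless loop, such as inserting a record where variables equals 7 and sub_variables equals 9, since variable 9 is based on variable 7.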

variables_has_sub_variables

+-----------+---------------+------+
| variables | sub_variables | sign |
+-----------+---------------+------+
|         7 |             1 |    1 |
|         7 |             2 |   -1 |
|         7 |             3 |    1 |
|         8 |             4 |    1 |
|         8 |             5 |    1 |
|         8 |             7 |   -1 |
|         9 |             6 |    1 |
|         9 |             7 |   -1 |
|         9 |             8 |    1 |
+-----------+---------------+------+
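
The question does not show how the application prevents such loops, so the following is only a sketch of one possible check, written against the tables defined further down. It collects every variable reachable from the proposed sub_variables value (9 here, including 9 itself) and tests whether the proposed variables value (7 here) is among them. The CTE name reachable and the output column would_create_cycle are made up for this illustration.

```sql
-- Hypothetical pre-insert check for the row (variables = 7, sub_variables = 9).
-- The insert would close a loop if 7 is reachable from 9 through existing rows.
WITH RECURSIVE reachable AS (
    SELECT 9 AS variables                  -- proposed sub_variables value
    UNION                                  -- UNION (not UNION ALL) drops repeats
    SELECT vhsv.sub_variables
    FROM reachable r
    INNER JOIN variables_has_sub_variables vhsv ON vhsv.variables = r.variables
)
SELECT COUNT(*) > 0 AS would_create_cycle  -- 1 means the insert must be rejected
FROM reachable
WHERE variables = 7;                       -- proposed variables value
```

Because the seed row contains the proposed sub_variables value itself, the same check also flags a self-reference such as (7, 7).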

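My objective is, given any variable (i.e. 1 to 9), to determine the constants and the root variables, where a root variable is defined as one that does not appear in variables_has_sub_variables.variables (I could also easily add a root column to variables if needed). With the sample data above, the root variables are 1 to 6.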
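
As a small illustration of that definition, and assuming the tables created further down, the root variables can be listed with an anti-join. This is a sketch, not part of the original question.

```sql
-- Root variables: rows of `variables` that never appear as a parent
-- in variables_has_sub_variables.variables.
SELECT v.variables
FROM variables v
WHERE NOT EXISTS (
    SELECT 1
    FROM variables_has_sub_variables vhsv
    WHERE vhsv.variables = v.variables
)
ORDER BY v.variables;
-- With the sample data this lists variables 1 through 6.
```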

Doing so for a root variable is easy, because it has no sub_variables and the result is simply Y1=+2.86-0.82*X1.

Doing so for variable 7 is a little trickier:

Y7=+0.88-0.61*X7
     =+0.88-0.61*(+Y1-Y2+Y3)
     =+0.88-0.61*(+(+2.86-0.82*X1)-(+2.96-3.49*X2)+( +2.56+2.81*X3))
     = -0.62 + 0.50*X1 - 2.13*X2 - 1.71*X3
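
The last step of this expansion can be double-checked with plain decimal arithmetic. The statement below is only a sanity check on the numbers above; the column aliases are made up for the illustration.

```sql
-- Each coefficient is slope(Y7) * sign * slope(Yn); the intercept is
-- intercept(Y7) + slope(Y7) * (signed sum of the sub-variable intercepts).
SELECT
    0.88 + (-0.61) * (2.86 - 2.96 + 2.56) AS intercept,       -- -0.6206
    (-0.61) *  1 * (-0.82)                AS x1_coefficient,  --  0.5002
    (-0.61) * -1 * (-3.49)                AS x2_coefficient,  -- -2.1289
    (-0.61) *  1 * 2.81                   AS x3_coefficient;  -- -1.7141
```

Rounded to two decimals these are the -0.62, 0.50, -2.13, and -1.71 shown in the final line of the derivation.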

Now for the SQL. Here is how I created the tables:

CREATE DATABASE algebra;
USE algebra;

CREATE TABLE `variables` (
  `variables` INT NOT NULL,
  `slope` DECIMAL(6,2) NOT NULL DEFAULT 1,
  `intercept` DECIMAL(6,2) NOT NULL DEFAULT 0,
  PRIMARY KEY (`variables`))
ENGINE = InnoDB;

CREATE TABLE `variables_has_sub_variables` (
  `variables` INT NOT NULL,
  `sub_variables` INT NOT NULL,
  `sign` TINYINT NOT NULL,
  PRIMARY KEY (`variables`, `sub_variables`),
  INDEX `fk_variables_has_variables_variables1_idx` (`sub_variables` ASC),
  INDEX `fk_variables_has_variables_variables_idx` (`variables` ASC),
  CONSTRAINT `fk_variables_has_variables_variables`
    FOREIGN KEY (`variables`)
    REFERENCES `variables` (`variables`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION,
  CONSTRAINT `fk_variables_has_variables_variables1`
    FOREIGN KEY (`sub_variables`)
    REFERENCES `variables` (`variables`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB;

INSERT INTO variables(variables,intercept,slope) VALUES (1,2.86,-0.82),(2,2.96,-3.49),(3,2.56,2.81),(4,3.04,-3.43),(5,-1.94,4.11),(6,-1.21,-0.62),(7,0.88,-0.61),(8,-2.77,-0.34),(9,1.81,1.65);

INSERT INTO variables_has_sub_variables(variables,sub_variables,sign) VALUES (7,1,1),(7,2,-1),(7,3,1),(8,4,1),(8,5,1),(8,7,-1),(9,6,1),(9,7,-1),(9,8,1);

Now for the query. For the results below, XXXX is 7, 8, and 9. Before each actual result, I show the result I expected.

WITH RECURSIVE t AS (
SELECT v.variables, v.slope, v.intercept
FROM variables v
WHERE v.variables=XXXX
UNION ALL
SELECT v.variables, vhsv.sign*t.slope*v.slope slope, vhsv.sign*t.slope*v.intercept intercept
FROM t
INNER JOIN variables_has_sub_variables vhsv ON vhsv.variables=t.variables
INNER JOIN variables v ON v.variables=vhsv.sub_variables
)
SELECT variables, SUM(slope) constant FROM t GROUP BY variables
UNION SELECT 'intercept' variables, SUM(intercept) intercept FROM t;

Variable 7 desired

+-----------+----------+
| variables | constant |
+-----------+----------+
|         1 |     0.50 |
|         2 |    -2.13 |
|         3 |    -1.71 |
| intercept |  -0.6206 |
+-----------+----------+

Variable 7 actual

+-----------+----------+
| variables | constant |
+-----------+----------+
| 1         | 0.50     |
| 2         | -2.13    |
| 3         | -1.71    |
| 7         | -0.61    |
| intercept | -0.61    |
+-----------+----------+
5 rows in set (0.00 sec)

Variable 8 desired

+-----------+-----------+
| variables | constant  |
+-----------+-----------+
|         1 |      0.17 |
|         2 |     -0.72 |
|         3 |     -0.58 |
|         4 |      1.17 |
|         5 |     -1.40 |
| intercept | -3.355004 |
+-----------+-----------+

Variable 8 actual

+-----------+----------+
| variables | constant |
+-----------+----------+
| 1         | 0.17     |
| 2         | -0.73    |
| 3         | -0.59    |
| 4         | 1.17     |
| 5         | -1.40    |
| 7         | -0.21    |
| 8         | -0.34    |
| intercept | -3.36    |
+-----------+----------+
8 rows in set (0.00 sec)

Variable 9 desired

+-----------+------------+
| variables |  constant  |
+-----------+------------+
|         1 |      -0.54 |
|         2 |       2.32 |
|         3 |       1.87 |
|         4 |       1.92 |
|         5 |      -2.31 |
|         6 |      -1.02 |
| intercept | -4.6982666 |
+-----------+------------+

Variable 9 actual

+-----------+----------+
| variables | constant |
+-----------+----------+
| 1         | -0.55    |
| 2         | 2.33     |
| 3         | 1.88     |
| 4         | 1.92     |
| 5         | -2.30    |
| 6         | -1.02    |
| 7         | 0.67     |
| 8         | -0.56    |
| 9         | 1.65     |
| intercept | -4.67    |
+-----------+----------+
10 rows in set (0.00 sec)

All I need to do is detect which variables are not root variables and filter them out. How can this be done?

In response to JNevill's answer, for v.variables 9:

+-----------+-------+-------+----------+
| variables | depth | path  | constant |
+-----------+-------+-------+----------+
| 1         |     3 | 9>7>1 | -0.55    |
| 2         |     3 | 9>7>2 | 2.33     |
| 3         |     3 | 9>7>3 | 1.88     |
| 4         |     3 | 9>8>4 | 1.92     |
| 5         |     3 | 9>8>5 | -2.30    |
| 6         |     2 | 9>6   | -1.02    |
| 7         |     2 | 9>7   | 0.67     |
| 8         |     2 | 9>8   | -0.56    |
| 9         |     1 | 9     | 1.65     |
| intercept |     1 | 9     | -4.67    |
+-----------+-------+-------+----------+
10 rows in set (0.00 sec)

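1 answer:

Answer 0 (score: 3)

I'm not going to try to wrap my head entirely around what you are doing, and I agree with @RickJames in the comments that this might not be the best use case for a database. I'm a bit obsessive too, though. I get it.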

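There are a couple of things I almost always track in a recursive CTE.

  1. The "path". If I'm going to let a query head down a rabbit hole, I want to know how it got to the end point. So I track a path so that I know which primary key was selected on each iteration. In the recursive seed (top portion) I use something like SELECT CAST(id as varchar(500)) as path..., and in the recursive member (bottom portion) I do something like recursiveCTE.path + '>' + id as path...

  2. The "depth". I want to know how deep the iteration went to reach the resulting record. This is tracked by adding SELECT 1 as depth to the recursive seed and recursiveCTE.depth + 1 as depth to the recursive member. Now I know how deep each record is.

I believe number 2 will solve your issue: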

    WITH RECURSIVE t
    AS (
        SELECT v.variables,
            v.slope,
            v.intercept,
            1 as depth
        FROM variables v
        WHERE v.variables = XXXX
    
        UNION ALL
    
        SELECT v.variables,
            vhsv.sign * t.slope * v.slope slope,
            vhsv.sign * t.slope * v.intercept intercept, 
            t.depth + 1
        FROM t
        INNER JOIN variables_has_sub_variables vhsv ON vhsv.variables = t.variables
        INNER JOIN variables v ON v.variables = vhsv.sub_variables
        )
    SELECT variables,
        SUM(slope) constant
    FROM t
    WHERE depth > 1
    GROUP BY variables
    
    UNION
    
    SELECT 'intercept' variables,
        SUM(intercept) intercept
    FROM t;
    

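The WHERE clause here drops the records with a depth of 1 from the result set. Those are the rows brought in by the recursive seed portion of the recursive CTE (they are the root of the recursion, that is, the starting variable XXXX).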

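It's not clear whether you also need the root removed from the second query that is UNIONed onto your t CTE. If so, the same logic applies: just add a WHERE clause there to exclude the records with a depth of 1.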

While it may not be helpful here, an example of the path-tracking recursive CTE would be:

    WITH RECURSIVE t
    AS (
        SELECT v.variables,
            v.slope,
            v.intercept,
            1 as depth,
            CAST(v.variables as CHAR(30)) as path
        FROM variables v
        WHERE v.variables = XXXX
    
        UNION ALL
    
        SELECT v.variables,
            vhsv.sign * t.slope * v.slope slope,
            vhsv.sign * t.slope * v.intercept intercept, 
            t.depth + 1,
            CONCAT(t.path,'>', v.variables)
        FROM t
        INNER JOIN variables_has_sub_variables vhsv ON vhsv.variables = t.variables
        INNER JOIN variables v ON v.variables = vhsv.sub_variables
        )
    SELECT variables,
        SUM(slope) constant
    FROM t
    WHERE depth > 1
    GROUP BY variables
    
    UNION
    
    SELECT 'intercept' variables,
        SUM(intercept) intercept
    FROM t;
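
One caveat, judging from the output posted in the question: depth > 1 removes only the seed row, so intermediate variables such as 7 and 8 under variable 9 would still appear. The sketch below is an alternative under two assumptions that are not part of the answer above. First, it keeps only rows whose variable has no sub-variables, which is the question's own definition of a root variable. Second, it casts the seed columns to DECIMAL(20,10), on the assumption that MariaDB takes the CTE column types from the non-recursive part, which would explain why the actual results were rounded to two decimals at every level. The value 9 stands in for XXXX.

```sql
WITH RECURSIVE t AS (
    -- Seed: the starting variable, with wider decimals so that
    -- intermediate products are not rounded to two places (assumption).
    SELECT v.variables,
           CAST(v.slope AS DECIMAL(20,10))     AS slope,
           CAST(v.intercept AS DECIMAL(20,10)) AS intercept
    FROM variables v
    WHERE v.variables = 9

    UNION ALL

    -- Recursive member: same arithmetic as the original query.
    SELECT v.variables,
           vhsv.sign * t.slope * v.slope,
           vhsv.sign * t.slope * v.intercept
    FROM t
    INNER JOIN variables_has_sub_variables vhsv ON vhsv.variables = t.variables
    INNER JOIN variables v ON v.variables = vhsv.sub_variables
)
-- Coefficients: keep only root variables (no rows as a parent).
SELECT t.variables,
       SUM(t.slope) AS constant
FROM t
WHERE NOT EXISTS (
    SELECT 1
    FROM variables_has_sub_variables vhsv
    WHERE vhsv.variables = t.variables
)
GROUP BY t.variables

UNION ALL

-- Intercept: every level contributes, so no filter here.
SELECT 'intercept',
       SUM(t.intercept)
FROM t;
```

The intercept branch is deliberately left unfiltered, because the intercepts of the seed and of the intermediate variables are all part of the final constant term.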