Recursively delete nodes from a tree in SQL using a single query

Time: 2018-10-26 21:29:24

Tags: sql postgresql tree

I'm using PostgreSQL 9.5 and have a table that represents a tree:

CREATE TABLE tree (
    dependent CHAR(1) NOT NULL,
    prereq CHAR(1) NOT NULL,
    PRIMARY KEY (dependent, prereq),
    CHECK (dependent != prereq)
);

INSERT INTO tree VALUES
    ('B', 'A'),
    ('C', 'B'),
    ('F', 'D'),
    ('F', 'E'),
    ('G', 'E'),
    ('H', 'F'),
    ('H', 'G'),
    ('J', 'I'),
    ('K', 'I'),
    ('K', 'L'),
    ('N', 'J'),
    ('N', 'M'),
    ('P', 'O'),
    ('Q', 'P');

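Each row in tree defines an edge between a dependent node and the prerequisite (prereq) node it depends on. When all of a dependent's prerequisites have been deleted, the dependent ceases to exist. (To be clear, cycles are not allowed.) I call any node that has no prerequisites at all (only dependents) a root node.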

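I'm looking for a SQL query that, given a list of root nodes to delete, yields the complete set of nodes that will be removed from the tree. I will only ever delete root nodes. For example, if I delete root nodes A, D, E and I, the complete set of nodes to delete is A, B, C, D, E, F, G, H, I and J. An illustration of this: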

illustration of nodes to be deleted

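Root nodes shaded red are in the initial list of nodes to delete. Nodes with a red border and red letter are deleted as a consequence, because all of their prerequisite nodes are deleted.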

I've gotten close with this query:

WITH RECURSIVE deletion AS (
    SELECT
        tree.*
    FROM
        tree
    WHERE
        prereq IN ('A', 'D', 'E', 'I')
    UNION
    SELECT
        tree.*
    FROM
        deletion
        JOIN tree ON tree.prereq = deletion.dependent
)
SELECT prereq FROM deletion
UNION
SELECT dependent FROM deletion
ORDER BY 1

However, this lists too many nodes for deletion:

 prereq 
--------
 A
 B
 C
 D
 E
 F
 G
 H
 I
 J
 K
 N
(12 rows)

K and N should not be in the list, because each has a prerequisite node that will not be deleted: L and M, respectively.
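
To see why the query over-selects, it may help to replay it outside SQL. The following is a minimal Python sketch, not part of the original question: `reachable` and `surviving_prereqs` are hypothetical helper names, and the edge list is copied from the INSERT statement above. It assumes the recursive CTE amounts to plain reachability from the initial roots, which ignores whether a node still has another live prerequisite.

```python
from collections import defaultdict

# (dependent, prereq) pairs, copied from the INSERT statement in the question.
EDGES = [
    ("B", "A"), ("C", "B"), ("F", "D"), ("F", "E"), ("G", "E"),
    ("H", "F"), ("H", "G"), ("J", "I"), ("K", "I"), ("K", "L"),
    ("N", "J"), ("N", "M"), ("P", "O"), ("Q", "P"),
]


def reachable(edges, roots):
    """Hypothetical helper: every node reachable from `roots` by following
    prereq -> dependent edges. Assumed equivalent to the recursive CTE."""
    dependents = defaultdict(set)
    for dependent, prereq in edges:
        dependents[prereq].add(dependent)
    seen, stack = set(roots), list(roots)
    while stack:
        node = stack.pop()
        for d in dependents[node]:
            if d not in seen:
                seen.add(d)
                stack.append(d)
    return seen


def surviving_prereqs(edges, node, deleted):
    """Hypothetical helper: prerequisites of `node` that are not in `deleted`."""
    return {p for d, p in edges if d == node and p not in deleted}


over_selected = reachable(EDGES, {"A", "D", "E", "I"})
print(sorted(over_selected))                         # the same 12 nodes as the query output
print(surviving_prereqs(EDGES, "K", over_selected))  # {'L'}
print(surviving_prereqs(EDGES, "N", over_selected))  # {'M'}
```

K and N are reached through I and J, but each keeps one prerequisite outside the reachable set, which is exactly the condition the query never checks.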

What single SQL query can I use in PostgreSQL 9.5 to get the complete list of nodes to delete, given an initial set of root nodes?

For what it's worth, my real tree table has about 100,000 rows.

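(I have a few ideas that I haven't been able to fully implement, such as using several nested anti-joins or somehow [ab]using COUNT as a window function, but I haven't cracked it yet, and I'm hoping the community can come up with something simpler or more elegant.)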

3 Answers:

Answer 0: (score: 1)

demo:db<>fiddle

WITH RECURSIVE dependents AS (                          -- 1.
    SELECT
        dependent,
        array_agg(prereq) as prereqs
    FROM 
        tree
    GROUP BY dependent
), deletions AS (
    SELECT array_cat(ARRAY['A', 'D', 'E', 'I'], array_agg(dependent))             -- 3.
    FROM dependents
    WHERE prereqs <@ ARRAY['A', 'D', 'E', 'I']          -- 2.

    UNION

    SELECT DISTINCT array_cat(del.array_cat, array_agg(dep.dependent) OVER ())
    FROM dependents dep
    JOIN deletions del
    ON NOT(dep.dependent = ANY(del.array_cat)) AND dep.prereqs <@ del.array_cat   -- 4.
)

SELECT * FROM deletions
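
  1. Collect all direct prerequisites of each dependent into an array.
  2. Check whether any of those prerequisite arrays is fully contained in your deletion array.
  3. Aggregate the dependents of those matches together with the original array into one new array of deleted nodes.
  4. Recursive part: check again whether any prereqs array is contained in the expanded deletion array, and add its dependent to the list if it is new (not yet in the list).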
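
As a rough cross-check of the four steps, the same containment logic can be sketched in Python. This is an illustration only, not the answer's code: `deletion_set` is a hypothetical name, sets stand in for the SQL arrays, and the subset test `<=` plays the role of the `<@` operator.

```python
from collections import defaultdict

# (dependent, prereq) pairs, copied from the INSERT statement in the question.
EDGES = [
    ("B", "A"), ("C", "B"), ("F", "D"), ("F", "E"), ("G", "E"),
    ("H", "F"), ("H", "G"), ("J", "I"), ("K", "I"), ("K", "L"),
    ("N", "J"), ("N", "M"), ("P", "O"), ("Q", "P"),
]


def deletion_set(edges, roots):
    """Hypothetical helper mirroring steps 1-4 with sets instead of arrays."""
    # Step 1: all direct prerequisites of each dependent.
    prereqs = defaultdict(set)
    for dependent, prereq in edges:
        prereqs[dependent].add(prereq)

    deleted = set(roots)
    while True:
        # Steps 2 and 4: dependents not yet listed whose prerequisites are
        # all contained in the current deletion set.
        new = {d for d, ps in prereqs.items()
               if d not in deleted and ps <= deleted}
        if not new:
            return deleted
        # Step 3: merge them into the deletion set and go around again.
        deleted |= new


print(sorted(deletion_set(EDGES, {"A", "D", "E", "I"})))
# ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
```

Each pass rescans every dependent, so the cost grows with the depth of the graph times the number of dependents. That is one plausible reason to be cautious about this shape of solution on a large table.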

Although I've shown a solution that uses a single recursive query, I'm not sure it will perform well on a large, complex data structure.

I would try a second approach and write a simple function (sketch):

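  1. Find all elements that have only leaves (dependents) and no prerequisites. Delete them.
  2. Find all dependents that have no prerequisites left. Delete them.
  3. Repeat (2) until no element with an empty set of prerequisites remains.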
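
One way to read this sketch is as a counting loop: track how many prerequisites each dependent still has and delete it when the count reaches zero. The Python below is a minimal illustration under that assumption. `cascade_delete` is a hypothetical name, and step 1 is interpreted as "delete the chosen root nodes", which the answer does not state explicitly.

```python
from collections import defaultdict, deque

# (dependent, prereq) pairs, copied from the INSERT statement in the question.
EDGES = [
    ("B", "A"), ("C", "B"), ("F", "D"), ("F", "E"), ("G", "E"),
    ("H", "F"), ("H", "G"), ("J", "I"), ("K", "I"), ("K", "L"),
    ("N", "J"), ("N", "M"), ("P", "O"), ("Q", "P"),
]


def cascade_delete(edges, roots):
    """Hypothetical helper: delete `roots`, then every dependent whose
    remaining-prerequisite count drops to zero, until nothing changes."""
    remaining = defaultdict(int)     # dependent -> number of live prerequisites
    dependents = defaultdict(list)   # prereq -> nodes that depend on it
    for dependent, prereq in edges:
        remaining[dependent] += 1
        dependents[prereq].append(dependent)

    deleted = set()
    queue = deque(roots)             # step 1: start from the chosen roots
    while queue:                     # step 3: repeat until the queue is empty
        node = queue.popleft()
        if node in deleted:
            continue
        deleted.add(node)
        for d in dependents[node]:
            remaining[d] -= 1
            if remaining[d] == 0:    # step 2: no prerequisites left
                queue.append(d)
    return deleted


print(sorted(cascade_delete(EDGES, ["A", "D", "E", "I"])))
# ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
```

Because every edge is visited at most once, this form does work proportional to the number of nodes plus edges, which may matter for a table of about 100,000 rows.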

Answer 1: (score: 1)

Simply put, you can use two CTEs (Common Table Expressions) to identify:

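  • "Candidate nodes": all nodes related to the root nodes being deleted. These could potentially be deleted.
  • "Protected nodes": all nodes related to the other root nodes, which are still in play. These must not be deleted.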

Once you have both sets, the result you want is the candidate nodes that are not protected nodes. The query looks like:

with recursive cand as ( -- get the candidates nodes
  select distinct prereq as root, prereq as node, null as prereq
    from tree where prereq in ('A', 'D', 'E', 'I')
  union all
  select cand.root, t.dependent, t.prereq
    from cand
    join tree t on t.prereq = cand.node
),
prot as ( -- get the protected nodes
select distinct prereq as root, prereq as node, null as prereq
  from tree
  where prereq not in (select dependent from tree) 
    and prereq not in ('A', 'D', 'E', 'I')
  union all
  select prot.root, t.dependent, t.prereq
    from prot
    join tree t on t.prereq = prot.node
)
select distinct node -- choose candidates that are not protected
  from cand 
  where node not in (select node from prot)
  order by node

Result:

node  
----
A
B
C
D
E
F
G
H
I
J

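Looking at it again, I realize that for the candidate nodes you could use the full table instead of the tree. That would simplify the first part of this query, if needed.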
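
The candidate/protected idea can also be expressed as a set difference. The Python below is a sketch for illustration, with hypothetical names (`descendants`, `to_delete`). Unlike the SQL, it seeds the candidate set directly from the given roots rather than from rows of tree, which is an assumption made here for brevity.

```python
from collections import defaultdict

# (dependent, prereq) pairs, copied from the INSERT statement in the question.
EDGES = [
    ("B", "A"), ("C", "B"), ("F", "D"), ("F", "E"), ("G", "E"),
    ("H", "F"), ("H", "G"), ("J", "I"), ("K", "I"), ("K", "L"),
    ("N", "J"), ("N", "M"), ("P", "O"), ("Q", "P"),
]


def descendants(edges, roots):
    """Hypothetical helper: `roots` plus everything that depends on them,
    directly or indirectly."""
    dependents = defaultdict(set)
    for dependent, prereq in edges:
        dependents[prereq].add(dependent)
    seen, stack = set(roots), list(roots)
    while stack:
        for d in dependents[stack.pop()]:
            if d not in seen:
                seen.add(d)
                stack.append(d)
    return seen


def to_delete(edges, roots):
    """Candidates (below the deleted roots) minus protected nodes
    (below any root that is kept)."""
    all_dependents = {d for d, _ in edges}
    all_roots = {p for _, p in edges if p not in all_dependents}
    candidates = descendants(edges, roots)
    protected = descendants(edges, all_roots - set(roots))
    return candidates - protected


print(sorted(to_delete(EDGES, {"A", "D", "E", "I"})))
# ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
```

This relies on the graph having no cycles: a node survives exactly when at least one chain of prerequisites leads back to a root that is kept.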

Answer 2: (score: 1)

Here is one possibility:

WITH RECURSIVE
    candidate AS (
        -- All edges for initial nodes to delete.
        SELECT
            tree.dependent,
            tree.prereq
        FROM
            tree
        WHERE
            tree.prereq IN ('A', 'D', 'E', 'I')
        UNION ALL
        -- Iteratively add any edges where the prereq is already in
        -- the candidate deletion set.
        SELECT
            tree.dependent,
            tree.prereq
        FROM
            tree
            JOIN candidate ON
                candidate.dependent = tree.prereq
    ),
    survivor AS (
        -- Find all leaf nodes from the candidate set which can
        -- survive because they have at least one prerequisite node
        -- that is *not* in the candidate set.
        SELECT
            candidate1.dependent AS node
        FROM
            candidate AS candidate1
            JOIN tree
                ON candidate1.dependent = tree.dependent
                AND candidate1.prereq != tree.prereq
        WHERE
            NOT EXISTS (
                SELECT 1 FROM
                    candidate AS candidate2
                WHERE
                    candidate2.prereq = tree.prereq
            )
        UNION ALL
        -- Iteratively add any nodes from the candidate set which are
        -- dependent upon a node we've already identified as a
        -- survivor.
        SELECT
            candidate.dependent
        FROM
            candidate
            JOIN survivor ON survivor.node = candidate.prereq
    )
(
    -- The dependent column contains all nodes to delete except the
    -- initial list of nodes to delete (see below).
    SELECT dependent FROM candidate
    EXCEPT
    SELECT node FROM survivor
)
UNION ALL
-- Add in the initial set of nodes to delete.
SELECT * FROM (VALUES ('A'), ('D'), ('E'), ('I')) AS t
ORDER BY 1;

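The candidate CTE produces the subset of rows from tree that could potentially be deleted, so candidate.dependent becomes the list of candidate nodes for deletion. The survivor CTE is then built by first finding the nodes named in candidate.dependent that have at least one edge to a node that will not be deleted, and then iteratively ("recursively") naming more and more nodes from candidate.dependent that will not be deleted, based on the survivor nodes already identified in the CTE.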

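The odd-looking UNION ALL SELECT ... VALUES ... is used to include the initial list of nodes in the output of this query, instead of using (SELECT dependent FROM candidate UNION ALL SELECT prereq FROM candidate), which seems to be measurably (but perhaps not significantly) slower.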
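
For readers who want to check the candidate/survivor logic without a database, here is a node-based Python sketch. It is an approximation of the idea rather than a translation of the SQL: `nodes_to_delete` is a hypothetical name, and the sketch works on node sets where the query above works on edge rows.

```python
from collections import defaultdict

# (dependent, prereq) pairs, copied from the INSERT statement in the question.
EDGES = [
    ("B", "A"), ("C", "B"), ("F", "D"), ("F", "E"), ("G", "E"),
    ("H", "F"), ("H", "G"), ("J", "I"), ("K", "I"), ("K", "L"),
    ("N", "J"), ("N", "M"), ("P", "O"), ("Q", "P"),
]


def nodes_to_delete(edges, roots):
    """Hypothetical helper: candidate set minus survivor set."""
    dependents = defaultdict(set)
    for dependent, prereq in edges:
        dependents[prereq].add(dependent)

    # candidate: the initial nodes plus everything downstream of them.
    candidate, stack = set(roots), list(roots)
    while stack:
        for d in dependents[stack.pop()]:
            if d not in candidate:
                candidate.add(d)
                stack.append(d)

    # survivor seed: candidates with at least one prerequisite that is
    # not itself a candidate.
    survivor = {d for d, p in edges if d in candidate and p not in candidate}

    # Propagate: a candidate that depends on a survivor also survives.
    stack = list(survivor)
    while stack:
        for d in dependents[stack.pop()]:
            if d in candidate and d not in survivor:
                survivor.add(d)
                stack.append(d)

    return candidate - survivor


print(sorted(nodes_to_delete(EDGES, {"A", "D", "E", "I"})))
# ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
```

With the sample data the survivor seed is K and N, and neither has dependents, so the propagation step adds nothing.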


Edit: Here is a simplified version of the above. Unfortunately, I think it runs slightly slower, but I also think it is easier to read.

WITH RECURSIVE
    candidate AS (
        -- All initial nodes to delete.
        SELECT
            *
        FROM
            (VALUES ('A'), ('D'), ('E'), ('I')) AS t (node)
        UNION
        -- Iteratively add any nodes where the prereq is already in
        -- the candidate deletion set.
        SELECT
            tree.dependent
        FROM
            tree
            JOIN candidate ON
                candidate.node = tree.prereq
    ),
    survivor AS (
        -- Find all nodes from the candidate set which can
        -- survive because they have at least one prerequisite node
        -- that is *not* in the candidate set.
        SELECT
            c1.node
        FROM
            candidate AS c1
            JOIN tree
                ON c1.node = tree.dependent
            LEFT JOIN candidate AS c2 ON c2.node = tree.prereq
        WHERE
            c2.node IS NULL
        UNION
        -- Iteratively add any nodes from the candidate set which are
        -- dependent upon a node we've already identified as a
        -- survivor.
        SELECT
            candidate.node
        FROM
            candidate
            JOIN tree ON candidate.node = tree.dependent
            JOIN survivor ON survivor.node = tree.prereq
    )
SELECT node FROM candidate
EXCEPT ALL
SELECT node FROM survivor
ORDER BY 1