I am using PostgreSQL 9.5 and have a table that represents a tree:
CREATE TABLE tree (
dependent CHAR(1) NOT NULL,
prereq CHAR(1) NOT NULL,
PRIMARY KEY (dependent, prereq),
CHECK (dependent != prereq)
);
INSERT INTO tree VALUES
('B', 'A'),
('C', 'B'),
('F', 'D'),
('F', 'E'),
('G', 'E'),
('H', 'F'),
('H', 'G'),
('J', 'I'),
('K', 'I'),
('K', 'L'),
('N', 'J'),
('N', 'M'),
('P', 'O'),
('Q', 'P');
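Each row in tree defines an edge between a dependent node and the prerequisite (prereq) node it depends on. When all of a dependent's prerequisites are deleted, the dependent ceases to exist. (To be clear, cycles are not allowed.) I call any node that has no prerequisites (only dependents) a root node.
I am looking for a SQL query that, given a list of root nodes to delete, produces the complete set of nodes that are removed from the tree. I will only ever delete root nodes. For example, if I delete root nodes A, D, E, and I, the complete set of nodes to delete is A, B, C, D, E, F, G, H, I, and J. An illustration of this:
Root nodes shaded red are in the initial list of nodes to delete. Nodes with a red border and red letter are deleted as a result of all of their prerequisite nodes being deleted.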
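The deletion rule described above can be read as a fixed point: keep deleting any node whose prerequisites are all already deleted until nothing changes. The following is a minimal Python sketch of that rule, run outside the database, which can serve as a reference when checking a query's output. The name cascade_delete is hypothetical, and EDGES simply copies the sample rows from the INSERT statement.

```python
# Sample edges copied from the INSERT above, as (dependent, prereq) pairs.
EDGES = [
    ("B", "A"), ("C", "B"), ("F", "D"), ("F", "E"), ("G", "E"),
    ("H", "F"), ("H", "G"), ("J", "I"), ("K", "I"), ("K", "L"),
    ("N", "J"), ("N", "M"), ("P", "O"), ("Q", "P"),
]


def cascade_delete(edges, roots):
    """Return every node removed when `roots` are deleted (hypothetical helper)."""
    # Map each dependent to the set of prerequisites it needs.
    prereqs = {}
    for dependent, prereq in edges:
        prereqs.setdefault(dependent, set()).add(prereq)

    deleted = set(roots)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for dependent, needed in prereqs.items():
            # A dependent disappears only once ALL of its prerequisites are gone.
            if dependent not in deleted and needed <= deleted:
                deleted.add(dependent)
                changed = True
    return deleted


print(sorted(cascade_delete(EDGES, ["A", "D", "E", "I"])))
```

For the sample data this yields A through J, the set given in the example above. The repeated full passes are fine for a small illustration but are not tuned for a table of real size.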
I have gotten close with this query:
WITH RECURSIVE deletion AS (
SELECT
tree.*
FROM
tree
WHERE
prereq IN ('A', 'D', 'E', 'I')
UNION
SELECT
tree.*
FROM
deletion
JOIN tree ON tree.prereq = deletion.dependent
)
SELECT prereq FROM deletion
UNION
SELECT dependent FROM deletion
ORDER BY 1
However, this lists too many nodes to delete:
prereq
--------
A
B
C
D
E
F
G
H
I
J
K
N
(12 rows)
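K and N should not be in the list, because each has a prerequisite node that will not be deleted: L and M, respectively.
What single SQL query can I use in PostgreSQL 9.5 to get the complete list of nodes to delete, given an initial set of root nodes?
For what it's worth, my real tree table has around 100,000 rows.
(I have some ideas that I have not been able to fully implement, such as using several nested anti-joins or somehow [ab]using COUNT as a window function, but I have not cracked it yet, and I am hoping the community can come up with a simpler or more elegant approach.)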
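To make the over-selection concrete, here is a small Python sketch (not part of the original query) that computes plain reachability from the chosen roots, which is what the recursive query above collects, and then lists the reachable nodes that still have a prerequisite outside that set. The names reachable and blocked are hypothetical, and EDGES repeats the sample rows.

```python
# Sample edges copied from the question, as (dependent, prereq) pairs.
EDGES = [
    ("B", "A"), ("C", "B"), ("F", "D"), ("F", "E"), ("G", "E"),
    ("H", "F"), ("H", "G"), ("J", "I"), ("K", "I"), ("K", "L"),
    ("N", "J"), ("N", "M"), ("P", "O"), ("Q", "P"),
]


def reachable(edges, roots):
    """Every node reachable from `roots` by following prereq -> dependent edges."""
    children = {}
    for dependent, prereq in edges:
        children.setdefault(prereq, []).append(dependent)

    seen, stack = set(roots), list(roots)
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


over = reachable(EDGES, ["A", "D", "E", "I"])

# Reachable nodes that still have a prerequisite outside the reachable set,
# mapped to that surviving prerequisite.
blocked = {d: p for d, p in EDGES if d in over and p not in over}

print(sorted(over))
print(blocked)
```

On the sample data the reachable set has the same 12 nodes as the query output, and blocked maps K to L and N to M, the two nodes that should have been kept.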
Answer 0 (score: 1)
WITH RECURSIVE dependents AS ( -- 1.
SELECT
dependent,
array_agg(prereq) as prereqs
FROM
tree
GROUP BY dependent
), deletions AS (
SELECT array_cat(ARRAY['A', 'D', 'E', 'I'], array_agg(dependent)) -- 3.
FROM dependents
WHERE prereqs <@ ARRAY['A', 'D', 'E', 'I'] -- 2.
UNION
SELECT DISTINCT array_cat(del.array_cat, array_agg(dep.dependent) OVER ())
FROM dependents dep
JOIN deletions del
ON NOT(dep.dependent = ANY(del.array_cat)) AND dep.prereqs <@ del.array_cat -- 4.
)
SELECT * FROM deletions
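Although I have shown a solution that uses a single recursive query, I am not sure it will perform well on a large and complex data structure.
I would try a second approach and create a simple function (sketch):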
Answer 1 (score: 1)
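Simply put, you can use two CTEs (Common Table Expressions) to identify the candidate nodes (every node reachable from the roots being deleted) and the protected nodes (every node reachable from a root that is not being deleted).
Once you have both sets, the result you want is the candidate nodes that are not protected nodes. The query looks like: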
with recursive cand as ( -- get the candidates nodes
select distinct prereq as root, prereq as node, null as prereq
from tree where prereq in ('A', 'D', 'E', 'I')
union all
select cand.root, t.dependent, t.prereq
from cand
join tree t on t.prereq = cand.node
),
prot as ( -- get the protected nodes
select distinct prereq as root, prereq as node, null as prereq
from tree
where prereq not in (select dependent from tree)
and prereq not in ('A', 'D', 'E', 'I')
union all
select prot.root, t.dependent, t.prereq
from prot
join tree t on t.prereq = prot.node
)
select distinct node -- choose candidates that are not protected
from cand
where node not in (select node from prot)
order by node
Result:
node
----
A
B
C
D
E
F
G
H
I
J
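Now that I look at it again, I realize that for the candidate nodes you could use the full table rather than the tree. That would let you simplify the first part of this query if you want.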
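The candidate-minus-protected idea in this answer can also be sketched outside SQL. The Python below is an illustration only, under the assumption that the graph has no cycles (as the question states); the helper names descendants and deletable are hypothetical, and EDGES repeats the sample rows.

```python
# Sample edges copied from the question, as (dependent, prereq) pairs.
EDGES = [
    ("B", "A"), ("C", "B"), ("F", "D"), ("F", "E"), ("G", "E"),
    ("H", "F"), ("H", "G"), ("J", "I"), ("K", "I"), ("K", "L"),
    ("N", "J"), ("N", "M"), ("P", "O"), ("Q", "P"),
]


def descendants(edges, starts):
    """`starts` plus every node reachable from them via prereq -> dependent."""
    children = {}
    for dependent, prereq in edges:
        children.setdefault(prereq, []).append(dependent)

    seen, stack = set(starts), list(starts)
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def deletable(edges, roots):
    """Candidates (below the deleted roots) minus protected (below kept roots)."""
    dependents = {d for d, _ in edges}
    # A root is a node that appears as a prereq but never as a dependent.
    all_roots = {p for _, p in edges} - dependents
    cand = descendants(edges, roots)
    prot = descendants(edges, all_roots - set(roots))
    return cand - prot


print(sorted(deletable(EDGES, ["A", "D", "E", "I"])))
```

This relies on the fact that, in an acyclic graph, a node is deleted exactly when every root above it is deleted, so any node sitting below a kept root must survive.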
Answer 2 (score: 1)
Here is one possibility:
WITH RECURSIVE
candidate AS (
-- All edges for initial nodes to delete.
SELECT
tree.dependent,
tree.prereq
FROM
tree
WHERE
tree.prereq IN ('A', 'D', 'E', 'I')
UNION ALL
-- Iteratively add any edges where the prereq is already in
-- the candidate deletion set.
SELECT
tree.dependent,
tree.prereq
FROM
tree
JOIN candidate ON
candidate.dependent = tree.prereq
),
survivor AS (
-- Find all leaf nodes from the candidate set which can
-- survive because they have at least one prerequisite node
-- that is *not* in the candidate set.
SELECT
candidate1.dependent AS node
FROM
candidate AS candidate1
JOIN tree
ON candidate1.dependent = tree.dependent
AND candidate1.prereq != tree.prereq
WHERE
NOT EXISTS (
SELECT 1 FROM
candidate AS candidate2
WHERE
candidate2.prereq = tree.prereq
)
UNION ALL
-- Iteratively add any nodes from the candidate set which are
-- dependent upon a node we've already identified as a
-- survivor.
SELECT
candidate.dependent
FROM
candidate
JOIN survivor ON survivor.node = candidate.prereq
)
(
-- The dependent column contains all nodes to delete except the
-- initial list of nodes to delete (see below).
SELECT dependent FROM candidate
EXCEPT
SELECT node FROM survivor
)
UNION ALL
-- Add in the initial set of nodes to delete.
SELECT * FROM (VALUES ('A'), ('D'), ('E'), ('I')) AS t
ORDER BY 1;
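The candidate CTE produces the subset of rows from tree that could possibly be deleted, so candidate.dependent becomes the list of candidate nodes to delete. The survivor CTE is then built by first finding the nodes named in candidate.dependent that have at least one edge to a node that will not be deleted, and then iteratively ("recursively") adding more and more nodes from candidate.dependent that will not be deleted, based on the survivor nodes already identified in the CTE.
The odd-looking UNION ALL SELECT ... VALUES ... puts the initial list of nodes into the output of this query. It is used instead of (SELECT dependent FROM candidate UNION ALL SELECT prereq FROM candidate), which appears to be measurably (but probably not significantly) slower.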
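The candidate and survivor steps can be mirrored in a short Python sketch, working on nodes rather than edge rows. This is an illustration of the idea under the question's no-cycles assumption, not a translation of the exact query; the name candidate_survivor is hypothetical, and EDGES repeats the sample rows.

```python
# Sample edges copied from the question, as (dependent, prereq) pairs.
EDGES = [
    ("B", "A"), ("C", "B"), ("F", "D"), ("F", "E"), ("G", "E"),
    ("H", "F"), ("H", "G"), ("J", "I"), ("K", "I"), ("K", "L"),
    ("N", "J"), ("N", "M"), ("P", "O"), ("Q", "P"),
]


def candidate_survivor(edges, roots):
    """Return (nodes to delete, survivors) using the candidate/survivor idea."""
    prereqs, dependents = {}, {}
    for d, p in edges:
        prereqs.setdefault(d, set()).add(p)
        dependents.setdefault(p, set()).add(d)

    # candidate: the initial roots plus everything reachable from them.
    candidate, frontier = set(roots), list(roots)
    while frontier:
        node = frontier.pop()
        for child in dependents.get(node, ()):
            if child not in candidate:
                candidate.add(child)
                frontier.append(child)

    # survivor seed: candidates with at least one prereq outside the candidate set.
    survivor = {n for n in candidate if prereqs.get(n, set()) - candidate}

    # Grow the survivor set: a candidate that depends on a survivor also survives.
    frontier = list(survivor)
    while frontier:
        node = frontier.pop()
        for child in dependents.get(node, ()):
            if child in candidate and child not in survivor:
                survivor.add(child)
                frontier.append(child)

    return candidate - survivor, survivor


to_delete, kept = candidate_survivor(EDGES, ["A", "D", "E", "I"])
print(sorted(to_delete), sorted(kept))
```

On the sample data the survivors are K and N, and the remaining candidates are A through J.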
Edit: here is a simplified version of the above. Unfortunately I think it runs slightly slower, but I also think it is easier to read.
WITH RECURSIVE
candidate AS (
-- All initial nodes to delete.
SELECT
*
FROM
(VALUES ('A'), ('D'), ('E'), ('I')) AS t (node)
UNION
-- Iteratively add any nodes where the prereq is already in
-- the candidate deletion set.
SELECT
tree.dependent
FROM
tree
JOIN candidate ON
candidate.node = tree.prereq
),
survivor AS (
-- Find all nodes from the candidate set which can
-- survive because they have at least one prerequisite node
-- that is *not* in the candidate set.
SELECT
c1.node
FROM
candidate AS c1
JOIN tree
ON c1.node = tree.dependent
LEFT JOIN candidate AS c2 ON c2.node = tree.prereq
WHERE
c2.node IS NULL
UNION
-- Iteratively add any nodes from the candidate set which are
-- dependent upon a node we've already identified as a
-- survivor.
SELECT
candidate.node
FROM
candidate
JOIN tree ON candidate.node = tree.dependent
JOIN survivor ON survivor.node = tree.prereq
)
SELECT node FROM candidate
EXCEPT ALL
SELECT node FROM survivor
ORDER BY 1