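I've run into a problem that I can solve for small datasets, but not for large datasets with (possibly) unclean data.
The database is an implementation of an acyclic (hopefully) graph in PostgreSQL. There are three tables: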
vertex_elements: id
edges: id, parent_id, child_id
element_associations: id, user_id, object_id (both are vertex elements, but in unconnected graphs)
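I have a set of user_ids, from which I derive element_associations, and a starting vertex_element in the graph. I want to find all descendants that are accessible through an element_association belonging to one of those user_ids. A node is considered accessible if it, or one of its ancestors, is one of the candidate object_ids of an element_association.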
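The accessibility rule can be sketched in memory as follows. This is only a minimal Python illustration of the definition, not the SQL in question: the function names `reachable` and `accessible_descendants`, the `(parent_id, child_id)` edge tuples, and the sample IDs are all hypothetical. It relies on the observation that "the node or one of its ancestors is a candidate" is the same as "the node is reachable downward from some candidate".

```python
from collections import defaultdict, deque


def reachable(children, roots):
    """Return every vertex reachable downward from roots (roots included)."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:  # each vertex is expanded once, so cycles cannot loop
                seen.add(child)
                queue.append(child)
    return seen


def accessible_descendants(edges, start, candidate_object_ids):
    """Descendants of start that are, or descend from, a candidate object_id."""
    children = defaultdict(list)
    for parent_id, child_id in edges:
        children[parent_id].append(child_id)
    # Accessible = the vertex itself or any ancestor is a candidate,
    # i.e. the vertex is reachable downward from some candidate.
    accessible = reachable(children, candidate_object_ids)
    descendants = reachable(children, [start]) - {start}
    return descendants & accessible


# Hypothetical sample graph with a cycle 3 -> 5 -> 3 and an outside parent 6 -> 3.
edges = [(1, 2), (1, 3), (2, 4), (3, 5), (6, 3), (5, 3)]
result = accessible_descendants(edges, start=1, candidate_object_ids={6})
```

Because each pass keeps a `seen` set, every vertex is expanded at most once per pass, which is the property the single-query version is trying to obtain.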
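The graph is roughly triangular (few root nodes, many leaf nodes). Starting from the initial vertex_element, my strategy is as follows:
The problem arises when I want to avoid re-checking the same ancestor vertex_elements. The main query traverses downward, checking each descendant for accessibility against a set of candidate element_associations:
WITH RECURSIVE edges_recursive(child_id, parent_id, matching_element_association_id) AS (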
    (
        SELECT e1.child_id, e1.parent_id, ea.id
        FROM edges e1
        LEFT OUTER JOIN element_associations ea
            ON e1.child_id = ea.object_id
            AND ea.id IN (?)
        WHERE e1.parent_id = ?
    )
    UNION
    (
        SELECT e2.child_id, e2.parent_id, ea.id
        FROM edges e2
        INNER JOIN edges_recursive
            ON edges_recursive.child_id = e2.parent_id
        LEFT OUTER JOIN element_associations ea
            ON edges_recursive.child_id = ea.object_id
            AND ea.id IN (?)
        WHERE edges_recursive.matching_element_association_id IS NULL
    )
)
SELECT edges_recursive.child_id
FROM edges_recursive
WHERE edges_recursive.matching_element_association_id IS NOT NULL
However, there is an additional recursive subquery that checks each vertex_element in the LEFT OUTER JOIN element_associations. These subqueries look like this:
ea.id IN (
    WITH RECURSIVE parent_edges_recursive(child_id, parent_id, matching_element_association_id) AS (
        (
            SELECT edges.child_id, edges.parent_id, ea.id
            FROM edges
            LEFT OUTER JOIN element_associations ea
                ON ea.id IN (?) AND edges.parent_id = ea.object_id
            WHERE edges.child_id = e1.parent_id AND edges.parent_id != e1.parent_id
        )
        UNION
        (
            SELECT edges.child_id, edges.parent_id, ea.id
            FROM edges
            JOIN parent_edges_recursive
                ON parent_edges_recursive.parent_id = edges.child_id
            LEFT OUTER JOIN element_associations ea
                ON ea.id IN (?) AND edges.parent_id = ea.object_id
            WHERE parent_edges_recursive.matching_element_association_id IS NULL
        )
    )
    SELECT parent_edges_recursive.matching_element_association_id
    FROM parent_edges_recursive
    WHERE parent_edges_recursive.matching_element_association_id IS NOT NULL
    LIMIT 1
)
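The problem is that each subquery on its own avoids traversing the same parent vertex twice; however, there is no guarantee that, as we traverse the graph through the descendants, we won't retread previously evaluated ancestors. For small datasets this is fine and performs acceptably; however, it is laughably unscalable and extremely intolerant of cycles.
What I need to do is preserve information about which parent vertex_elements I have already traversed across subqueries, to avoid retreading steps; however, I'm stuck on how to do this in a single query.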
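Answer 0 (score: 0)
"What I need to do is preserve information about which parent vertex_elements I have already traversed across subqueries, to avoid retreading steps;"
Without studying your query in detail: you can do this by collecting the IDs in an array. Code example: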
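A minimal sketch of the array idea, under stated assumptions; it is an illustration and not necessarily the exact query intended here. In PostgreSQL the recursive CTE would carry an `int[]` column, start it with `ARRAY[e.parent_id, e.child_id]`, extend it with `path || e.child_id`, and skip already collected vertices with `e.child_id <> ALL(path)`. So that the sketch runs on its own, it uses Python's built-in `sqlite3` module instead, and because SQLite has no array type, a comma-delimited string stands in for the array. The `walk` CTE, the `path` column, the `descendants` helper, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edges (id INTEGER PRIMARY KEY, parent_id INTEGER, child_id INTEGER);
    -- Hypothetical sample data, including a cycle 3 -> 5 -> 3.
    INSERT INTO edges (parent_id, child_id) VALUES
        (1, 2), (1, 3), (2, 4), (3, 5), (5, 3);
""")

QUERY = """
WITH RECURSIVE walk(child_id, path) AS (
    -- Base case: direct children of the start vertex; path holds the IDs seen so far.
    SELECT e.child_id,
           ',' || e.parent_id || ',' || e.child_id || ','
    FROM edges e
    WHERE e.parent_id = :start
    UNION ALL
    -- Recursive case: follow an edge only if its child is not yet in the path.
    SELECT e.child_id,
           w.path || e.child_id || ','
    FROM edges e
    JOIN walk w ON e.parent_id = w.child_id
    WHERE instr(w.path, ',' || e.child_id || ',') = 0
)
SELECT DISTINCT child_id FROM walk ORDER BY child_id
"""


def descendants(start):
    """Return the descendant IDs of start, stopping at any vertex already on the path."""
    return [row[0] for row in conn.execute(QUERY, {"start": start})]
```

Note that the collected path belongs to one branch of the walk: it makes the query safe against cycles, but a vertex reachable through several distinct paths is still visited once per path.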