我遇到以下问题:我正在尝试发现从源节点( node_s )到目标节点( node_t )的所有可能路径。
带有图形边缘的原始表的格式很简单: | node_x | node_y |强度| ,其中“node_x” - > “node_y”是边缘强度为“权重”的直接边缘。
我们的想法是,在探索路径的过程中,我们发现其子节点中的某个节点的目标是 node_t ,我们会记录此路径并停止探索此节点的路径,否则继续探索
简单的解决方案是使用PostgreSQL的递归CTE来构造图的传递闭包:
WITH RECURSIVE transitive_closure (source, target, weight, path_string) AS
(
--non-recurive term
SELECT b.node_x, b.node_y, b.strength, b.node_x || '.' || b.node_y || '.' AS path_string
FROM best_path b
WHERE b.node_x = node_s --source_node
UNION
--recurive term
SELECT tc.source, b.node_y, least(tc.weight,b.strength), tc.path_string || b.node_y || '.' AS path_string
FROM best_path AS b JOIN transitive_closure AS tc ON b.node_x = tc.target
WHERE tc.path_string NOT LIKE '%' || b.node_y || '.%'
)
SELECT *
FROM transitive_closure tc
WHERE tc.target = node_t --target_node
上面的代码将从源节点 node_s 中发现所有可能的路径。只有在传递闭包构造之后,我才能从源节点到目标节点选择所需的路径行(参见最后的SELECT语句)。
示例:
best_path表包含以下数据:
node_x | node_y | strength
--------+--------+----------
1 | 2 | 1
1 | 3 | 1
2 | 4 | 1
2 | 5 | 1
3 | 6 | 1
3 | 7 | 1
4 | 8 | 1
4 | 9 | 1
5 | 10 | 1
5 | 11 | 1
查询:
找到从源节点= 1到目标节点= 4
的路径结果:
source | target | strength | path_string
--------+--------+----------+------------
1 | 2 | 1 | 1.2.
1 | 3 | 1 | 1.3.
1 | 4 | 1 | 1.2.4.
1 | 5 | 1 | 1.2.5.
1 | 6 | 1 | 1.3.6.
1 | 7 | 1 | 1.3.7.
1 | 8 | 1 | 1.2.4.8.
1 | 9 | 1 | 1.2.4.9.
1 | 10 | 1 | 1.2.5.10.
1 | 11 | 1 | 1.2.5.11.
这不是我需要的。由于已经存在从节点2到节点4(目标)的直接边缘,因此我不需要路径1.2.5。,1.2.4.8。,1.2.4.9。,1.2.5.10。,1.2.5.11。,路径探索对于节点2,应该在发现从2到4的路径时停止。
总而言之,如果节点已经与目标节点有直接边缘,我不想发现节点的路径。这意味着在CTE的递归术语中,我希望有一些条件可以说下面的伪代码如下:
WITH RECURSIVE transitive_closure (source, target, weight, path_string) AS
(
--non-recurive term (as before)
SELECT b.node_x, b.node_y, b.strength, b.node_x || '.' || b.node_y || '.' AS path_string
FROM best_path b
WHERE b.node_x = node_s --source_node
UNION
--recurive term
SELECT tc.source, b.node_y, least(tc.weight,b.strength), tc.path_string || b.node_y || '.' AS path_string
FROM best_path AS b JOIN transitive_closure AS tc ON b.node_x = tc.target
WHERE tc.path_string NOT LIKE '%' || b.node_y || '.%'
AND b.node_y = node_t
--will select only rows with direct edge to target
UNION (second union is not allowed in CTE)
SELECT those rows which do not have direct edge to target
AND which parents did not contribute to constructing the query above.
i.e. b.node_x = tc.target where there is no direct edge between b.node_x to node_t
)
作为查询从源节点= 1到目标节点= 4的路径的结果,我希望有以下内容:
source | target | strength | path_string
--------+--------+----------+------------
1 | 2 | 1 | 1.2.
1 | 3 | 1 | 1.3.
1 | 4 | 1 | 1.2.4.
1 | 6 | 1 | 1.3.6.
1 | 7 | 1 | 1.3.7.
提前感谢您的帮助!
我已经尝试了很多方法:例如FROM / WHERE子句中的条件,尝试将CTE传递给函数,但没有成功。
任何建议都将受到赞赏。
我有自己的递归功能,可以实现我想要的功能,但是,它在大量数据上非常慢;而PostgreSQL的CTE显然已经过优化,所以我想深入研究一下。
答案 0 :(得分:2)
如果从底部开始,您可以更有效地搜索路径。从孩子们开始。如果你从父母开始,那就需要遍历所有的孩子;而如果您从孩子那里搜索,它只有一个父母,因此不会浪费时间找到源和目标之间的路径。
with recursive find_parent(source, target, recentness) as
(
select source, target, 0
from tbl
where target = 9
union all
select i.source, i.target, fp.recentness + 1
from tbl i
join find_parent fp on i.target = fp.source
),
construct_path(source, target, recentness, path) as
(
select source, target, recentness, source || '.' || target
from find_parent
where recentness = (select max(recentness) from find_parent)
union
select dd.source, dd.target, dd.recentness, cp.path || '.' || dd.target
from find_parent dd
join construct_path cp on dd.recentness = cp.recentness - 1
)
select source, target, path
from construct_path
order by recentness desc
输出:
SOURCE TARGET PATH
1 2 1.2
2 4 1.2.4
4 9 1.2.4.9
实时测试:http://www.sqlfiddle.com/#!1/13e6b/1
与此类似:How to get the parent given a child in SQL SERVER 2005
如果已经找到特定的(源),则优化,剪切递归到父级。
来源= 2
目标= 9
with recursive find_parent(source, target, recentness) as
(
select source, target, 0
from tbl
where target = 9
union all
select i.source, i.target, fp.recentness + 1
from tbl i
join find_parent fp on i.target = fp.source
-- despite the name, this target is another one's source
and i.target <> 2
)
,construct_path(source, target, recentness, path) as
(
select source, target, recentness, source || '.' || target
from find_parent
where recentness = (select max(recentness) from find_parent)
union
select dd.source, dd.target, dd.recentness, cp.path || '.' || dd.target
from find_parent dd
join construct_path cp on dd.recentness = cp.recentness - 1
)
select source, target, path
from construct_path
order by recentness desc
输出:
SOURCE TARGET PATH
2 4 2.4
4 9 2.4.9
答案 1 :(得分:1)
用于测试的临时表:
CREATE TEMP TABLE best_path (node_x int, node_y int, strength int);
INSERT INTO best_path VALUES
(1, 2, 1)
,(1, 3, 1)
,(2, 4, 1)
,(2, 5, 1)
,(3, 6, 1)
,(3, 7, 1)
,(4, 8, 1)
,(4, 9, 1)
,(5, 10, 1)
,(5, 11, 1);
查询:
修改以容纳关于2 - 5
WITH RECURSIVE t AS ( -- for readability and convenience:
SELECT 1 AS node_s -- source_node
, 4 AS node_t -- target_node
)
, x AS (
SELECT node_x
FROM t, best_path
WHERE node_y = node_t
)
, a AS (
SELECT b.node_x
, b.node_y
, b.strength AS weight
, b.node_x || '.' || b.node_y || '.' AS path
FROM t, best_path b
LEFT JOIN x ON x.node_x = b.node_x
WHERE b.node_x = node_s
AND (x.node_x IS NULL -- no point with edge to target
OR b.node_y = node_t) -- except with target_node
UNION ALL
SELECT a.node_x
, b.node_y
, least(a.weight, b.strength)
, a.path || b.node_y || '.' AS path
FROM t, a
JOIN best_path AS b ON b.node_x = a.node_y
LEFT JOIN x ON x.node_x = b.node_x
WHERE a.node_y <> node_t -- arrived at target
AND a.path !~~ ('%' || b.node_y || '.%') -- not visited yet
AND (x.node_x IS NULL -- no point with edge to target
OR b.node_y = node_t) -- except with target_node
)
TABLE a;
完全生成请求的结果。
我也将中断条件添加到初始查询中,因为我们只能在一步中到达目标。
这恰好与my answer to a previous, similar question非常相似。所有解释和链接均适用。这里的主要附加技巧是包括一个额外的CTE x
来收集节点......
有(a)直接边缘到目标。
在递归CTE的中断条件下重复使用。众所周知,无论关键字RECURSIVE
如何,您都可以在递归CTE之上添加其他(非递归)CTE。
答案 2 :(得分:1)
重新阅读OP的问题后,我想出了这个解决方案:
<强>来源
with recursive -- this denotes that at least one CTE is using recursion
inputs as ( select 1 as d_source, 4 as d_target )
,traverse_path(filter, source, target, path, bingo) as
(
select bp.node_x, bp.node_x, bp.node_y, bp.node_x || '.' || bp.node_y,
max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool
from best_path bp cross join inputs i
where bp.node_x = i.d_source -- filter
union
select tp.filter, bp.node_x, bp.node_y, path || '.' || node_y,
max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool
from best_path bp cross join inputs i
join traverse_path tp on bp.node_x = tp.target
and not tp.bingo
)
select tp.*
from traverse_path tp cross join inputs i
-- if Postgresql has Oracle KEEP windowing,
-- we don't need to use WHERE clause here
where
not tp.bingo
or
(
tp.bingo and tp.target = d_target
)
上述WHERE子句可以缩短为:
WHERE not bingo or target = 4
<强>输出:强>
FILTER SOURCE TARGET PATH BINGO
1 1 2 1.2 0
1 1 3 1.3 0
1 2 4 1.2.4 1
1 3 6 1.3.6 0
1 3 7 1.3.7 0
实时测试:http://www.sqlfiddle.com/#!1/cdde6/55
这是Source = 2,Target = 5的结果:
FILTER SOURCE TARGET PATH BINGO
2 2 5 2.5 1
数据:强>
CREATE TABLE best_path
(node_x int, node_y int, strength int);
INSERT INTO best_path
(node_x, node_y, strength)
VALUES
(1, 2, 1),
(1, 3, 1),
(2, 4, 1),
(2, 5, 1),
(3, 6, 1),
(3, 7, 1),
(4, 8, 1),
(4, 9, 1),
(5, 10, 1),
(5, 11, 1);
答案 3 :(得分:0)
优化解决方案,最终结果不再有WHERE子句;虽然是Postgresql特定的解决方案,即我们使用DISTINCT ON来选择特定的行:
鉴于此数据:
CREATE TABLE best_path
(node_x int, node_y int, strength int);
INSERT INTO best_path
(node_x, node_y, strength)
VALUES
(1, 2, 1),
(1, 3, 1),
(2, 4, 1),
(2, 5, 1),
(3, 6, 1),
(3, 7, 1),
(4, 8, 1),
(4, 9, 1),
(5, 10, 1),
(5, 11, 1),
(5, 12, 1);
第一阶段的查询显示在幕后(source = 1,target = 11):
with recursive -- this denotes that at least one CTE is using recursion
inputs as ( select 1 as d_source, 11 as d_target )
,traverse_path(filter, source, target, path, bingo, distincter) as
(
select
bp.node_x, bp.node_x, bp.node_y, bp.node_x || '.' || bp.node_y
,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool
,coalesce(
nullif(
max((bp.node_y = i.d_target)::int)
over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
, 0)
,bp.node_y)
from best_path bp cross join inputs i
where bp.node_x = i.d_source -- filter
union
select
tp.filter, bp.node_x, bp.node_y, path || '.' || node_y
,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool
,coalesce(
nullif(
max((bp.node_y = i.d_target)::int)
over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
, 0)
,bp.node_y)
from best_path bp cross join inputs i
join traverse_path tp on bp.node_x = tp.target
and not tp.bingo
)
select tp.*
from traverse_path tp
source = 1,target = 11:http://www.sqlfiddle.com/#!1/db290/56
的输出FILTER SOURCE TARGET PATH BINGO DISTINCTER
1 1 2 1.2 0 2
1 1 3 1.3 0 3
1 2 4 1.2.4 0 4
1 2 5 1.2.5 0 5
1 3 6 1.3.6 0 6
1 3 7 1.3.7 0 7
1 4 8 1.2.4.8 0 8
1 4 9 1.2.4.9 0 9
1 5 11 1.2.5.11 1 1
1 5 10 1.2.5.10 1 1
1 5 12 1.2.5.12 1 1
正如我们所看到的,目标11是5的源中的第一行。这是怎么发生的?在我们的DISTINCTER列上,我们在MAX上使用ORDER BY,这是MAX窗口上的ORDER有意义的罕见情况之一。我们只是使用它来对结果进行排序。我尝试在查询结束时放置ORDER BY,但数据库抱怨它无法在CTE上使用ORDER。 CTE禁止放置ORDER BY子句。但是众所周知,我们可以影响OVER()
子句的排序,这就是我们的结果如何排序。在上面的结果中,在5的来源中,数字11在10和12之前排序,因为11是我们的目标。因此,当我们执行DISTINCT ON
(特定于Postgresql的功能)时,Postgres将拾取第一行,因此它将拾取目标11。
这是我们的最终查询,已经过优化,尽管是Postgresql特有的:
with recursive -- this denotes that at least one CTE is using recursion
inputs as ( select 1 as d_source, 11 as d_target )
,traverse_path(filter, source, target, path, bingo) as (
select
distinct on (
bp.node_x,
coalesce(
nullif(
max((bp.node_y = i.d_target)::int)
over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
, 0)
,bp.node_y)
)
bp.node_x, bp.node_x, bp.node_y, bp.node_x || '.' || bp.node_y
,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool
from best_path bp cross join inputs i
where bp.node_x = i.d_source -- filter
union
select
distinct on (
bp.node_x,
coalesce(
nullif(
max((bp.node_y = i.d_target)::int)
over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
, 0)
,bp.node_y)
)
tp.filter, bp.node_x, bp.node_y, path || '.' || node_y
,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool
from best_path bp cross join inputs i
join traverse_path tp on bp.node_x = tp.target
and not tp.bingo
) select tp.* from traverse_path tp
source = 1,target = 11 http://www.sqlfiddle.com/#!1/db290/55
的输出FILTER SOURCE TARGET PATH BINGO
1 1 2 1.2 0
1 1 3 1.3 0
1 2 4 1.2.4 0
1 2 5 1.2.5 0
1 3 6 1.3.6 0
1 3 7 1.3.7 0
1 4 8 1.2.4.8 0
1 4 9 1.2.4.9 0
1 5 11 1.2.5.11 1
source = 1,target = 4:http://www.sqlfiddle.com/#!1/db290/53
的输出FILTER SOURCE TARGET PATH BINGO
1 1 2 1.2 0
1 1 3 1.3 0
1 2 4 1.2.4 1
1 3 6 1.3.6 0
1 3 7 1.3.7 0
源的输出= 2,目标= 5:http://www.sqlfiddle.com/#!1/db290/54
FILTER SOURCE TARGET PATH BINGO
2 2 5 2.5 1
另一种方法,而不是BINGO方法。使用continue_search作为继续travesal的逻辑。并使用每个聚合函数来确定是否需要继续遍历图形。
以下是它的工作原理:http://www.sqlfiddle.com/#!1/db290/84
最终查询:http://www.sqlfiddle.com/#!1/db290/85
有趣的是,每个人都非常喜欢英语:
对比这个:
,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool
使用EVERY:
,every(bp.node_y <> i.d_target) over(partition by bp.node_x)
哪一个更容易阅读?
这是输出,说明了使用EVERY来促进DISTINCT ON的原则:
Source = 1,Target = 5. 请注意5 在属于同一来源的其他数字之前先排序,然后由DISTINCT ON选择。
FILTER SOURCE TARGET PATH CONTINUE_SEARCH DISTINCTER
1 1 2 1.2 1 2
1 1 3 1.3 1 3
1 2 5 1.2.5 0 0
1 2 4 1.2.4 0 0
1 3 6 1.3.6 1 6
1 3 7 1.3.7 1 7
以下是执行该原则的查询:
with recursive -- this denotes that at least one CTE is using recursion
inputs as ( select 1 as d_source, 5 as d_target )
,traverse_path(filter, source, target, path, continue_search, distincter) as
(
select
bp.node_x, bp.node_x, bp.node_y, concat(bp.node_x , '.' , bp.node_y )
,every(bp.node_y <> i.d_target) over(partition by bp.node_x)
,coalesce(
cast(nullif(
every(bp.node_y <> i.d_target)
over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
, true) as int)
,bp.node_y)
from best_path bp cross join inputs i
where bp.node_x = i.d_source -- filter
union
select
tp.filter, bp.node_x, bp.node_y, concat(path , '.' , node_y)
,every(bp.node_y <> i.d_target) over(partition by bp.node_x)
,coalesce(
cast(nullif(
every(bp.node_y <> i.d_target)
over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
, true) as int)
,bp.node_y)
from best_path bp cross join inputs i
join traverse_path tp on bp.node_x = tp.target
and tp.continue_search
)
select tp.*
from traverse_path tp
最终查询:
with recursive -- this denotes that at least one CTE is using recursion
inputs as ( select 1 as d_source, 5 as d_target )
,traverse_path(filter, source, target, path, continue_search) as
(
select
distinct on
(
bp.node_x
,coalesce(
cast(nullif(
every(bp.node_y <> i.d_target)
over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
, true) as int)
,bp.node_y)
)
bp.node_x, bp.node_x, bp.node_y, concat(bp.node_x , '.' , bp.node_y )
,every(bp.node_y <> i.d_target) over(partition by bp.node_x)
from best_path bp cross join inputs i
where bp.node_x = i.d_source -- filter
union
select
distinct on
(
bp.node_x
,coalesce(
cast(nullif(
every(bp.node_y <> i.d_target)
over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
, true) as int)
,bp.node_y)
)
tp.filter, bp.node_x, bp.node_y, concat(path , '.' , node_y)
,every(bp.node_y <> i.d_target) over(partition by bp.node_x)
from best_path bp cross join inputs i
join traverse_path tp on bp.node_x = tp.target
and tp.continue_search
)
select tp.*
from traverse_path tp
输出:
FILTER SOURCE TARGET PATH CONTINUE_SEARCH
1 1 2 1.2 1
1 1 3 1.3 1
1 2 5 1.2.5 0
1 3 6 1.3.6 1
1 3 7 1.3.7 1