PostgreSQL将数据从递归CTE传递到函数

时间:2012-05-01 01:40:41

标签: postgresql graph transitive-closure-table recursive-cte

我遇到以下问题:我正在尝试发现从源节点( node_s )到目标节点( node_t )的所有可能路径。

带有图形边缘的原始表的格式很简单: | node_x | node_y |强度| ,其中“node_x” - > “node_y”是边缘强度为“权重”的直接边缘。

我们的想法是,在探索路径的过程中,我们发现其子节点中的某个节点的目标是 node_t ,我们会记录此路径并停止探索此节点的路径,否则继续探索

简单的解决方案是使用PostgreSQL的递归CTE来构造图的传递闭包:

  WITH RECURSIVE transitive_closure (source, target, weight, path_string) AS
  ( 
   --non-recurive term
   SELECT b.node_x, b.node_y, b.strength, b.node_x || '.' || b.node_y || '.' AS path_string
   FROM best_path b
   WHERE b.node_x = node_s --source_node

   UNION

   --recurive term
   SELECT tc.source, b.node_y, least(tc.weight,b.strength), tc.path_string || b.node_y || '.' AS path_string
   FROM best_path AS b JOIN transitive_closure AS tc ON b.node_x = tc.target
   WHERE tc.path_string NOT LIKE '%' || b.node_y || '.%'
  )
  SELECT * 
  FROM transitive_closure tc
  WHERE tc.target = node_t --target_node

上面的代码将从源节点 node_s 中发现所有可能的路径。只有在传递闭包构造之后,我才能从源节点到目标节点选择所需的路径行(参见最后的SELECT语句)。

示例:

best_path表包含以下数据:

 node_x | node_y | strength 
--------+--------+----------
      1 |      2 |        1
      1 |      3 |        1
      2 |      4 |        1
      2 |      5 |        1
      3 |      6 |        1
      3 |      7 |        1
      4 |      8 |        1
      4 |      9 |        1
      5 |     10 |        1
      5 |     11 |        1

查询:

找到从源节点= 1到目标节点= 4

的路径

结果:

 source | target | strength | path_string
--------+--------+----------+------------
      1 |      2 |        1 | 1.2.
      1 |      3 |        1 | 1.3.
      1 |      4 |        1 | 1.2.4.
      1 |      5 |        1 | 1.2.5.
      1 |      6 |        1 | 1.3.6.
      1 |      7 |        1 | 1.3.7.
      1 |      8 |        1 | 1.2.4.8.
      1 |      9 |        1 | 1.2.4.9.
      1 |     10 |        1 | 1.2.5.10.
      1 |     11 |        1 | 1.2.5.11.

这不是我需要的。由于已经存在从节点2到节点4(目标)的直接边缘,因此我不需要路径1.2.5。,1.2.4.8。,1.2.4.9。,1.2.5.10。,1.2.5.11。,路径探索对于节点2,应该在发现从2到4的路径时停止。

总而言之,如果节点已经与目标节点有直接边缘,我不想发现节点的路径。这意味着在CTE的递归术语中,我希望有一些条件可以说下面的伪代码如下:

  WITH RECURSIVE transitive_closure (source, target, weight, path_string) AS
  ( 
   --non-recurive term (as before)
   SELECT b.node_x, b.node_y, b.strength, b.node_x || '.' || b.node_y || '.' AS path_string
   FROM best_path b
   WHERE b.node_x = node_s --source_node

   UNION

   --recurive term
   SELECT tc.source, b.node_y, least(tc.weight,b.strength), tc.path_string || b.node_y || '.' AS path_string
   FROM best_path AS b JOIN transitive_closure AS tc ON b.node_x = tc.target
   WHERE tc.path_string NOT LIKE '%' || b.node_y || '.%'
     AND b.node_y = node_t 
     --will select only rows with direct edge to target

   UNION (second union is not allowed in CTE)

   SELECT those rows which do not have direct edge to target 
   AND which parents did not contribute to constructing the query above. 
   i.e. b.node_x = tc.target where there is no direct edge between b.node_x to node_t
  )

作为查询从源节点= 1到目标节点= 4的路径的结果,我希望有以下内容:

 source | target | strength | path_string
--------+--------+----------+------------
      1 |      2 |        1 | 1.2.
      1 |      3 |        1 | 1.3.
      1 |      4 |        1 | 1.2.4.
      1 |      6 |        1 | 1.3.6.
      1 |      7 |        1 | 1.3.7.

提前感谢您的帮助!

我已经尝试了很多方法:例如FROM / WHERE子句中的条件,尝试将CTE传递给函数,但没有成功。

任何建议都将受到赞赏。

我有自己的递归功能,可以实现我想要的功能,但是,它在大量数据上非常慢;而PostgreSQL的CTE显然已经过优化,所以我想深入研究一下。

4 个答案:

答案 0 :(得分:2)

如果从底部开始,您可以更有效地搜索路径。从孩子们开始。如果你从父母开始,那就需要遍历所有的孩子;而如果您从孩子那里搜索,它只有一个父母,因此不会浪费时间找到源和目标之间的路径。

with recursive find_parent(source, target, recentness) as
(
    select source, target, 0 
    from tbl
    where target = 9

    union all

    select i.source, i.target, fp.recentness + 1
    from tbl i
    join find_parent fp on i.target = fp.source
),
construct_path(source, target, recentness, path) as
(
  select source, target, recentness, source || '.' || target
  from find_parent 
  where recentness = (select max(recentness) from find_parent)

  union

  select dd.source, dd.target, dd.recentness, cp.path || '.' || dd.target
  from find_parent dd
  join construct_path cp on dd.recentness = cp.recentness - 1  
)
select source, target, path 
from construct_path
order by recentness desc

输出:

SOURCE   TARGET   PATH
1        2        1.2
2        4        1.2.4
4        9        1.2.4.9

实时测试:http://www.sqlfiddle.com/#!1/13e6b/1

与此类似:How to get the parent given a child in SQL SERVER 2005


如果已经找到特定的(源),则优化,剪切递归到父级。

来源= 2

目标= 9

with recursive find_parent(source, target, recentness) as
(
    select source, target, 0 
    from tbl
    where target = 9

    union all

    select i.source, i.target, fp.recentness + 1
    from tbl i
    join find_parent fp on i.target = fp.source 
         -- despite the name, this target is another one's source
         and i.target <> 2
)
,construct_path(source, target, recentness, path) as
(
    select source, target, recentness, source || '.' || target
    from find_parent 
    where recentness = (select max(recentness) from find_parent)

    union

    select dd.source, dd.target, dd.recentness, cp.path || '.' || dd.target
    from find_parent dd
    join construct_path cp on dd.recentness = cp.recentness - 1  

)
select source, target, path
from construct_path
order by recentness desc

输出:

SOURCE   TARGET  PATH
2        4       2.4
4        9       2.4.9

实时测试:http://www.sqlfiddle.com/#!1/13e6b/16

答案 1 :(得分:1)

用于测试的临时表:

CREATE TEMP TABLE best_path (node_x int, node_y int, strength int);
INSERT INTO best_path  VALUES
 (1, 2,  1)
,(1, 3,  1)
,(2, 4,  1)
,(2, 5,  1)
,(3, 6,  1)
,(3, 7,  1)
,(4, 8,  1)
,(4, 9,  1)
,(5, 10, 1)
,(5, 11, 1);

查询:
修改以容纳关于2 - 5

的评论
WITH RECURSIVE t AS (    -- for readability and convenience:
    SELECT 1 AS node_s   -- source_node
         , 4 AS node_t   -- target_node
    )
    , x AS (
    SELECT node_x
    FROM   t, best_path
    WHERE  node_y = node_t
    )
    , a AS  ( 
    SELECT b.node_x
         , b.node_y
         , b.strength AS weight
         , b.node_x || '.' || b.node_y || '.' AS path
    FROM   t, best_path b
    LEFT   JOIN x ON x.node_x = b.node_x
    WHERE  b.node_x = node_s
    AND    (x.node_x IS NULL                    -- no point with edge to target
            OR b.node_y = node_t)               -- except with target_node

    UNION ALL
    SELECT a.node_x
         , b.node_y
         , least(a.weight, b.strength)
         , a.path || b.node_y || '.' AS path
    FROM   t, a
    JOIN   best_path AS b ON b.node_x = a.node_y
    LEFT   JOIN x ON x.node_x = b.node_x
    WHERE  a.node_y <> node_t                   -- arrived at target
    AND    a.path !~~ ('%' || b.node_y || '.%') -- not visited yet
    AND    (x.node_x IS NULL                    -- no point with edge to target
            OR b.node_y = node_t)               -- except with target_node
    )
TABLE a;

完全生成请求的结果。

我也将中断条件添加到初始查询中,因为我们只能在一步中到达目标。

这恰好与my answer to a previous, similar question非常相似。所有解释和链接均适用。这里的主要附加技巧是包括一个额外的CTE x来收集节点......

  

有(a)直接边缘到目标。

在递归CTE的中断条件下重复使用。众所周知,无论关键字RECURSIVE如何,您都可以在递归CTE之上添加其他(非递归)CTE。

答案 2 :(得分:1)

重新阅读OP的问题后,我想出了这个解决方案:

<强>来源

with recursive -- this denotes that at least one CTE is using recursion

inputs as ( select 1 as d_source, 4 as d_target )
,traverse_path(filter, source, target, path, bingo) as
(
  select bp.node_x, bp.node_x,  bp.node_y, bp.node_x || '.' || bp.node_y,

    max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool  

  from best_path bp cross join inputs i
  where bp.node_x = i.d_source -- filter

  union

  select tp.filter, bp.node_x, bp.node_y, path || '.' || node_y,   

      max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool   

  from best_path bp cross join inputs i
  join traverse_path tp on bp.node_x = tp.target
      and not tp.bingo
)    
select tp.*
from traverse_path tp cross join inputs i    
-- if Postgresql has Oracle KEEP windowing, 
-- we don't need to use WHERE clause here     
where 
  not tp.bingo   
  or
  (
    tp.bingo and tp.target = d_target
  )

上述WHERE子句可以缩短为:

  WHERE not bingo or target = 4

<强>输出:

FILTER  SOURCE  TARGET  PATH    BINGO
1       1       2       1.2     0
1       1       3       1.3     0
1       2       4       1.2.4   1
1       3       6       1.3.6   0
1       3       7       1.3.7   0

实时测试:http://www.sqlfiddle.com/#!1/cdde6/55


这是Source = 2,Target = 5的结果:

FILTER  SOURCE  TARGET  PATH    BINGO
2       2       5       2.5     1

数据:

CREATE TABLE best_path
    (node_x int, node_y int, strength int);

INSERT INTO best_path
    (node_x, node_y, strength)
VALUES
    (1, 2, 1),
    (1, 3, 1),
    (2, 4, 1),
    (2, 5, 1),
    (3, 6, 1),
    (3, 7, 1),
    (4, 8, 1),
    (4, 9, 1),
    (5, 10, 1),
    (5, 11, 1);

答案 3 :(得分:0)

优化解决方案,最终结果不再有WHERE子句;虽然是Postgresql特定的解决方案,即我们使用DISTINCT ON来选择特定的行:

鉴于此数据:

CREATE TABLE best_path
    (node_x int, node_y int, strength int);

INSERT INTO best_path
    (node_x, node_y, strength)
VALUES
    (1, 2, 1),
    (1, 3, 1),
    (2, 4, 1),
    (2, 5, 1),
    (3, 6, 1),
    (3, 7, 1),
    (4, 8, 1),
    (4, 9, 1),
    (5, 10, 1),
    (5, 11, 1),
    (5, 12, 1);

第一阶段的查询显示在幕后(source = 1,target = 11):

with recursive -- this denotes that at least one CTE is using recursion

inputs as ( select 1 as d_source, 11 as d_target )
,traverse_path(filter, source, target, path, bingo, distincter) as
(
  select                      
      bp.node_x, bp.node_x,  bp.node_y, bp.node_x || '.' || bp.node_y      
      ,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool            
      ,coalesce(
               nullif(
                   max((bp.node_y = i.d_target)::int) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , 0)          
               ,bp.node_y)                           
  from best_path bp cross join inputs i
  where bp.node_x = i.d_source -- filter      
  union      
  select 
      tp.filter, bp.node_x, bp.node_y, path || '.' || node_y       
      ,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool        
      ,coalesce(
               nullif(
                   max((bp.node_y = i.d_target)::int) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , 0)          
               ,bp.node_y)                       
  from best_path bp cross join inputs i
  join traverse_path tp on bp.node_x = tp.target
      and not tp.bingo
)    
select tp.*
from traverse_path tp 

source = 1,target = 11:http://www.sqlfiddle.com/#!1/db290/56

的输出
FILTER   SOURCE   TARGET   PATH     BINGO    DISTINCTER
1        1        2        1.2      0        2
1        1        3        1.3      0        3
1        2        4        1.2.4    0        4
1        2        5        1.2.5    0        5
1        3        6        1.3.6    0        6
1        3        7        1.3.7    0        7
1        4        8        1.2.4.8  0        8
1        4        9        1.2.4.9  0        9
1        5        11       1.2.5.11 1        1
1        5        10       1.2.5.10 1        1
1        5        12       1.2.5.12 1        1

正如我们所看到的,目标11是5的源中的第一行。这是怎么发生的?在我们的DISTINCTER列上,我们在MAX上使用ORDER BY,这是MAX窗口上的ORDER有意义的罕见情况之一。我们只是使用它来对结果进行排序。我尝试在查询结束时放置ORDER BY,但数据库抱怨它无法在CTE上使用ORDER。 CTE禁止放置ORDER BY子句。但是众所周知,我们可以影响OVER()子句的排序,这就是我们的结果如何排序。在上面的结果中,在5的来源中,数字11在10和12之前排序,因为11是我们的目标。因此,当我们执行DISTINCT ON(特定于Postgresql的功能)时,Postgres将拾取第一行,因此它将拾取目标11。


这是我们的最终查询,已经过优化,尽管是Postgresql特有的:

with recursive -- this denotes that at least one CTE is using recursion    
inputs as ( select 1 as d_source, 11 as d_target )
,traverse_path(filter, source, target, path, bingo) as (
  select 
      distinct on (
        bp.node_x,            
        coalesce(
               nullif(
                   max((bp.node_y = i.d_target)::int) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , 0)          
               ,bp.node_y)
      )          
      bp.node_x, bp.node_x,  bp.node_y, bp.node_x || '.' || bp.node_y      
      ,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool  
  from best_path bp cross join inputs i
  where bp.node_x = i.d_source -- filter
  union
  select       
      distinct on (
        bp.node_x,            
        coalesce(
               nullif(
                   max((bp.node_y = i.d_target)::int) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , 0)          
               ,bp.node_y)
      )    
      tp.filter, bp.node_x, bp.node_y, path || '.' || node_y       
      ,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool  
  from best_path bp cross join inputs i
  join traverse_path tp on bp.node_x = tp.target
      and not tp.bingo
) select tp.* from traverse_path tp

source = 1,target = 11 http://www.sqlfiddle.com/#!1/db290/55

的输出
FILTER   SOURCE   TARGET   PATH     BINGO
1        1        2        1.2      0
1        1        3        1.3      0
1        2        4        1.2.4    0
1        2        5        1.2.5    0
1        3        6        1.3.6    0
1        3        7        1.3.7    0
1        4        8        1.2.4.8  0
1        4        9        1.2.4.9  0
1        5        11       1.2.5.11 1

source = 1,target = 4:http://www.sqlfiddle.com/#!1/db290/53

的输出
FILTER  SOURCE  TARGET  PATH    BINGO
1       1       2       1.2     0
1       1       3       1.3     0
1       2       4       1.2.4   1
1       3       6       1.3.6   0
1       3       7       1.3.7   0

源的输出= 2,目标= 5:http://www.sqlfiddle.com/#!1/db290/54

FILTER  SOURCE  TARGET  PATH    BINGO
2       2       5       2.5     1


另一种方法,而不是BINGO方法。使用continue_search作为继续travesal的逻辑。并使用每个聚合函数来确定是否需要继续遍历图形。

以下是它的工作原理:http://www.sqlfiddle.com/#!1/db290/84

最终查询:http://www.sqlfiddle.com/#!1/db290/85

有趣的是,每个人都非常喜欢英语:

对比这个:

,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool  

使用EVERY:

,every(bp.node_y <> i.d_target) over(partition by bp.node_x)

哪一个更容易阅读?

这是输出,说明了使用EVERY来促进DISTINCT ON的原则:

Source = 1,Target = 5. 请注意5 在属于同一来源的其他数字之前先排序,然后由DISTINCT ON选择。

FILTER    SOURCE    TARGET    PATH      CONTINUE_SEARCH     DISTINCTER
1         1         2         1.2       1                   2
1         1         3         1.3       1                   3
1         2         5         1.2.5     0                   0
1         2         4         1.2.4     0                   0
1         3         6         1.3.6     1                   6
1         3         7         1.3.7     1                   7

以下是执行该原则的查询:

with recursive -- this denotes that at least one CTE is using recursion

inputs as ( select 1 as d_source, 5 as d_target )
,traverse_path(filter, source, target, path, continue_search, distincter) as
(
  select                      
      bp.node_x, bp.node_x,  bp.node_y, concat(bp.node_x , '.' , bp.node_y )
      ,every(bp.node_y <> i.d_target) over(partition by bp.node_x)          
      ,coalesce(        
              cast(nullif(
                   every(bp.node_y <> i.d_target) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , true) as int)
               ,bp.node_y)                           
  from best_path bp cross join inputs i
  where bp.node_x = i.d_source -- filter      
  union      
  select 
      tp.filter, bp.node_x, bp.node_y, concat(path , '.' , node_y)
      ,every(bp.node_y <> i.d_target) over(partition by bp.node_x)    
      ,coalesce(
              cast(nullif(
                   every(bp.node_y <> i.d_target) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , true) as int)
               ,bp.node_y)                       
  from best_path bp cross join inputs i
  join traverse_path tp on bp.node_x = tp.target
     and tp.continue_search
)    
select tp.*
from traverse_path tp 

最终查询:

with recursive -- this denotes that at least one CTE is using recursion

inputs as ( select 1 as d_source, 5 as d_target )
,traverse_path(filter, source, target, path, continue_search) as
(
  select                      
      distinct on
      (
        bp.node_x
        ,coalesce(
              cast(nullif(
                   every(bp.node_y <> i.d_target) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , true) as int)
               ,bp.node_y)                       
      )    
      bp.node_x, bp.node_x,  bp.node_y, concat(bp.node_x , '.' , bp.node_y )
      ,every(bp.node_y <> i.d_target) over(partition by bp.node_x)          
  from best_path bp cross join inputs i
  where bp.node_x = i.d_source -- filter      
  union      
  select   
      distinct on
      (
        bp.node_x
        ,coalesce(
              cast(nullif(
                   every(bp.node_y <> i.d_target) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , true) as int)
               ,bp.node_y)                       
      )  
      tp.filter, bp.node_x, bp.node_y, concat(path , '.' , node_y)
      ,every(bp.node_y <> i.d_target) over(partition by bp.node_x)          
  from best_path bp cross join inputs i
  join traverse_path tp on bp.node_x = tp.target
     and tp.continue_search
)    
select tp.*
from traverse_path tp 

输出:

FILTER    SOURCE    TARGET    PATH      CONTINUE_SEARCH
1         1         2         1.2       1
1         1         3         1.3       1
1         2         5         1.2.5     0
1         3         6         1.3.6     1
1         3         7         1.3.7     1