WITH RECURSIVE query to choose the longest paths

Asked: 2015-04-22 17:22:44

Tags: sql postgresql common-table-expression recursive-query directed-graph

I am new to WITH RECURSIVE in PostgreSQL. I have a reasonably standard recursive query that follows an adjacency list. If I have, for example:

1 -> 2
2 -> 3
3 -> 4
3 -> 5
5 -> 6

It produces:

1
1,2
1,2,3
1,2,3,4
1,2,3,5
1,2,3,5,6

What I would like to get is just:

1,2,3,4
1,2,3,5,6

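But I can't see how to do this in Postgres. It seems to amount to "choose the longest paths" or "choose the paths that are not contained in another path". I can probably see how to do this with a join of the result on itself, but that looks quite inefficient.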

An example query is:

WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
   SELECT g.id, g.link, g.data, 1, ARRAY[g.id], false
   FROM graph g
  UNION ALL
   SELECT g.id, g.link, g.data, sg.depth + 1, path || g.id, g.id = ANY(path)
   FROM graph g, search_graph sg
   WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph;

3 answers:

Answer 0 (score: 2)

Just add an extra clause to the final query, for example:

WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
   SELECT g.id, g.link, g.data, 1, ARRAY[g.id], false
   FROM graph g
   -- BTW: you should add a START-CONDITION here, like:
   -- WHERE g.id = 1
   -- or even (to find ALL linked lists):
   -- WHERE NOT EXISTS ( SELECT 13
   --       FROM graph nx
   --       WHERE nx.link = g.id
   --       )
  UNION ALL
   SELECT g.id, g.link, g.data, sg.depth + 1, path || g.id, g.id = ANY(path)
   FROM graph g, search_graph sg
   WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph sg
WHERE NOT EXISTS ( -- <<-- extra condition
   SELECT 42 FROM graph nx
   WHERE nx.id = sg.link
   );

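Note:

  • The NOT EXISTS (...) clause tries to join exactly the same record as the second leg of the recursive union.
  • So: the two are mutually exclusive.
  • If such a record did exist, the recursive query would already have appended it to the "list".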
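To make the effect of the extra condition concrete, here is a minimal self-contained sketch. The question does not show the table, so the layout is an assumption: one (id, link) row per edge, plus a row with a NULL link for each end node (4 and 6). The data column is left out, and the start condition suggested in the comments above is switched on.

```sql
-- Hypothetical sample data (assumed layout): one row per edge,
-- plus NULL-link rows for the end nodes 4 and 6.
WITH RECURSIVE graph(id, link) AS (
    VALUES (1, 2), (2, 3), (3, 4), (3, 5), (5, 6), (4, NULL), (6, NULL)
),
search_graph(id, link, depth, path, cycle) AS (
    SELECT g.id, g.link, 1, ARRAY[g.id], false
    FROM graph g
    -- start condition: only nodes that no other row links to
    WHERE NOT EXISTS (SELECT 1 FROM graph nx WHERE nx.link = g.id)
  UNION ALL
    SELECT g.id, g.link, sg.depth + 1, sg.path || g.id, g.id = ANY(sg.path)
    FROM graph g
    JOIN search_graph sg ON g.id = sg.link
    WHERE NOT sg.cycle
)
SELECT path
FROM search_graph sg
-- extra condition: keep only rows that the recursion could not extend
WHERE NOT EXISTS (SELECT 1 FROM graph nx WHERE nx.id = sg.link)
ORDER BY path;
-- returns two rows: {1,2,3,4} and {1,2,3,5,6}
```

Without the start condition the same filter would also return the shorter tails of each path (for example {3,4}), because every row of graph would then act as a starting point.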

Answer 1 (score: 2)

You already have a solution at your fingertips with cycle; just add a predicate at the end.

But adjust your break condition by one level; currently you are appending one node too many:

WITH RECURSIVE search AS (
   SELECT id, link, data, ARRAY[g.id] AS path, (link = id) AS cycle
   FROM   graph g
   WHERE  NOT EXISTS (
      SELECT 1
      FROM   graph
      WHERE  link = g.id
      )

   UNION ALL
   SELECT g.id, g.link, g.data, s.path || g.id, g.link = ANY(s.path)
   FROM   search s
   JOIN   graph g ON g.id = s.link
   WHERE  NOT s.cycle
   )
SELECT *
FROM   search
WHERE cycle;
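-- WHERE cycle IS NOT FALSE;  -- alternative if link can be NULL

  • This also includes a start condition, as mentioned by @wildplasser.

  • The initial condition for cycle is (link = id), to catch shortcut cycles. It is not necessary if a CHECK constraint disallows such rows in your table.

  • This assumes that all graphs are terminated with a cycle or with link IS NULL, and that there is an FK constraint from link to id in the same table. The exact implementation depends on missing details. If link is not actually a link (no referential integrity), you need to adapt ...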
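As a rough illustration of how the cycle flag behaves for both kinds of termination, the sketch below runs the same pattern over inline data. The data is hypothetical: the question's edges with NULL-link rows for the end nodes 4 and 6, plus an invented chain 9 -> 7 -> 8 -> 7 that ends in a cycle. The data column is omitted, and the IS NOT FALSE variant is used because link can be NULL here.

```sql
-- Hypothetical data: the question's edges, NULL-link rows for end nodes 4 and 6,
-- and a separate chain 9 -> 7 -> 8 -> 7 that runs into a cycle.
WITH RECURSIVE graph(id, link) AS (
    VALUES (1, 2), (2, 3), (3, 4), (3, 5), (5, 6), (4, NULL), (6, NULL),
           (9, 7), (7, 8), (8, 7)
),
search AS (
    SELECT g.id, g.link, ARRAY[g.id] AS path, (g.link = g.id) AS cycle
    FROM   graph g
    WHERE  NOT EXISTS (SELECT 1 FROM graph p WHERE p.link = g.id)  -- roots only
    UNION ALL
    -- cycle becomes true when the NEXT node is already in the path,
    -- and NULL when there is no next node (link IS NULL)
    SELECT g.id, g.link, s.path || g.id, g.link = ANY(s.path)
    FROM   search s
    JOIN   graph g ON g.id = s.link
    WHERE  NOT s.cycle
)
SELECT path, cycle
FROM   search
WHERE  cycle IS NOT FALSE
ORDER  BY path;
-- returns three rows:
--   {1,2,3,4}    NULL
--   {1,2,3,5,6}  NULL
--   {9,7,8}      true
```

With a plain WHERE cycle, only the {9,7,8} row would come back, since the NULL-terminated paths never set the flag to true.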

Answer 2 (score: 0)

I am not sure whether this should count as the ugly join solution.

WITH recursive graph (child, parent) AS (
    SELECT 2, 1
    UNION
    SELECT 3, 2
    UNION
    SELECT 4, 2
    UNION
    SELECT 6, 5
    UNION
    SELECT 7, 6
    UNION
    SELECT 6, 7
),
paths (start, node, depth, path, has_cycle, terminated) AS (
    SELECT
        g1.parent,
        g1.parent,
        1,
        ARRAY[g1.parent],
        false,
        false
    FROM graph g1
    WHERE true
        AND NOT EXISTS (SELECT 1 FROM graph g2 WHERE g1.parent = g2.child)
    UNION ALL
    SELECT
        p.start,
        g.child,
        p.depth + 1,
        p.path || g.child,
        g.child = ANY(p.path),
        -- g.* is NULL when p.node has no outgoing edge
        g.parent is null AS terminated
    FROM paths p
    LEFT OUTER JOIN graph g ON g.parent = p.node
    WHERE NOT has_cycle
)
SELECT * from paths WHERE terminated
;

So the trick is to derive the terminated column from the LEFT OUTER JOIN, and then select only the terminated paths.