我使用PostgreSQL 9.1来查询分层树形结构数据,包括与节点连接的边(或元素)。数据实际上是用于流网络,但我已经将问题抽象为简单的数据类型。考虑示例tree
表。每条边都有长度和面积属性,用于确定网络中的一些有用指标。
CREATE TEMP TABLE tree (
edge text PRIMARY KEY,
from_node integer UNIQUE NOT NULL, -- can also act as PK
to_node integer REFERENCES tree (from_node),
mode character varying(5), -- redundant, but illustrative
length numeric NOT NULL,
area numeric NOT NULL,
fwd_path text[], -- optional ordered sequence, useful for debugging
fwd_search_depth integer,
fwd_length numeric,
rev_path text[], -- optional unordered set, useful for debugging
rev_search_depth integer,
rev_length numeric,
rev_area numeric
);
CREATE INDEX ON tree (to_node);
INSERT INTO tree(edge, from_node, to_node, mode, length, area) VALUES
('A', 1, 4, 'start', 1.1, 0.9),
('B', 2, 4, 'start', 1.2, 1.3),
('C', 3, 5, 'start', 1.8, 2.4),
('D', 4, 5, NULL, 1.2, 1.3),
('E', 5, NULL, 'end', 1.1, 0.9);
下面可以说明,其中由A-E表示的边缘与节点1-5连接。 NULL to_node
(Ø)表示结束节点。 from_node
始终是唯一的,因此它可以充当PK。如果这个网络像drainage basin一样流动,它将从上到下,其中起始支流边缘为A,B,C,结束流出边缘为E.
documentation for WITH
提供了一个很好的例子,说明如何在递归查询中使用搜索图。因此,为了获得“转发”信息,查询从最后开始,然后向后工作:
WITH RECURSIVE search_graph AS (
-- Begin at ending nodes
SELECT E.from_node, 1 AS search_depth, E.length
, ARRAY[E.edge] AS path -- optional
FROM tree E WHERE E.to_node IS NULL
UNION ALL
-- Accumulate each edge, working backwards (upstream)
SELECT o.from_node, sg.search_depth + 1, sg.length + o.length
, o.edge|| sg.path -- optional
FROM tree o, search_graph sg
WHERE o.to_node = sg.from_node
)
UPDATE tree SET
fwd_path = sg.path,
fwd_search_depth = sg.search_depth,
fwd_length = sg.length
FROM search_graph AS sg WHERE sg.from_node = tree.from_node;
SELECT edge, from_node, to_node, fwd_path, fwd_search_depth, fwd_length
FROM tree ORDER BY edge;
edge | from_node | to_node | fwd_path | fwd_search_depth | fwd_length
------+-----------+---------+----------+------------------+------------
A | 1 | 4 | {A,D,E} | 3 | 3.4
B | 2 | 4 | {B,D,E} | 3 | 3.5
C | 3 | 5 | {C,E} | 2 | 2.9
D | 4 | 5 | {D,E} | 2 | 2.3
E | 5 | | {E} | 1 | 1.1
以上内容很有意义,适用于大型网络。例如,我可以看到边B
是末端的3条边,前向路径是{B,D,E}
,从尖端到结尾的总长度为3.5。
但是,我无法找到构建反向查询的好方法。也就是说,从每个边缘,累积的“上游”边缘,长度和区域是什么。使用WITH RECURSIVE
,我所拥有的最好的是:
WITH RECURSIVE search_graph AS (
-- Begin at starting nodes
SELECT S.from_node, S.to_node, 1 AS search_depth, S.length, S.area
, ARRAY[S.edge] AS path -- optional
FROM tree S WHERE from_node IN (
-- Starting nodes have a from_node without any to_node
SELECT from_node FROM tree EXCEPT SELECT to_node FROM tree)
UNION ALL
-- Accumulate edges, working forwards
SELECT c.from_node, c.to_node, sg.search_depth + 1, sg.length + c.length, sg.area + c.area
, c.edge || sg.path -- optional
FROM tree c, search_graph sg
WHERE c.from_node = sg.to_node
)
UPDATE tree SET
rev_path = sg.path,
rev_search_depth = sg.search_depth,
rev_length = sg.length,
rev_area = sg.area
FROM search_graph AS sg WHERE sg.from_node = tree.from_node;
SELECT edge, from_node, to_node, rev_path, rev_search_depth, rev_length, rev_area
FROM tree ORDER BY edge;
edge | from_node | to_node | rev_path | rev_search_depth | rev_length | rev_area
------+-----------+---------+----------+------------------+------------+----------
A | 1 | 4 | {A} | 1 | 1.1 | 0.9
B | 2 | 4 | {B} | 1 | 1.2 | 1.3
C | 3 | 5 | {C} | 1 | 1.8 | 2.4
D | 4 | 5 | {D,A} | 2 | 2.3 | 2.2
E | 5 | | {E,C} | 2 | 2.9 | 3.3
我想将聚合构建到递归查询的第二项中,因为每个下游边连接到1个或多个上游边,但是aggregates are not allowed with recursive queries。另外,我知道加入很麻烦,因为with recursive
结果有edge
的多个连接条件。
反向/反向查询的预期结果是:
edge | from_node | to_node | rev_path | rev_search_depth | rev_length | rev_area
------+-----------+---------+-------------+------------------+------------+----------
A | 1 | 4 | {A} | 1 | 1.1 | 0.9
B | 2 | 4 | {B} | 1 | 1.2 | 1.3
C | 3 | 5 | {C} | 1 | 1.8 | 2.4
D | 4 | 5 | {A,B,D} | 3 | 3.5 | 3.5
E | 5 | | {A,B,C,D,E} | 5 | 6.4 | 6.8
如何编写此更新查询?
请注意,我最终更关心的是累积精确的长度和面积总数,以及路径属性是否用于调试。在我的实际案例中,前向路径高达几百个,我预计大型和复杂集水区的反向路径数以万计。
答案 0 :(得分:6)
更新2: 我重写了原始的递归查询,以便所有累积/聚合都在递归部分之外完成。它应该比此答案的先前版本表现更好。 对于类似的问题,这与@a_horse_with_no_name的answer非常相似。
WITH
RECURSIVE search_graph(edge, from_node, to_node, length, area, start_node) AS
(
SELECT edge, from_node, to_node, length, area, from_node AS "start_node"
FROM tree
UNION ALL
SELECT o.edge, o.from_node, o.to_node, o.length, o.area, p.start_node
FROM tree o
JOIN search_graph p ON p.from_node = o.to_node
)
SELECT array_agg(edge) AS "edges"
-- ,array_agg(from_node) AS "nodes"
,count(edge) AS "edge_count"
,sum(length) AS "length_sum"
,sum(area) AS "area_sum"
FROM search_graph
GROUP BY start_node
ORDER BY start_node
;
结果如预期:
start_node | edges | edge_count | length_sum | area_sum
------------+-------------+------------+------------+------------
1 | {A} | 1 | 1.1 | 0.9
2 | {B} | 1 | 1.2 | 1.3
3 | {C} | 1 | 1.8 | 2.4
4 | {D,B,A} | 3 | 3.5 | 3.5
5 | {E,D,C,B,A} | 5 | 6.4 | 6.8