Join two tables using id and descendants from a tree-like table

Time: 2013-05-24 18:19:55

Tags: sql database postgresql plpgsql postgresql-9.2

I have the following tables:

CREATE TABLE local (
  local_id serial PRIMARY KEY,
  parent_id integer,
  name varchar,
  CONSTRAINT fk_local_parent_id_local_id FOREIGN KEY (parent_id)
      REFERENCES local (local_id) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE SET NULL
);

CREATE TABLE category (
  category_id serial PRIMARY KEY,
  name varchar
);

CREATE TABLE element (
  element_id serial PRIMARY KEY,
  local_id integer,
  category_id integer,
  name varchar,
  CONSTRAINT fk_element_local_id FOREIGN KEY (local_id)
      REFERENCES local (local_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
);

CREATE TABLE action (
  action_id serial PRIMARY KEY,
  local_id integer,
  category_id integer,
  CONSTRAINT fk_action_local_id FOREIGN KEY (local_id)
      REFERENCES local (local_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT fk_action_category_id FOREIGN KEY (category_id)
      REFERENCES category (category_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
);

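I want to select all elements for an action. An element should also appear if its local is a descendant of the action's local (and its category matches the action's category). For example: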

local

|local_id | parent_id | name |
|---------+-----------+------|
|1        |NULL       |A     |
|2        |1          |B     |
|3        |1          |C     |
|4        |3          |D     |
|5        |NULL       |E     |
|6        |5          |F     |
|_________|___________|______|

category

| category_id | name |
|-------------+------|
|1            |A     |
|2            |B     |
|3            |C     |
|_____________|______|

element

|element_id | local_id | name | category_id |
|-----------+----------+------+-------------|
|1          |1         |A     | 1           |
|2          |2         |B     | 2           |
|3          |2         |C     | 1           |
|4          |4         |D     | 2           |
|5          |5         |E     | 2           |
|6          |6         |F     | 1           |
|7          |6         |G     | 1           |
|___________|__________|______|_____________|

action

|action_id | local_id | category_id |
|----------+----------+-------------|
| 1        | 1        | 2           |
| 2        | 3        | 1           |
| 3        | 5        | 1           |
| 4        | 6        | 1           |
|__________|__________|_____________|

The query result I want:

CASE: action_id = 1
return: element_id: 2,4

CASE: action_id = 2
return: element_id: null

CASE: action_id = 3
return: element_id: 6,7

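I have created a function that returns all descendants, including the node itself, but I run into performance problems when the function is called thousands of times. My function looks like this: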

CREATE OR REPLACE FUNCTION fn_local_get_childs(_parent_id integer)
  RETURNS SETOF integer AS
$BODY$
DECLARE
   r integer;
BEGIN
   FOR r IN SELECT local_id FROM local WHERE local_id IN ( 
      (WITH RECURSIVE parent AS
      (
         SELECT local_id , parent_id  from local WHERE local_id = _parent_id
         UNION ALL 
         SELECT t.local_id , t.parent_id FROM parent
         INNER JOIN local t ON parent.local_id =  t.parent_id
      )
      SELECT local_id FROM  parent
      ) 
   )
   LOOP
      RETURN NEXT r;
   END LOOP;
   RETURN;        
END;
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100
  ROWS 1000;

My very slow query looks like this:

select e.element_id, a.action_id
from action a
join element e on (
                   e.local_id=any(select fn_local_get_childs(a.local_id)) AND 
                   e.category_id=a.category_id)

Is there a way to combine the recursion used in the function into a single query?

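1 Answer:

Answer 0: (score: 1)

Integrated query

By improving the logic in several places, you can integrate the whole operation into a single query. Wrapping it in an SQL function is optional: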

CREATE OR REPLACE FUNCTION f_elems(_action_id integer, OUT element_id integer)
  RETURNS SETOF integer AS
$func$
   WITH RECURSIVE l AS (
      SELECT a.category_id, l.local_id
      FROM   action a
      JOIN   local  l USING (local_id)
      WHERE  a.action_id = $1

      UNION ALL 
      SELECT l.category_id, c.local_id
      FROM   l
      JOIN   local c ON c.parent_id = l.local_id  -- c for "child"
      )
   SELECT e.element_id
   FROM   l
   JOIN   element e USING (category_id, local_id);
$func$  LANGUAGE sql STABLE;

COMMENT ON FUNCTION f_elems(integer) IS
'Retrieves all element_id for same and child-locals of a given action_id';

Call:

SELECT * FROM f_elems(3);

element_id
-----------
6
7

-> SQLfiddle.
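
The same recursion can also be run for every action at once, which is closer to the original join over the whole action table. This is a minimal, untested sketch, assuming the schema and sample data from the question (including a category_id column on element, as the sample table shows). Carrying action_id through the CTE is an addition that is not part of the function above:

```sql
-- Start from every action and walk down the local tree,
-- keeping action_id and category_id with each visited local.
WITH RECURSIVE l AS (
   SELECT a.action_id, a.category_id, a.local_id
   FROM   action a

   UNION ALL
   SELECT l.action_id, l.category_id, c.local_id
   FROM   l
   JOIN   local c ON c.parent_id = l.local_id  -- c for "child"
   )
SELECT l.action_id, e.element_id
FROM   l
JOIN   element e USING (category_id, local_id)
ORDER  BY l.action_id, e.element_id;
```

With the sample data this should return the pairs (1,2), (1,4), (3,6), (3,7), (4,6) and (4,7). Actions without matching elements, like action_id = 2, simply do not appear.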

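This should be substantially faster for several reasons. The most obvious ones:

  • Pure SQL instead of a slow loop in plpgsql.
  • A narrowed-down starting set for the recursive query.
  • No unnecessary and notoriously slow IN construct.

Hint: I call the function with SELECT * FROM ... instead of just SELECT ..., even though the row has a single column, to get the column name of the OUT parameter (element_id) that I declared in the function header.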

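Faster, yet

Indexes

  1. The index on action.action_id is provided by the primary key.

  2. But you may be missing an index on local.parent_id. While you are at it, make it a covering multi-column index (Postgres 9.2+) with parent_id as the first element and local_id as the second. This should help a lot if the table local is big, and not so much or not at all for a small table:

    CREATE INDEX l_mult_idx ON local(parent_id, local_id);

    Why? Explanation under this related question on dba.SE

  3. Finally, a multi-column index on the table element should help some more:

    CREATE INDEX e_mult_idx ON element (category_id, local_id, element_id);

    The third column element_id is only there to make it a covering index. If your query retrieves more columns from the table element, you may want to add more columns to the index or drop element_id. Either will make it faster.

Materialized view

If your tables are fairly stable (few or no updates), a materialized view providing the pre-computed set of all pairs (action_id, element_id) sharing the same category would make this lightning-fast. Make (action_id, element_id) (in that order) the primary key. You would have to update this materialized view after every write operation (with triggers). That would be a whole new question.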
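
Postgres 9.2 has no CREATE MATERIALIZED VIEW command (it arrived in 9.3), so the pre-computed set could be kept in a plain table. The following is only a sketch under that assumption: the table name mv_action_element is hypothetical, element is assumed to have a category_id column as in the sample data, and the triggers needed to keep the table current are left out:

```sql
-- Hypothetical table standing in for a materialized view:
-- one row per (action_id, element_id) pair.
CREATE TABLE mv_action_element AS
WITH RECURSIVE l AS (
   SELECT a.action_id, a.category_id, a.local_id
   FROM   action a

   UNION ALL
   SELECT l.action_id, l.category_id, c.local_id
   FROM   l
   JOIN   local c ON c.parent_id = l.local_id
   )
SELECT l.action_id, e.element_id
FROM   l
JOIN   element e USING (category_id, local_id);

-- Primary key in the suggested column order, so lookups by action_id use its index.
ALTER TABLE mv_action_element ADD PRIMARY KEY (action_id, element_id);

-- A lookup then becomes a simple index scan.
SELECT element_id
FROM   mv_action_element
WHERE  action_id = 3;
```

Rebuilding the whole table after each change would be the simplest refresh strategy. Incremental maintenance with triggers is more involved and is not shown here.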