Generate nested JSON with counting in PostgreSQL

Date: 2016-03-12 10:28:30

Tags: json postgresql

I created a simple database (in the latest stable PostgreSQL) like this:

create table table_a(id int primary key not null, name char(10));
create table table_b(id int primary key not null, name char(10), parent_a_id int);
create table table_c(id int primary key not null, name char(10), parent_a_id int, parent_b_id int, parent_c_id int, c_number int);
create table table_d(id int primary key not null, name char(10), parent_c_id int, d_number int);

With some sample data like this:

insert into table_a(id, name) values(1, 'a');

insert into table_b(id, name, parent_a_id) values(1, 'b', 1);

insert into table_c(id, name, parent_a_id, parent_b_id, parent_c_id, c_number) values(1, 'c1', 1, 1, null, 1);
insert into table_c(id, name, parent_a_id, parent_b_id, parent_c_id, c_number) values(2, 'c1.1', 1, 1, 1, 5);
insert into table_c(id, name, parent_a_id, parent_b_id, parent_c_id, c_number) values(3, 'c1.1.1', 1, 1, 2, 2);
insert into table_c(id, name, parent_a_id, parent_b_id, parent_c_id, c_number) values(4, 'c1.2', 1, 1, 1, 8);
insert into table_c(id, name, parent_a_id, parent_b_id, parent_c_id, c_number) values(5, 'c2', 1, 1, null, 4);

insert into table_d(id, name, parent_c_id, d_number) values(1, 'c1_d1', 1, 5);
insert into table_d(id, name, parent_c_id, d_number) values(2, 'c1.1_d1', 2, 6);
insert into table_d(id, name, parent_c_id, d_number) values(3, 'c1.1_d2', 2, 1);
insert into table_d(id, name, parent_c_id, d_number) values(4, 'c1.1.1_d1', 3, 2);
insert into table_d(id, name, parent_c_id, d_number) values(5, 'c2_d1', 5, 4);
insert into table_d(id, name, parent_c_id, d_number) values(6, 'c2_d2', 5, 3);
insert into table_d(id, name, parent_c_id, d_number) values(7, 'c2_d3', 5, 7);

Now I want to generate JSON like this: http://codebeautify.org/jsonviewer/cb9bc2a1

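The relation rules are:

  1. table_a has many table_b
  2. table_b has one table_a and has many table_c (select only those where parent_c_id is null)
  3. table_c has one table_a, has one table_b, has many table_c (children) and has one table_c (parent)

And the counting rules are:

  1. table_c has d_numbers_sum (the sum of d_number in its table_d rows plus the sum of d_numbers_sum of its child table_c rows)
  2. table_b has d_numbers_sum (the sum of d_numbers_sum of its table_c rows)
  3. table_a has d_numbers_sum (the sum of d_numbers_sum of its table_b rows)
  4. table_c has real_c_number (the sum of real_c_number of its child table_c rows if it has any, otherwise its own c_number)
  5. table_b has real_c_number_sum (the sum of real_c_number of its table_c rows)
  6. table_a has real_c_number_sum (the sum of real_c_number_sum of its table_b rows)

Is it possible to generate that JSON with these rules in pure PostgreSQL code?

Is it possible to create a shortcut function for this:

select * from my_shourtcat where id = ?;

Or without an id (generating a JSON array):

select * from my_shourtcat;

Can you show me an example with a description (how to generate the nested JSON and do the counting), so that I can apply it to similar but more complex relations in my app?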

Edit

I wrote something interesting, but it is not a 100% nested hash. Here every leaf has its own tree, and the result is an array of those trees. I need to deep-merge that array to get an array of unique trees:

with recursive j as (
    SELECT c.*, json '[]' children -- at max level, there are only leaves
    FROM test.table_c c
    WHERE (select count(1) from test.table_c where parent_c_id = c.id) = 0
  UNION ALL
    -- a little hack, because PostgreSQL doesn't like aggregated recursive terms
    SELECT (c).*, array_to_json(array_agg(j)) children
    FROM (
      SELECT c, j
      FROM j
      JOIN test.table_c c ON j.parent_c_id = c.id
    ) v
    GROUP BY v.c
)
SELECT json_agg(row_to_json(j)) json_tree FROM j WHERE parent_c_id is null;

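1 Answer:

Answer 0 (score: 4)

The answer consists of two parts. The first sets up a basic JSON structure, and the second builds the nested JSON objects from the self-referencing column in table_c.

UPDATE: I rewrote example/part 2 as a pure SQL solution and added that code as example 3. I also added a PL/pgSQL function that encapsulates almost all of the code; it takes the name of a view as input and produces the nested JSON. See example 4.

All code requires Postgres 9.5.

The first code sample sets up a JSON object with most of the joins, except for the nested children in table_c. The counting part is mostly left out.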
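Since the counting is left out of the first sample, here is a minimal sketch of how the table_c figures from the question could be computed. It is not part of the original answer. It assumes the test schema and sample data from example 1, an acyclic table_c hierarchy, and the reading that the recursive rules are equivalent to subtree totals: d_numbers_sum is the sum of d_number over all table_d rows in a node's subtree, and real_c_number is the sum of c_number over the leaf nodes of that subtree. The CTE names (subtree, d_per_c, leaf) are made up for illustration.

```sql
WITH RECURSIVE
-- (root_id, id): every table_c node paired with each node in its subtree.
subtree(root_id, id) AS (
    SELECT id, id FROM test.table_c          -- a node belongs to its own subtree
  UNION ALL
    SELECT s.root_id, c.id                   -- step down to the children
    FROM subtree s
    JOIN test.table_c c ON c.parent_c_id = s.id
),
-- d_number totals per table_c node (direct table_d rows only).
d_per_c AS (
    SELECT parent_c_id AS id, sum(d_number) AS d_sum
    FROM test.table_d
    GROUP BY parent_c_id
),
-- Leaves: table_c nodes that have no child table_c rows.
leaf AS (
    SELECT c.id, c.c_number
    FROM test.table_c c
    WHERE NOT EXISTS (SELECT 1 FROM test.table_c ch WHERE ch.parent_c_id = c.id)
)
SELECT
    s.root_id                    AS id,
    COALESCE(sum(dp.d_sum), 0)   AS d_numbers_sum,   -- counting rule 1
    COALESCE(sum(l.c_number), 0) AS real_c_number    -- counting rule 4
FROM subtree s
LEFT JOIN d_per_c dp ON dp.id = s.id
LEFT JOIN leaf l     ON l.id  = s.id
GROUP BY s.root_id
ORDER BY s.root_id;
-- With the sample data I would expect (id, d_numbers_sum, real_c_number):
-- (1, 14, 10), (2, 9, 2), (3, 2, 2), (4, 0, 8), (5, 14, 4)
```

The table_b and table_a totals could then be obtained by summing these values over the top-level table_c rows (parent_c_id IS NULL) of each table_b, and over the table_b rows of each table_a.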

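In the second code example I wrote a "merge" function in pure PL/pgSQL, which should solve the nested JSON problem. This solution needs only PG 9.5 and no extensions, since PL/pgSQL is built in.

As an alternative, I found another solution that requires plv8 to be installed and does the deep merge in JavaScript.

Creating nested JSON in pure SQL is not trivial; the challenge is merging the separate JSON trees we can get from a recursive CTE.

Code example 1

Creating the query as a view makes it easy to reuse, either to return a JSON array of all objects from table_a or to return only the one object with a given id.

I made some small changes to the data model and the data. The code for a self-contained example follows: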

--TABLES
DROP SCHEMA IF EXISTS TEST CASCADE;
CREATE SCHEMA test;

-- Using text instead of char(10), to avoid padding. For most databases text is the best choice.
-- PostgreSQL uses the same implementation under the hood (char vs text)
-- Source: https://www.depesz.com/2010/03/02/charx-vs-varcharx-vs-varchar-vs-text/

create table test.table_a(id int primary key not null, name text);
create table test.table_b(id int primary key not null, name text, parent_a_id int);
create table test.table_c(id int primary key not null, name text, parent_a_id int, parent_b_id int, parent_c_id int, c_number int);
create table test.table_d(id int primary key not null, name text, parent_c_id int, d_number int);

--DATA
insert into test.table_a(id, name) values(1, 'a');

-- Changed: parent_a_id=1 (instead of null)
insert into test.table_b(id, name, parent_a_id) values(1, 'b', 1);

insert into test.table_c(id, name, parent_a_id, parent_b_id, parent_c_id, c_number) values(1, 'c1', 1, 1, null, 1);
insert into test.table_c(id, name, parent_a_id, parent_b_id, parent_c_id, c_number) values(2, 'c1.1', 1, 1, 1, 5);
insert into test.table_c(id, name, parent_a_id, parent_b_id, parent_c_id, c_number) values(3, 'c1.1.1', 1, 1, 2, 2);
insert into test.table_c(id, name, parent_a_id, parent_b_id, parent_c_id, c_number) values(4, 'c1.2', 1, 1, 1, 8);
insert into test.table_c(id, name, parent_a_id, parent_b_id, parent_c_id, c_number) values(5, 'c2', 1, 1, null, 4);

insert into test.table_d(id, name, parent_c_id, d_number) values(1, 'c1_d1', 1, 5);
insert into test.table_d(id, name, parent_c_id, d_number) values(2, 'c1.1_d1', 2, 6);
insert into test.table_d(id, name, parent_c_id, d_number) values(3, 'c1.1_d2', 2, 1);
insert into test.table_d(id, name, parent_c_id, d_number) values(4, 'c1.1.1_d1', 3, 2);
insert into test.table_d(id, name, parent_c_id, d_number) values(5, 'c2_d1', 5, 4);
insert into test.table_d(id, name, parent_c_id, d_number) values(6,'c2_d2', 5, 3);
insert into test.table_d(id, name, parent_c_id, d_number) values(7, 'c2_d3', 5, 7);


CREATE OR REPLACE VIEW json_objects AS
--Root object
SELECT ta.id, json_build_object(
    'id', ta.id,
    'name', ta.name,
    'd_numbers_sum', (SELECT sum(d_number) FROM test.table_d),
    'real_c_number_sum', null,
    'children_b', (

        -- table_b
        SELECT json_agg(json_build_object(
            'id', tb.id,
            'name', tb.name,
            'd_numbers_sum', null,
            'real_c_number_sum', null,
            'children_c', (

                -- table_c
                SELECT json_agg(json_build_object(
                   'id', tc.id,
                   'name', tc.name,
                   'd_numbers_sum', null,
                   'real_c_number_sum', null,
                   'children_d', (

                        -- table_d
                        SELECT json_agg(json_build_object(
                           'id', td.id,
                           'name', td.name,
                           'd_numbers_sum', null,
                           'real_c_number_sum', null
                        ))
                        FROM test.table_d td
                        WHERE td.parent_c_id = tc.id
                    )
                ))
                FROM test.table_c tc
                WHERE tc.parent_b_id = tb.id
            )
        ))
        FROM test.table_b tb
        WHERE tb.parent_a_id = ta.id
    )
) AS object
FROM test.table_a ta;


-- Return json array of all objects
SELECT json_agg(object) FROM json_objects;

-- Return only json object with given id
SELECT object FROM json_objects WHERE id = 1;

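Code example 2

Here we map the data in table_c so that it can be plugged directly into the recursive CTE from the documentation, for readability and educational purposes. The data is then prepared as input to the "merge" function. For simplicity I just aggregate the rows into one big JSON object. Performance should be OK. Via the third function parameter we can choose to get the parent object, or only its children as a (JSON) array.

The node to get the children for is specified in the last query, in the last lines of the example. This query can be used everywhere we need the children of a table_c node. I tested it on a more complex example and it looks like I have sorted out most of the rough edges.

The three parts of the CTE (graph, search_graph and filtered_graph) could be refactored into one for performance, since CTEs are optimization fences for the database planner, but I kept this version for readability and debugging.

This example uses jsonb instead of json; see the documentation. The reason for using jsonb here is that we do not have to reparse the JSON every time we manipulate it in the function. When the function is done, the result is cast back to json so it can be inserted directly into the code in example 1.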
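The merge function in this example rests on three jsonb operations available in 9.5: the - operator to drop a key, the || operator to append to an array, and jsonb_set to write a value at a path. As a small illustration only (the two-node map below is made-up data, not taken from the tables), a single merge step that moves node 2 under node 1 might look like this:

```sql
-- Sketch of one merge step on a hypothetical two-node map keyed by id.
WITH m(node_map) AS (
    SELECT '{"1": {"id": 1}, "2": {"id": 2, "parent_id": 1}}'::jsonb
)
SELECT jsonb_set(
           node_map - '2'::text,                 -- drop node 2 from the top level
           ARRAY['1', 'children'],               -- path: children array of node 1
           COALESCE(node_map -> '1' -> 'children', '[]'::jsonb)
               || jsonb_build_array(node_map -> '2')  -- append node 2 as a child
       ) AS merged
FROM m;
-- {"1": {"id": 1, "children": [{"id": 2, "parent_id": 1}]}}
```

Repeating this step from the deepest level upwards is what turns the flat node map into a nested tree.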

--DROP VIEW test.tree_path_list_v  CASCADE;
CREATE OR REPLACE VIEW test.tree_path_list_v AS 
WITH RECURSIVE
    -- Map the source data so we can use it directly in a recursive query from the documentation:
    graph AS
    (
        SELECT id AS id, parent_c_id AS link, name, jsonb_build_object('id', id, 'name', name, 'parent_c_id', parent_c_id, 'parent_a_id', parent_a_id, 'parent_b_id', parent_b_id) AS data
        FROM test.table_c
    ),
    -- Recursive query from documentation.
    -- http://www.postgresql.org/docs/current/static/queries-with.html
    search_graph(id, link, data, depth, path, cycle) AS (
        SELECT g.id, g.link, g.data, 1,
          ARRAY[g.id],
          false
        FROM graph g
      UNION ALL
        SELECT g.id, g.link, g.data, sg.depth + 1,
          path || g.id,
          g.id = ANY(path)
        FROM graph g, search_graph sg
        WHERE g.id = sg.link AND NOT cycle
    ),
    -- Decorate/filter the result so it can be used as input to the "test.create_jsonb_tree" function
    filtered_graph AS (
        SELECT
            sg.path[1] AS id,
            sg.path[2] AS parent_id,
            sg.depth AS level,
            sg.id AS start_id,
            d.name,
            sg.path,
            d.data::jsonb AS json
        FROM search_graph sg
        INNER JOIN graph d ON d.id = sg.path[1]
        ORDER BY level DESC
    )
    -- "Main" query
    SELECT * FROM filtered_graph
;


-- Returns a json object with all children merged into its parents.
-- Parameter 1 "_tree_path_list": A json document with rows from the view "test.tree_path_list_v" aggregated into one big json.
-- Parameter 2 "_children_keyname": Choose the name for the children
-- Parameter 3 "_get_only_children": Return only the children array instead of the parent object
CREATE OR REPLACE FUNCTION test.create_jsonb_tree(_tree_path_list jsonb, _children_keyname text DEFAULT 'children', _get_only_children boolean DEFAULT false)
    RETURNS jsonb AS
$$
DECLARE
    node_map jsonb :=  jsonb_build_object();
    node_result jsonb := jsonb_build_array();
    parent_children jsonb := jsonb_build_array();
    node jsonb;
    relation jsonb;
BEGIN
    FOR node IN SELECT * FROM jsonb_array_elements(_tree_path_list)
    LOOP
        RAISE NOTICE 'Input (per row): %', node;
        node_map := jsonb_set(node_map, ARRAY[node->>'id'], node->'json');
    END LOOP;

    FOR relation IN SELECT * FROM jsonb_array_elements(_tree_path_list)
    LOOP
        IF ( (relation->>'level')::int > 1 ) THEN
            parent_children := COALESCE(node_map->(relation->>'parent_id')->_children_keyname, jsonb_build_array()) || jsonb_build_array(node_map->(relation->>'id'));
            node_map := jsonb_set(node_map, ARRAY[relation->>'parent_id', _children_keyname], parent_children);
            node_map := node_map - (relation->>'id');
        ELSE
            IF _get_only_children THEN
                node_result := node_map->(relation->>'id')->_children_keyname;
            ELSE
                node_result := node_map->(relation->>'id');
            END IF;
        END IF;
    END LOOP;
    RETURN node_result;
END;
$$ LANGUAGE plpgsql
;


-- Aggregate the rows from the view into one big json array and pass it to the function.
SELECT test.create_jsonb_tree(
    (   SELECT jsonb_agg( (SELECT x FROM (SELECT id, parent_id, level, name, json) x) )
        FROM test.tree_path_list_v
        WHERE start_id = 1  --Which node to get children for
    ),
    'children'::text,
    true
)::json
;

Output of example 2

[
 {
    "id": 2,
    "name": "c1.1",
    "children": [
      {
        "id": 3,
        "name": "c1.1.1",
        "parent_a_id": 1,
        "parent_b_id": 1,
        "parent_c_id": 2
      }
    ],
    "parent_a_id": 1,
    "parent_b_id": 1,
    "parent_c_id": 1
  },
  {
    "id": 4,
    "name": "c1.2",
    "parent_a_id": 1,
    "parent_b_id": 1,
    "parent_c_id": 1
  }
]
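To tie this back to example 1, the call can be correlated with an outer table_c row so that each top-level node carries its own nested children. The following is only a sketch and has not been benchmarked. It assumes test.tree_path_list_v and test.create_jsonb_tree exist exactly as defined in this example, builds the input rows with jsonb_build_object instead of the row sub-select, and adds an explicit ORDER BY inside jsonb_agg so that deeper nodes are processed before their parents without relying on the ordering inside the view.

```sql
-- Sketch: one row per top-level table_c node with its nested children.
SELECT
    tc.id,
    tc.name,
    test.create_jsonb_tree(
        (   SELECT jsonb_agg(
                       jsonb_build_object(
                           'id', v.id,
                           'parent_id', v.parent_id,
                           'level', v.level,
                           'name', v.name,
                           'json', v.json
                       )
                       ORDER BY v.level DESC   -- deepest nodes first
                   )
            FROM test.tree_path_list_v v
            WHERE v.start_id = tc.id           -- correlate with the outer node
        ),
        'children',
        true                                   -- return only the children array
    )::json AS children
FROM test.table_c tc
WHERE tc.parent_c_id IS NULL;                  -- top-level nodes only
```

A node without children has no 'children' key in the node map, so its children column should come back as NULL rather than an empty array; wrap the call in COALESCE if an empty array is preferred.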

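Code example 3: pure SQL nested JSON solution

I rewrote the nested-JSON code as pure SQL and put it into an SQL function, so that we can reuse the code by parameterizing the start_ids (as an array).

I have not benchmarked the code yet, and it is not necessarily better than the SQL + PL/pgSQL solution. I had to (ab)use a CTE to loop through the result, the same way I do in PL/pgSQL when adding nodes to their parents. The "merge" step remains essential even in the pure SQL version.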
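The looping trick can be shown in isolation. In this minimal sketch (the items and fold names and the three sample rows are made up for illustration), each recursive step joins row n+1 to the accumulated result of row n, which is the same pattern the build_tree_loop CTE in the function below uses to carry the node map from row to row:

```sql
-- Sketch: using a recursive CTE as a loop that folds rows into one jsonb value.
WITH RECURSIVE
items(rownum, val) AS (
    VALUES (1, 'a'::text), (2, 'b'), (3, 'c')
),
fold(rownum, acc) AS (
    SELECT 0, '[]'::jsonb                     -- initial accumulator
  UNION ALL
    SELECT i.rownum, f.acc || to_jsonb(i.val) -- "loop body": append the next item
    FROM items i
    JOIN fold f ON i.rownum = f.rownum + 1    -- take the row after the previous one
)
SELECT acc FROM fold ORDER BY rownum DESC LIMIT 1;
-- ["a", "b", "c"]
```

Every intermediate accumulator is kept as a row of the CTE, so memory use grows with the number of steps; that is one reason this approach is not necessarily faster than the procedural version.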

--DROP VIEW test.source_data_v  CASCADE;
--Map your data (in this view) so it can be directly used in the recursive CTE.
CREATE OR REPLACE VIEW test.source_data_v AS
    SELECT 
        id AS id,
        parent_c_id AS parent_id,
        name as name, -- Only for debugging: Give the node a name for easier debugging (a name is easier than an id)
        --jsonb_build_object('id', tree_id, 'name', name, 'pid', parent_tree_id, 'children', jsonb_build_array()) AS data --Allow empty children arrays
        jsonb_build_object('id', id, 'name', name, 'parent_id', parent_c_id) AS data -- Ignore empty children arrays
    FROM test.table_c
;
SELECT * FROM test.source_data_v;


--DROP FUNCTION test.get_nested_object(bigint[]) CASCADE;
CREATE OR REPLACE FUNCTION test.get_nested_object(bigint[]) 
    RETURNS jsonb
AS $$
  WITH RECURSIVE
    search_graph(id, parent_id, data, depth, path, cycle) AS (
        SELECT g.id, g.parent_id, g.data, 1,
          ARRAY[g.id],
          false
        FROM test.source_data_v g
      UNION ALL
        SELECT g.id, g.parent_id, g.data, sg.depth + 1,
          path || g.id,
          g.id = ANY(path)
        FROM test.source_data_v g, search_graph sg
        WHERE g.id = sg.parent_id AND NOT cycle
    ),
    transformed_result_graph AS (
        SELECT
            sg.path[1] AS id,
            d.parent_id,
            sg.depth AS level,
            sg.id AS start_id,
            d.name,
            sg.path,
            (SELECT string_agg(t.name, ' ') FROM (SELECT unnest(sg.path::int[]) AS id) a INNER JOIN test.source_data_v t USING (id)) AS named_path,
            d.data
        FROM search_graph sg
        INNER JOIN test.source_data_v d ON d.id = sg.path[1]
        WHERE sg.id = ANY($1) --Parameterized input for start nodes
        ORDER BY level DESC, start_id ASC
    ),
    -- Sort path list and build a map/index of all individual nodes which we loop through in the next CTE:
    sorted_paths AS (
        SELECT null::int AS rownum, * 
        FROM transformed_result_graph WHERE false
        UNION ALL
        SELECT
            0, null, null, null, null, null, null, null,
            (SELECT jsonb_object_agg(id::text, data) FROM transformed_result_graph)  -- Build a map/index of all individual nodes
        UNION ALL
        SELECT row_number() OVER () as rownum, *
        FROM transformed_result_graph c
        ORDER BY level DESC, start_id ASC
    ),
    build_tree_loop (rownum, level, id, parent_id, data, named_path, result) AS (
        SELECT
            rownum, level, id, parent_id, data,
            named_path,
            data -- First row has the complete  node map
        FROM sorted_paths
        WHERE rownum = 0
        UNION ALL
        SELECT
            c.rownum, c.level, c.id, c.parent_id, c.data,
            c.named_path,
            CASE WHEN (c.parent_id IS NULL) OR (prev.result->(c.parent_id::text) IS NULL)
                 THEN prev.result
                 WHEN c.parent_id IS NOT NULL
                 THEN jsonb_set(
                        prev.result - (c.id::text),  -- remove node and add it as child
                        ARRAY[c.parent_id::text, 'children'], 
                        COALESCE(prev.result->(c.parent_id::text)->'children',jsonb_build_array())||COALESCE(prev.result->(c.id::text), jsonb_build_object('msg','ERROR')),  -- add node as child (and create empty children array if not exist)
                        true --add key (children) if not exists
                    )
            END AS result
        FROM sorted_paths c  -- Join each row in "sorted_paths" with the previous row from the CTE.
        INNER JOIN build_tree_loop prev ON c.rownum = prev.rownum+1
    ), nested_start_nodes AS (
        SELECT jsonb_agg(q.value) AS result
        FROM jsonb_each((SELECT result FROM build_tree_loop ORDER BY rownum DESC LIMIT 1)) q
    )
    -- "Main" query
    SELECT result FROM nested_start_nodes
$$ LANGUAGE sql STABLE;
-- END of sql function 

SELECT test.get_nested_object(ARRAY[1]);

Output: unfortunately, jsonb does not preserve key order, so the "children" key comes first, which makes the tree harder to read.

[
{
    "children": [
        {
            "children": [
                {
                    "id": 3,
                    "name": "c1.1.1",
                    "parent_id": 2
                }
            ],
            "id": 2,
            "name": "c1.1",
            "parent_id": 1
        },
        {
            "id": 4,
            "name": "c1.2",
            "parent_id": 1
        }
    ],
    "id": 1,
    "name": "c1",
    "parent_id": null
}
]

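Code example 4

Another variant: I put everything into a PL/pgSQL function. The dynamic query inside the function takes the name of any view/table as a parameter; that view/table must contain the columns id, parent_id, data and name. It also takes an array of ids to start from. When using the function in a query, you can aggregate a set of ids into an array as input (array_agg etc.).

The function is not "transparent", so it is harder to optimize indexes and so on. With the "_debug" parameter set to true, the function outputs the raw generated SQL as a notice, so you can run EXPLAIN ANALYZE on the query.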

/*
Parameters:
    _ids                Array of ids. Specify where to start recursion down the tree.
    _view               Name of a view/table with the source data. The view must contain the following columns:
                            id(int/bigint)
                            parent_id(int/bigint)
                            data(jsonb)  The data for each node, without the children key, which is added in this func.
                            name(text)   Name is optional, only used for debugging purposes, can be empty string.
    _children_keyname   What key to use for children arrays
    _no_root            Exclude the root node, only returning the children array. Makes less sense when returning multiple root nodes (we don't know which children belong to which roots)
*/          
--DROP FUNCTION test.get_nested_jsonb(bigint[], regclass, text, boolean, boolean) CASCADE;
CREATE OR REPLACE FUNCTION test.get_nested_jsonb(_ids bigint[], _view regclass, _children_keyname text DEFAULT 'children', _no_root boolean DEFAULT false, _debug boolean DEFAULT false)
    RETURNS jsonb AS $$
DECLARE
    dynamic_sql text := '';
    tree_path_list jsonb;
    node_map jsonb :=  jsonb_build_object();
    node_result jsonb := jsonb_build_array();
    parent_children jsonb := jsonb_build_array();
    node jsonb;
    relation jsonb;    
BEGIN
    dynamic_sql := format(
    '    
        WITH RECURSIVE
        search_graph(id, parent_id, depth, path, cycle) AS (
            SELECT g.id, g.parent_id, 1,
              ARRAY[g.id],
              false
            FROM '|| _view ||' g
          UNION ALL
            SELECT g.id, g.parent_id, sg.depth + 1,
              path || g.id,
              g.id = ANY(path)
            FROM '|| _view ||' g, search_graph sg
            WHERE g.id = sg.parent_id AND NOT cycle
        ),
        graph_by_id AS (
            SELECT
                sg.path[1] AS id, d.parent_id, sg.depth, sg.id AS start_id, d.name, sg.path,
                --(SELECT string_agg(t.name, '' '') FROM (SELECT unnest(sg.path::int[]) AS id) a INNER JOIN '|| _view ||' t USING (id)) AS named_path, -- For debugging, show the path as list of names instead of ids
                d.data
            FROM search_graph sg
            INNER JOIN '|| _view ||' d ON d.id = sg.path[1] -- Join in data for the current node
            WHERE sg.id = ANY($1) --Parameterized input for start nodes: To debug raw sql: replace variable $1 with array of ids: ARRAY[1]
            ORDER BY depth DESC, start_id ASC
        )
        SELECT jsonb_agg( (SELECT x FROM (SELECT id, parent_id, depth, name, data) x) )
        FROM graph_by_id
    ');
    IF _debug THEN
        RAISE NOTICE 'Dump of raw dynamic SQL. Remember to replace $1 with ARRAY[id1,id2]: %', dynamic_sql;
    END IF;
    EXECUTE dynamic_sql USING _ids INTO tree_path_list;

    -- Create a node map (id as key)
    FOR node IN SELECT * FROM jsonb_array_elements(tree_path_list)
    LOOP
        node := jsonb_set(node, ARRAY['data', _children_keyname], jsonb_build_array()); --add children key to all nodes
        node_map := jsonb_set(node_map, ARRAY[node->>'id'], node->'data');
    END LOOP;
    RAISE NOTICE 'dump: %', node_map;

    -- Loop sorted list, add nodes to node map from leaves and up
    FOR relation IN SELECT * FROM jsonb_array_elements(tree_path_list)
    LOOP
        IF ( (relation->>'depth')::int > 1 ) THEN
            parent_children := COALESCE(node_map->(relation->>'parent_id')->_children_keyname, jsonb_build_array()) || jsonb_build_array(node_map->(relation->>'id'));
            node_map := jsonb_set(node_map, ARRAY[relation->>'parent_id', _children_keyname], parent_children);
            node_map := node_map - (relation->>'id');
        ELSE
            IF _no_root THEN
                node_result := node_map->(relation->>'id')->_children_keyname;
            ELSE
                node_result := node_map->(relation->>'id');
            END IF;
        END IF;
    END LOOP;
    RETURN node_result;    
END;
$$ LANGUAGE plpgsql STABLE;

-- Test the function on a view 'test.source_data_v', starting from id=1
SELECT test.get_nested_jsonb(ARRAY[1], 'test.source_data_v', 'children', false, true);