How do I query and index JSON data nested multiple levels deep in PostgreSQL 9.3+?

Time: 2014-01-13 21:56:17

Tags: sql json postgresql

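In PostgreSQL 9.3, I am storing some fairly complex JSON objects with arrays nested inside arrays. This snippet is not real data, but it illustrates the same concept: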

{
   "customerId" : "12345",
   "orders" : [{
      "orderId" : "54321",
      "lineItems" : [{
         "productId" : "abc",
         "qty" : 3
      }, {
         "productId" : "def",
         "qty" : 1
      }]
   }]
}

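I want SQL queries that can operate on the lineItem objects, not only within this single JSON structure but across all JSON objects in that table column. For example, a SQL query that returns all distinct productIds along with the sum of their qty sold. To keep such a query from taking all day, I would probably want an index on lineItem or its child fields.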

Using this StackOverflow question, I figured out how to write a query that works:

SELECT
   line_item->>'productId' AS product_id,
   SUM(CAST(line_item->>'qty' AS INTEGER)) AS qty_sold
FROM
   my_table,
   -- ORDER is a reserved word in PostgreSQL, so the alias is spelled ord
   json_array_elements(my_table.my_json_column->'orders') AS ord,
   json_array_elements(ord->'lineItems') AS line_item
GROUP BY product_id;

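However, the original StackOverflow question dealt with data nested only one level deep rather than two. I extended the same concept (i.e. a "lateral join" in the FROM clause) by adding one more lateral join to go a level deeper. I am not sure this is the best approach, though, so the first part of my question is: what is the best way to query JSON data nested an arbitrary number of levels deep within a JSON object?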
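The lateral joins described above are implicit in the comma-separated FROM list. As a minimal, self-contained sketch of the same idea with the joins spelled out, the statement below inlines one sample row through a CTE that stands in for my_table; the aliases ord and li are illustrative choices, not names from the original query.

```sql
-- Stand-in for my_table with a single sample document (assumption for illustration).
WITH my_table(my_json_column) AS (
  VALUES (
    '{"customerId": "12345",
      "orders": [{"orderId": "54321",
                  "lineItems": [{"productId": "abc", "qty": 3},
                                {"productId": "def", "qty": 1}]}]}'::json
  )
)
SELECT
  li->>'productId' AS product_id,
  SUM(CAST(li->>'qty' AS integer)) AS qty_sold
FROM
  my_table
  -- level 1: one row per element of "orders"
  CROSS JOIN LATERAL json_array_elements(my_table.my_json_column->'orders') AS ord
  -- level 2: one row per element of each order's "lineItems"
  CROSS JOIN LATERAL json_array_elements(ord->'lineItems') AS li
GROUP BY product_id;
```

Each additional fixed level of nesting adds one more CROSS JOIN LATERAL, which works when the path to the data is known in advance.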

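For the second part, creating an index on such nested data, this StackOverflow question again deals only with data nested one level deep. I am completely lost, and my head swims trying to work out how to apply this at deeper levels. Can anyone offer a clear approach to indexing data that is at least two levels deep, as with lineItems above?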

1 Answer:

Answer 0 (score: 2)

To handle nesting of unbounded depth, you need a recursive CTE that operates on every json element in every table row:

WITH RECURSIVE

-- Two sample rows; in the second one, "orders" sits one level deeper.
raw_json as (
  SELECT
    *
  FROM
    (VALUES
      (1,
      '{
  "customerId": "12345",
  "orders": [
    {
      "orderId": "54321",
      "lineItems": [
        {
          "productId": "abc",
          "qty": 3
        },
        {
          "productId": "def",
          "qty": 1
        }
      ]
    }
  ]
}'::json),
      (2,
      '{
  "customerId": "678910",
  "artibitraryLevel": {
    "orders": [
      {
        "orderId": "55345",
        "lineItems": [
          {
            "productId": "abc",
            "qty": 3
          },
          {
            "productId": "ghi",
            "qty": 10
          }
        ]
      }
    ]
  }
}'::json)
    ) a(id,sample_json)
),


json_recursive as (
  -- non-recursive term: expand the root of each document by one level
  SELECT
    a.id,
    b.k,
    b.v,
    b.json_type,
    case when b.json_type = 'object' and not (b.v->>'customerId') is null then b.v->>'customerId' else a.customer_id end customer_id, --track any arbitrary id when iterating through the json graph
    case when b.json_type = 'object' and not (b.v->>'orderId') is null then b.v->>'orderId' else a.order_id end order_id,
    case when b.json_type = 'object' and not (b.v->>'productId') is null then b.v->>'productId' else a.product_id end product_id
  FROM
    (
      SELECT
        id,
        sample_json v,
        case left(sample_json::text,1)
          when '[' then 'array'
          when '{' then 'object'
          else 'scalar'
        end json_type, --the choice of json accessor function depends on this, and PostgreSQL 9.3 has no built-in function to get this value
        sample_json->>'customerId' customer_id,
        sample_json->>'orderId' order_id,
        sample_json->>'productId' product_id
      FROM
        raw_json
    ) a
    CROSS JOIN LATERAL (
      SELECT
        b.k,
        b.v,
        case left(b.v::text,1)
          when '[' then 'array'
          when '{' then 'object'
          else 'scalar'
        end json_type
      FROM
        json_each(case json_type when 'object' then a.v else null end ) b(k,v) --get key/value pairs for the individual elements if we are dealing with a standard object
      UNION ALL
      SELECT
        null::text k,
        c.v,
        case left(c.v::text,1)
          when '[' then 'array'
          when '{' then 'object'
          else 'scalar'
        end json_type
      FROM
        json_array_elements(case json_type when 'array' then a.v else null end) c(v) --if we have an array, just get the elements; at the root there is no parent key, so k is null
    ) b


  UNION ALL --recursive term

  SELECT
    a.id,
    b.k,
    b.v,
    b.json_type,
    case when b.json_type = 'object' and not (b.v->>'customerId') is null then b.v->>'customerId' else a.customer_id end customer_id,
    case when b.json_type = 'object' and not (b.v->>'orderId') is null then b.v->>'orderId' else a.order_id end order_id,
    case when b.json_type = 'object' and not (b.v->>'productId') is null then b.v->>'productId' else a.product_id end product_id
  FROM
    json_recursive a
    CROSS JOIN LATERAL (
      SELECT
        b.k,
        b.v,
        case left(b.v::text,1)
          when '[' then 'array'
          when '{' then 'object'
          else 'scalar'
        end json_type
      FROM
        json_each(case json_type when 'object' then a.v else null end ) b(k,v)
      UNION ALL
      SELECT
        a.k, --array elements inherit the key of the array that contains them
        c.v,
        case left(c.v::text,1)
          when '[' then 'array'
          when '{' then 'object'
          else 'scalar'
        end json_type
      FROM
        json_array_elements(case json_type when 'array' then a.v else null end) c(v)
    ) b
)
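The CTE above infers each value's type from the first character of its text. As a hedged aside: PostgreSQL 9.4 and later ship a built-in json_typeof function that reports the type directly, so on those versions it could replace the case expressions. It returns finer-grained names such as number and string rather than a single 'scalar'. The sketch below assumes 9.4+ and uses throwaway sample values. The last row has a leading space, which the json type is expected to preserve, to show the kind of input where a first-character check could misclassify a value.

```sql
-- Assumes PostgreSQL 9.4 or later, where json_typeof is available.
SELECT
  t.v,
  json_typeof(t.v)    AS json_type,   -- built-in type name
  left(t.v::text, 1)  AS first_char   -- what the first-character check inspects
FROM
  (VALUES
    ('[1, 2]'::json),
    ('{"a": 1}'::json),
    ('3'::json),
    (' {"a": 1}'::json)               -- leading whitespace before the brace
  ) AS t(v);
```

On 9.3 itself the first-character approach remains the practical option, ideally with the text trimmed before it is inspected.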

You can then sum "qty" by any of the tracked ids:

SELECT
  customer_id,
  sum(v::text::integer)
FROM
  json_recursive
WHERE
  k = 'qty'
GROUP BY
  customer_id

Or you can fetch the "lineItem" objects and manipulate them as needed:

SELECT
  *
FROM
  json_recursive
WHERE
  k = 'lineItems' and json_type = 'object'

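For indexing, you could adapt the recursive query into a function that returns the unique keys of every json object in each row of the original table, then create a functional index on the json column:

SELECT
  array_agg(DISTINCT k)
FROM
  json_recursive
WHERE
  not k is null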
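The function-plus-index idea can be sketched as follows. This is only an illustration under stated assumptions, not the answer's exact method: it indexes the productId values at the fixed orders -> lineItems path from the question rather than wrapping the full recursive query, the function name line_item_product_ids and the index name are hypothetical, and my_table / my_json_column are the names used in the question. The function is declared IMMUTABLE because PostgreSQL only accepts immutable functions in index expressions.

```sql
-- Hypothetical helper: collect the distinct productId values found two levels deep.
CREATE FUNCTION line_item_product_ids(doc json) RETURNS text[]
LANGUAGE sql IMMUTABLE AS $$
  SELECT array_agg(DISTINCT li->>'productId')
  FROM json_array_elements(doc->'orders') AS ord,
       json_array_elements(ord->'lineItems') AS li
$$;

-- Expression index over the helper's result; GIN supports array containment.
CREATE INDEX my_table_product_ids_idx
  ON my_table
  USING gin (line_item_product_ids(my_json_column));

-- A query has to repeat the indexed expression for the planner to consider the index.
SELECT *
FROM my_table
WHERE line_item_product_ids(my_json_column) @> ARRAY['abc'];
```

For documents where the path varies, as in the second sample row, the body of the function would need to walk the document recursively instead of following a fixed path.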