Question

I have a TEXT column in a PostgreSQL (9.6) database that contains a list of one or more dictionaries, such as these:

[{"line_total_excl_vat": "583.3300", "account": "", "subtitle": "", "product_id": 5532548, "price_per_unit": "583.3333", "line_total_incl_vat": "700.0000", "text": "PROD0008", "amount": "1.0000", "vat_rate": "20"}]

or

[{"line_total_excl_vat": "500.0000", "account": "", "subtitle": "", "product_id": "", "price_per_unit": "250.0000", "line_total_incl_vat": "600.0000", "text": "PROD003", "amount": "2.0000", "vat_rate": "20"}, {"line_total_excl_vat": "250.0000", "account": "", "subtitle": "", "product_id": 5532632, "price_per_unit": "250.0000", "line_total_incl_vat": "300.0000", "text": "PROD005", "amount": "1.0000", "vat_rate": "20"}]

I want to retrieve each dictionary from that column and parse it into separate columns.

So this example:

id | customer | blurb
---+----------+------
 1 | Joe      | [{"line_total_excl_vat": "583.3300", "account": "", "subtitle": "", "product_id": 5532548, "price_per_unit": "583.3333", "line_total_incl_vat": "700.0000", "text": "PROD0008", "amount": "1.0000", "vat_rate": "20"}]
 2 | Sally    | [{"line_total_excl_vat": "500.0000", "account": "", "subtitle": "", "product_id": "", "price_per_unit": "250.0000", "line_total_incl_vat": "600.0000", "text": "PROD003", "amount": "2.0000", "vat_rate": "20"}, {"line_total_excl_vat": "250.0000", "account": "", "subtitle": "", "product_id": 5532632, "price_per_unit": "250.0000", "line_total_incl_vat": "300.0000", "text": "PROD005", "amount": "1.0000", "vat_rate": "20"}]

would become:

id | customer | line_total_excl_vat | account | product_id | ...
---+----------+---------------------+---------+------------+----
 1 | Joe      | 583.3300            | null    | 5532548    |
 2 | Sally    | 500.0000            | null    | null       |
 3 | Sally    | 250.0000            | null    | 5532632    |

Answer 1

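If you know in advance which fields you want to extract, cast the text to json / jsonb and use json_to_recordset / jsonb_to_recordset. Note that this method requires the field names and types to be specified explicitly. Fields that are present in the JSON dictionaries but not listed in the column definition will not be extracted.

See the official PostgreSQL documentation on json-functions.

A self-contained example: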

WITH tbl (id, customer, dat) as ( values
     (1, 'Joe',
      '[{ "line_total_excl_vat": "583.3300"
        , "account": ""
        , "subtitle": ""
        , "product_id": 5532548
        , "price_per_unit": "583.3333"
        , "line_total_incl_vat": "700.0000"
        , "text": "PROD0008"
        , "amount": "1.0000"
        , "vat_rate": "20"}]')
    ,(2, 'Sally', 
      '[{ "line_total_excl_vat": "500.0000"
        , "account": ""
        , "subtitle": ""
        , "product_id": ""
        , "price_per_unit": "250.0000"
        , "line_total_incl_vat": "600.0000"
        , "text": "PROD003"
        , "amount": "2.0000"
        , "vat_rate": "20"}
      , { "line_total_excl_vat": "250.0000"
        , "account": ""
        , "subtitle": ""
        , "product_id": 5532632
        , "price_per_unit": "250.0000"
        , "line_total_incl_vat": "300.0000"
        , "text": "PROD005"
        , "amount": "1.0000"
        , "vat_rate": "20"}]')
)
SELECT id, customer, x.*
FROM tbl
   , json_to_recordset(dat::json) x 
        -- column names must match the JSON keys exactly;
        -- a column with no matching key comes back as NULL
        (  line_total_excl_vat numeric
         , account text
         , subtitle text
         , product_id text
         , price_per_unit numeric
         , line_total_incl_vat numeric
         , "text" text
         , amount numeric
         , vat_rate numeric
        )

This produces the following output:

id    customer    line_total_excl_vat    account   subtitle    product_id    price_per_unit    line_total_incl_vat    text      amount    vat_rate
 1    Joe                      583.33                             5532548    583.3333                          700    PROD0008       1          20
 2    Sally                       500                                             250                          600    PROD003        2          20
 2    Sally                       250                             5532632         250                          300    PROD005        1          20

This layout is usually called the wide format.
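The question's desired output shows null where the JSON holds an empty string. A minimal sketch of one way to get there, assuming empty strings should be treated as missing values: wrap the text columns in NULLIF and cast afterwards. The three-field sample rows below are shortened for illustration and are not the full data.

```sql
-- Shortened, hypothetical sample rows; the real data lives in the TEXT column.
WITH tbl (id, customer, dat) AS ( VALUES
     (1, 'Joe',   '[{"product_id": 5532548, "account": "", "amount": "1.0000"}]')
    ,(2, 'Sally', '[{"product_id": "", "account": "", "amount": "2.0000"}]')
)
SELECT id
     , customer
     , NULLIF(x.account, '')         AS account     -- '' becomes NULL
     , NULLIF(x.product_id, '')::int AS product_id  -- '' becomes NULL, digits become an integer
     , x.amount
FROM tbl
   , json_to_recordset(dat::json) AS x
        (  account text
         , product_id text   -- read as text first, because the key holds either a number or ''
         , amount numeric
        );
```

Reading product_id as text and converting it in the SELECT list avoids a cast error on the rows where the key holds an empty string.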

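The data can also be extracted in long format, which has the added benefit of keeping all the data without having to name the fields explicitly. In that case the query can be written as follows (test data omitted for brevity):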

SELECT id, customer, y.key, y.value, x.record_number
FROM tbl
   , lateral json_array_elements(dat::json) WITH ORDINALITY AS x (val, record_number)
   , lateral json_each_text(x.val) y

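In the statement above, WITH ORDINALITY adds a sequence number to each element of the unnested array. That number is used to disambiguate fields that come from different array elements of the same customer.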
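To see the effect of WITH ORDINALITY in isolation, here is a small sketch on a literal two-element array (the array content is made up for illustration). It should return one row per element, with record_number counting from 1 in array order.

```sql
-- Each array element becomes a row; record_number is its 1-based position.
SELECT x.val, x.record_number
FROM json_array_elements('[{"text": "PROD003"}, {"text": "PROD005"}]'::json)
     WITH ORDINALITY AS x (val, record_number);
```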

This produces the output:

id  customer key                    value     record_number
1   Joe      line_total_excl_vat    583.3300  1
1   Joe      account                          1
1   Joe      subtitle                         1
1   Joe      product_id             5532548   1
1   Joe      price_per_unit         583.3333  1
1   Joe      line_total_incl_vat    700.0000  1
1   Joe      text                   PROD0008  1
1   Joe      amount                 1.0000    1
1   Joe      vat_rate               20        1
2   Sally    line_total_excl_vat    500.0000  1
2   Sally    account                          1
2   Sally    subtitle                         1
2   Sally    product_id                       1
2   Sally    price_per_unit         250.0000  1
2   Sally    line_total_incl_vat    600.0000  1
2   Sally    text                   PROD003   1
2   Sally    amount                 2.0000    1
2   Sally    vat_rate               20        1
2   Sally    line_total_excl_vat    250.0000  2
2   Sally    account                          2
2   Sally    subtitle                         2
2   Sally    product_id             5532632   2
2   Sally    price_per_unit         250.0000  2
2   Sally    line_total_incl_vat    300.0000  2
2   Sally    text                   PROD005   2
2   Sally    amount                 1.0000    2
2   Sally    vat_rate               20        2
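Because the long format needs no field names up front, the same pattern can be used to find out which keys occur in the data before writing a column list for json_to_recordset. A minimal sketch, assuming a shortened stand-in for the test data:

```sql
-- Shortened, hypothetical sample rows standing in for the real table.
WITH tbl (id, customer, dat) AS ( VALUES
     (1, 'Joe',   '[{"text": "PROD0008", "amount": "1.0000"}]')
    ,(2, 'Sally', '[{"text": "PROD003", "vat_rate": "20"}, {"text": "PROD005"}]')
)
-- One row per distinct key found in any dictionary of any row.
SELECT DISTINCT y.key
FROM tbl
   , LATERAL json_array_elements(dat::json) AS x (val)
   , LATERAL json_object_keys(x.val) AS y (key)
ORDER BY y.key;
```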

Answer 2

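It would help to clean up the JSON field. That is something that can be done before the data is inserted into the table.

However, going by your example, the following code should work: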

create table public.yourtable (id integer, name varchar, others varchar);
insert into public.yourtable select 1,'Joe','[{"line_total_excl_vat": "583.3300", "account": "", "subtitle": "", "product_id": 5532548, "price_per_unit": "583.3333", "line_total_incl_vat": "700.0000", "text": "PROD0008", "amount": "1.0000", "vat_rate": "20"}]';
insert into public.yourtable select 2,'Sally','[{"line_total_excl_vat": "500.0000", "account": "", "subtitle": "", "product_id": "", "price_per_unit": "250.0000", "line_total_incl_vat": "600.0000", "text": "PROD003", "amount": "2.0000", "vat_rate": "20"}, {"line_total_excl_vat": "250.0000", "account": "", "subtitle": "", "product_id": 5532632, "price_per_unit": "250.0000", "line_total_incl_vat": "300.0000", "text": "PROD005", "amount": "1.0000", "vat_rate": "20"}]';

with jsonb_table as (
select id, name,
('{'||regexp_replace(
unnest(string_to_array(others, '}, {')),
'\[|\]|\{|\}','','g')::varchar||'}')::jsonb as jsonb_data
from yourtable
)
select id,name, * from jsonb_table,
jsonb_to_record(jsonb_data) 
     as (line_total_excl_vat numeric,account varchar, subtitle varchar, product_id varchar, price_per_unit numeric, line_total_incl_vat numeric);

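First we build jsonb_table, converting your dictionary field into a Postgres jsonb field by:

1) converting the string to an array by splitting on the '}, {' character sequence

2) unnesting the array elements into rows

3) stripping the '[', ']', '{' and '}' characters, wrapping the result in braces again, and casting the string to jsonb

Then we use the jsonb_to_record function to convert the jsonb records into columns. Here we have to list as many fields in the column definition as we need.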
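The string splitting above depends on the exact '}, {' separator and on no value containing bracket or brace characters. If every value in the column is valid JSON, which is an assumption about your data, the same result can be sketched without any regular expressions by casting to jsonb and letting jsonb_array_elements do the splitting. The inline rows below are a shortened, hypothetical stand-in for public.yourtable.

```sql
-- Shortened, hypothetical stand-in for public.yourtable.
WITH yourtable (id, name, others) AS ( VALUES
     (1, 'Joe',   '[{"line_total_excl_vat": "583.3300", "product_id": 5532548}]')
    ,(2, 'Sally', '[{"line_total_excl_vat": "500.0000", "product_id": ""}, {"line_total_excl_vat": "250.0000", "product_id": 5532632}]')
)
SELECT t.id, t.name, r.*
FROM yourtable AS t
     -- one row per dictionary in the array
   , jsonb_array_elements(t.others::jsonb) AS e (jsonb_data)
     -- one column per listed field
   , jsonb_to_record(e.jsonb_data)
        AS r (line_total_excl_vat numeric, product_id varchar);
```

The cast fails with an error on any row whose text is not valid JSON, so malformed rows surface immediately instead of being silently mangled.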

Unnest a list of JSON objects in PostgreSQL
