我在PostgreSQL(9.6)数据库中有一个 TEXT 列,其中包含一个或多个词典的列表,例如那些词典。
[{"line_total_excl_vat": "583.3300", "account": "", "subtitle": "", "product_id": 5532548, "price_per_unit": "583.3333", "line_total_incl_vat": "700.0000", "text": "PROD0008", "amount": "1.0000", "vat_rate": "20"}]
或
[{"line_total_excl_vat": "500.0000", "account": "", "subtitle": "", "product_id": "", "price_per_unit": "250.0000", "line_total_incl_vat": "600.0000", "text": "PROD003", "amount": "2.0000", "vat_rate": "20"}, {"line_total_excl_vat": "250.0000", "account": "", "subtitle": "", "product_id": 5532632, "price_per_unit": "250.0000", "line_total_incl_vat": "300.0000", "text": "PROD005", "amount": "1.0000", "vat_rate": "20"}]
我想从该列中检索每个字典并将其解析为不同的列。
对于此示例:
id | customer | blurb
---+----------+------
1 | Joe | [{"line_total_excl_vat": "583.3300", "account": "", "subtitle": "", "product_id": 5532548, "price_per_unit": "583.3333", "line_total_incl_vat": "700.0000", "text": "PROD0008", "amount": "1.0000", "vat_rate": "20"}]
2 | Sally | [{"line_total_excl_vat": "500.0000", "account": "", "subtitle": "", "product_id": "", "price_per_unit": "250.0000", "line_total_incl_vat": "600.0000", "text": "PROD003", "amount": "2.0000", "vat_rate": "20"}, {"line_total_excl_vat": "250.0000", "account": "", "subtitle": "", "product_id": 5532632, "price_per_unit": "250.0000", "line_total_incl_vat": "300.0000", "text": "PROD005", "amount": "1.0000", "vat_rate": "20"}]
将成为:
id | customer | line_total_excl_vat | account | product_id | ...
---+----------+----------------------+---------+------------
1 | Joe | 583.3300 | null| 5532548
2 | Sally | 500.0000 | null| null
3 | Sally | 250.0000 | null| 5532632
答案 0 :(得分:4)
如果您事先知道要提取的字段,请将文本转换为json / jsonb并使用json_to_recordset
/ jsonb_to_recordset
。请注意,此方法要求显式指定字段名称/类型。 json字典中未指定的字段将不会被提取。
参见官方postgesql documentation on json-functions
自包含的示例:
WITH tbl (id, customer, dat) as ( values
(1, 'Joe',
'[{ "line_total_excl_vat": "583.3300"
, "account": ""
, "subtitle": ""
, "product_id": 5532548
, "price_per_unit": "583.3333"
, "line_total_incl_vat": "700.0000"
, "text": "PROD0008"
, "amount": "1.0000"
, "vat_rate": "20"}]')
,(2, 'Sally',
'[{ "line_total_excl_vat": "500.0000"
, "account": ""
, "subtitle": ""
, "product_id": ""
, "price_per_unit": "250.0000"
, "line_total_incl_vat": "600.0000"
, "text": "PROD003"
, "amount": "2.0000"
, "vat_rate": "20"}
, { "line_total_excl_vat": "250.0000"
, "account": ""
, "subtitle": ""
, "product_id": 5532632
, "price_per_unit": "250.0000"
, "line_total_incl_vat": "300.0000"
, "text": "PROD005"
, "amount": "1.0000"
, "vat_rate": "20"}]')
)
SELECT id, customer, x.*
FROM tbl
, json_to_recordset(dat::json) x
( line_total_excl_vat numeric
, acount text
, subtitle text
, product_id text
, price_per_unit numeric
, line_total_incl_vat numeric
, "text" text
, amount numeric
, vat_rate numeric
)
产生以下输出:
id customer line_total_excl_vat acount subtitle product_id price_per_unit line_total_incl_vat text amount vat_rate
1 Joe 583.33 5532548 583.3333 700 PROD0008 1 20
2 Sally 500 250 600 PROD003 2 20
2 Sally 250 5532632 250 300 PROD005 1 20
这种格式通常称为 wide 格式。
还可以提取 long 格式的数据,这还有一个好处,就是可以保留所有数据而无需明确提及字段名。在这种情况下,查询可以写为(为了简洁起见,省略了测试数据)
SELECT id, customer, y.key, y.value, x.record_number
FROM tbl
, lateral json_array_elements(dat::json) WITH ORDINALITY AS x (val, record_number)
, lateral json_each_text(x.val) y
以上语句中的with ordinality
为未嵌套数组中的每个元素添加了一个序列号,并用于为每个客户消除来自不同数组的字段的歧义。
这产生了输出:
id customer key value record_number
1 Joe line_total_excl_vat 583.3300 1
1 Joe account 1
1 Joe subtitle 1
1 Joe product_id 5532548 1
1 Joe price_per_unit 583.3333 1
1 Joe line_total_incl_vat 700.0000 1
1 Joe text PROD0008 1
1 Joe amount 1.0000 1
1 Joe vat_rate 20 1
2 Sally line_total_excl_vat 500.0000 1
2 Sally account 1
2 Sally subtitle 1
2 Sally product_id 1
2 Sally price_per_unit 250.0000 1
2 Sally line_total_incl_vat 600.0000 1
2 Sally text PROD003 1
2 Sally amount 2.0000 1
2 Sally vat_rate 20 1
2 Sally line_total_excl_vat 250.0000 2
2 Sally account 2
2 Sally subtitle 2
2 Sally product_id 5532632 2
2 Sally price_per_unit 250.0000 2
2 Sally line_total_incl_vat 300.0000 2
2 Sally text PROD005 2
2 Sally amount 1.0000 2
2 Sally vat_rate 20 2
答案 1 :(得分:2)
整理json字段会有所帮助。这就是在将数据插入表之前可以完成的工作。
但是,按照您的示例,以下代码应该可以工作:
create table public.yourtable (id integer, name varchar, others varchar);
insert into public.yourtable select 1,'Joe','[{"line_total_excl_vat": "583.3300", "account": "", "subtitle": "", "product_id": 5532548, "price_per_unit": "583.3333", "line_total_incl_vat": "700.0000", "text": "PROD0008", "amount": "1.0000", "vat_rate": "20"}]';
insert into public.yourtable select 2,'Sally','[{"line_total_excl_vat": "500.0000", "account": "", "subtitle": "", "product_id": "", "price_per_unit": "250.0000", "line_total_incl_vat": "600.0000", "text": "PROD003", "amount": "2.0000", "vat_rate": "20"}, {"line_total_excl_vat": "250.0000", "account": "", "subtitle": "", "product_id": 5532632, "price_per_unit": "250.0000", "line_total_incl_vat": "300.0000", "text": "PROD005", "amount": "1.0000", "vat_rate": "20"}]';
with jsonb_table as (
select id, name,
('{'||regexp_replace(
unnest(string_to_array(others, '}, {')),
'\[|\]|\{|\}','','g')::varchar||'}')::jsonb as jsonb_data
from yourtable
)
select id,name, * from jsonb_table,
jsonb_to_record(jsonb_data)
as (line_total_excl_vat numeric,account varchar, subtitle varchar, product_id varchar, price_per_unit numeric, line_total_incl_vat numeric);
首先,我们创建jsonb_table,通过以下方式将您的字典字段转换为postgres jsonb字段:
1)通过分割'},{'字符序列将字符串转换为数组
2)将数组元素取消嵌套到行
3)清理'[] {}'字符并将字符串转换为jsonb
然后我们使用jsonb_to_record函数将jsonb记录转换为列。在这里,我们必须指定列定义所需的尽可能多的字段。