Question

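For example, I have a file customers.json, which is an array of objects (strictly formed) and is pretty plain (no nested objects), like this (importantly, it already contains the ids):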

[
  {
    "id": 23635,
    "name": "Jerry Green",
    "comment": "Imported from facebook."
  },
  {
    "id": 23636,
    "name": "John Wayne",
    "comment": "Imported from facebook."
  }
]

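I want to import them all into the customers table of my postgres database.

I found some pretty difficult ways: import the file as a json-typed column into a table like imported_json, with a column named data holding the objects, then use sql to extract these values and insert them into the real table.

But is there a simple way to import json into postgres without touching sql?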

Answer 1

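It turns out there is an easy way to import a multi-line JSON object into a JSON column in a postgres database using the command-line psql tool, without having to embed the JSON explicitly in the SQL statement. The technique is documented in the postgresql docs, but it is a bit hidden.

The trick is to load the JSON into a psql variable using backticks. For example, given a multi-line JSON file in /tmp/test.json such as: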

{
  "dog": "cat",
  "frog": "frat"
}

we can load it into a temporary table with the following SQL:

sql> \set content `cat /tmp/test.json`
sql> create temp table t ( j jsonb );
sql> insert into t values (:'content');
sql> select * from t;

which gives the result:

               j                
────────────────────────────────
 {"dog": "cat", "frog": "frat"}
(1 row)

You can also perform operations on the data directly:

sql> select :'content'::jsonb -> 'dog';
 ?column? 
──────────
 "cat"
(1 row)

Under the covers this is just embedding the JSON in the SQL, but it is a lot neater to let psql perform the interpolation itself.

Answer 2

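You can feed the JSON into a SQL statement that extracts the information and inserts it into the table. If the JSON attributes have exactly the same names as the table columns, you can do something like this: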

with customer_json (doc) as (
   values 
    ('[
      {
        "id": 23635,
        "name": "Jerry Green",
        "comment": "Imported from facebook."
      },
      {
        "id": 23636,
        "name": "John Wayne",
        "comment": "Imported from facebook."
      }
    ]'::json)
)
insert into customer (id, name, comment)
select p.*
from customer_json l
  cross join lateral json_populate_recordset(null::customer, doc) as p
on conflict (id) do update 
  set name = excluded.name, 
      comment = excluded.comment;

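New customers will be inserted, existing ones will be updated. The "magic" part is json_populate_recordset(null::customer, doc), which generates a relational representation of the JSON objects.

The above assumes a table definition like this: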

create table customer 
(
  id        integer primary key,
  name      text not null,
  comment   text
);
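
As a rough mental model of what json_populate_recordset does with the customer table above, the Python sketch below matches each object's keys to column names. The function name populate_recordset and the COLUMNS tuple are illustrative only and not part of PostgreSQL; the real function also converts values to the column types, which this sketch skips.

```python
import json

# Column names of the customer table defined above.
COLUMNS = ("id", "name", "comment")

def populate_recordset(doc, columns=COLUMNS):
    """Turn a JSON array of objects into row tuples, matching keys by column name."""
    rows = []
    for obj in json.loads(doc):
        # Keys missing from an object become None (NULL); extra keys are ignored.
        rows.append(tuple(obj.get(col) for col in columns))
    return rows
```

Calling populate_recordset on the array from the question would yield one tuple per customer, in the column order id, name, comment.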

If the data is provided as a file, you first need to put that file into some table in the database. Something like this:

create unlogged table customer_import (doc json);

Then upload the file into a single row of that table, e.g. using the \copy command in psql (or whatever your SQL client offers):

\copy customer_import from 'customers.json' ....

Then you can use the above statement; just remove the CTE and use the staging table:

insert into customer (id, name, comment)
select p.*
from customer_import l
  cross join lateral json_populate_recordset(null::customer, doc) as p
on conflict (id) do update 
  set name = excluded.name, 
      comment = excluded.comment;
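
One caveat with the \copy step: in its default text format, \copy treats every input line as a separate row and interprets backslashes as escape characters, so a pretty-printed customers.json will not load into a single doc value as is. A minimal sketch of a preprocessing step, assuming Python is available; the function names to_copy_line and convert_file are hypothetical:

```python
import json

def to_copy_line(json_text):
    """Re-serialize a JSON document as one line that COPY's text format reads as one value."""
    compact = json.dumps(json.loads(json_text), separators=(",", ":"))
    # json.dumps never emits raw newlines or tabs, so only backslashes
    # (for example in \" or \n escapes) still need doubling for COPY.
    return compact.replace("\\", "\\\\")

def convert_file(src_path, dst_path):
    """Write the one-line version of src_path to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        line = to_copy_line(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(line + "\n")
```

For example, convert_file("customers.json", "customers.oneline.json") would produce a file (the output name is made up here) that \copy can load into customer_import as a single row.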

Answer 3

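The easiest way to import json from a file appears to be not to import a single json document, but rather a single-column csv: a list of one-line json objects:

data.json: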

{"id": 23635,"name": "Jerry Green","comment": "Imported from facebook."}
{"id": 23636,"name": "John Wayne","comment": "Imported from facebook."}

Then, under psql:

create table t ( j jsonb );
\copy t from 'd:\path\data.json'

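One record per json (line) will be added to the t table.

The "\copy from" import was made for csv and therefore loads data line by line. As a result, reading one json per line, rather than a single json array to be split later, does not require any intermediate table.

What is more, you will not run into the maximum input line size limit, which you would hit if your input json file were too big.

I would therefore first convert your input into a single-column csv and then import it using the copy command.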
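
The conversion suggested here could look like the following sketch, which writes each element of the array on its own line. The function name array_to_lines is hypothetical, and doubling the backslashes is an assumption that \copy is used in its default text format, where a backslash starts an escape sequence.

```python
import json

def array_to_lines(json_text):
    """Convert a JSON array into one compact JSON value per line."""
    items = json.loads(json_text)
    if not isinstance(items, list):
        raise ValueError("expected a top-level JSON array")
    lines = []
    for item in items:
        # Compact output has no raw newlines, so each item stays on one line.
        line = json.dumps(item, separators=(",", ":"))
        # Double backslashes so COPY's text format passes them through unchanged.
        lines.append(line.replace("\\", "\\\\"))
    return "".join(line + "\n" for line in lines)
```

Reading customers.json, passing its text through array_to_lines, and saving the result as data.json would give a file in the one-object-per-line layout shown above.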

Answer 4

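If you want to do it from a command line ...

NOTE: This is not a direct answer to your question, as it requires you to convert your JSON to SQL. You will probably have to deal with JSON 'null' when converting anyway. You could use a view or materialized view to make that problem more or less invisible, though.

Here is a script I have used for importing JSON into PostgreSQL (WSL Ubuntu), which basically requires that you mix psql meta commands and SQL in the same command line. Note the use of the somewhat obscure script command, which allocates a pseudo-tty: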

$ more update.sh
#!/bin/bash
wget <filename>.json
echo '\set content `cat $(ls -t <redacted>.json.* | head -1)` \\ delete from <table>; insert into <table> values(:'"'content'); refresh materialized view <view>; " | PGPASSWORD=<passwd> psql -h <host> -U <user> -d <database>
$

(Copied from my answer at Shell script to execute pgsql commands in files)

How to import a JSON file into PostgreSQL?
