Question

I just discovered PostgreSQL's JSONB and was wondering what could go wrong if I used it for all of my tables' columns.

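That is to say, all of my tables would have their primary and foreign keys as columns, plus a single field column of type JSONB for any other data.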

Besides taking up extra space because of JSONB's overhead, and losing typing on the "columns", what would I miss?

Answer 1

It turns out you're on to something here.

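The major points of using a relational database are:

Well-defined relationships.
A well-defined and detailed schema.
High performance for large data sets.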

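You get to keep the relationships, but you lose the schema and a lot of the performance. The schema is more than just data validation: without it, you can't use triggers or constraints on individual fields.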
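As a rough illustration of what typed columns make possible, here is a sketch of per-field constraints on a conventional table. The table name reviews_checked, the sample values, and the 1 to 5 rating range are assumptions made for illustration; they are not part of the benchmark schema used below.

```sql
-- Hypothetical table: types and constraints are declared once per column
-- and enforced by the database on every insert and update.
create table reviews_checked (
    product_id text not null,
    rating integer not null check (rating between 1 and 5),  -- assumed range
    votes integer not null default 0 check (votes >= 0),
    helpful_votes integer not null default 0,
    check (helpful_votes between 0 and votes)
);

-- Accepted: every column satisfies its constraints.
insert into reviews_checked (product_id, rating, votes, helpful_votes)
values ('B0001', 5, 10, 7);

-- Rejected with a check constraint violation: rating is out of range.
insert into reviews_checked (product_id, rating) values ('B0001', 9);
```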

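As for performance... you'll notice that most tests of JSONB performance compare it against other similar data types. They're never run against normal SQL tables. That's because, while JSONB is astonishingly efficient, it's not nearly as efficient as regular SQL. So let's test it; as it turns out, you are on to something.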

Using the dataset from this JSONB performance presentation, I created a proper SQL schema...

create table customers (
    id text primary key
);

create table products (
    id text primary key,
    title text,
    sales_rank integer,
    "group" text,
    category text,
    subcategory text,
    similar_ids text[]
);

create table reviews (
    customer_id text references customers(id),
    product_id text references products(id),
    "date" timestamp,
    rating integer,
    votes integer,
    helpful_votes integer
);

And one that uses SQL relationships, but JSONB for the data...

create table customers (
    id text primary key
);

create table products_jb (
    id text primary key,
    fields jsonb
);

create table reviews_jb (
    customer_id text references customers(id),
    product_id text references products_jb(id),
    fields jsonb
);

And a full JSONB table.

create table reviews_jsonb (
    review jsonb
);

Then I imported the same data into both sets of tables using a little script: 589859 reviews, 93319 products, 98761 customers.

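Let's try the same query as in the JSONB performance article: getting the average review rating for a category of product. First, without indexes.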

Traditional SQL: 138 ms

test=> select round(avg(r.rating), 2)
from reviews r
join products p on p.id = r.product_id
where p.category = 'Home & Garden';
 round 
-------
  4.59
(1 row)

Time: 138.631 ms

Full JSONB: 380 ms

test=> select round(avg((review#>>'{review,rating}')::numeric),2)
test-> from reviews_jsonb
test-> where review #>>'{product,category}' = 'Home & Garden';
 round 
-------
  4.59
(1 row)

Time: 380.697 ms

Hybrid JSONB: 190 ms

test=> select round(avg((r.fields#>>'{rating}')::numeric),2)
from reviews_jb r
join products_jb p on p.id = r.product_id
where p.fields#>>'{category}' = 'Home & Garden';
 round 
-------
  4.59
(1 row)

Time: 192.333 ms

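Honestly, that went better than I thought. The hybrid approach is twice as fast as full JSONB, but roughly 40% slower than normal SQL (192 ms versus 139 ms). Now how about with indexes?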

Traditional SQL: 130 ms (+500 ms for the index)

test=> create index products_category on products(category);
CREATE INDEX
Time: 491.969 ms

test=> select round(avg(r.rating), 2)
from reviews r
join products p on p.id = r.product_id
where p.category = 'Home & Garden';
 round 
-------
  4.59
(1 row)

Time: 128.212 ms

Full JSONB: 360 ms (+25000 ms for the index)

test=> create index on reviews_jsonb using gin(review);
CREATE INDEX
Time: 25253.348 ms
test=> select round(avg((review#>>'{review,rating}')::numeric),2)
from reviews_jsonb
where review #>>'{product,category}' = 'Home & Garden';
 round 
-------
  4.59
(1 row)

Time: 363.222 ms

Hybrid JSONB: 185 ms (+6900 ms for the indexes)

test=> create index on products_jb using gin(fields);
CREATE INDEX
Time: 3654.894 ms
test=> create index on reviews_jb using gin(fields);
CREATE INDEX
Time: 3237.534 ms
test=> select round(avg((r.fields#>>'{rating}')::numeric),2)
from reviews_jb r
join products_jb p on p.id = r.product_id
where p.fields#>>'{category}' = 'Home & Garden';
 round 
-------
  4.59
(1 row)

Time: 183.679 ms

It turns out this is a query that indexes aren't going to help much with.
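One possible contributing factor, worth checking with EXPLAIN rather than taking on faith: the default GIN operator class for jsonb supports containment and key-existence operators such as @>, but not the #>> text extraction used in the queries above, so those indexes may simply not be considered. The sketch below shows two rewrites that an index can serve. The index name products_jb_category is made up for illustration, and none of this was timed as part of the numbers above; because the filter only touches products and the join still has to read the reviews, the gain may well be small.

```sql
-- Option 1: rewrite the filter as a containment test, which the GIN index
-- created with "using gin(fields)" is able to answer.
explain analyze
select round(avg((r.fields->>'rating')::numeric), 2)
from reviews_jb r
join products_jb p on p.id = r.product_id
where p.fields @> '{"category": "Home & Garden"}';

-- Option 2: a B-tree expression index on the single key being filtered.
-- The query must use the same expression as the index definition.
create index products_jb_category on products_jb ((fields->>'category'));

explain analyze
select round(avg((r.fields->>'rating')::numeric), 2)
from reviews_jb r
join products_jb p on p.id = r.product_id
where p.fields->>'category' = 'Home & Garden';
```

The plan printed by explain analyze shows whether either index is actually chosen.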

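That's what I see after playing with the data a bit: hybrid JSONB is always slower than full SQL, but faster than full JSONB. It seems like a good compromise. You get to use traditional foreign keys and joins, but keep the flexibility of adding whatever fields you like.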

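I recommend taking the hybrid approach a step further: use SQL columns for the fields you know are going to be there, and have a JSONB column to pick up any additional fields for flexibility.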
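A minimal sketch of that layout, assuming a hypothetical table products_hybrid with a JSONB column named extra; the sample row and its keys are invented for illustration.

```sql
-- Known fields are ordinary typed columns; anything else goes into "extra".
create table products_hybrid (
    id text primary key,
    title text not null,
    category text,
    sales_rank integer,
    extra jsonb not null default '{}'::jsonb
);

insert into products_hybrid (id, title, category, extra)
values ('B0001', 'Garden Hose', 'Home & Garden',
        '{"color": "green", "length_m": 15}');

-- Filter on a typed column, pull an ad hoc field out of the JSONB column.
select id, title, extra->>'color' as color
from products_hybrid
where category = 'Home & Garden';
```

If one of the ad hoc keys later turns out to be present on most rows, it can be promoted to a real column and backfilled from extra.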

I encourage you to play around with the test data here and see what the performance is like.

Using JSONB for postgres columns other than primary and foreign keys
