In a table (used by a Django model) I use a jsonb column `data`
to store arbitrary data fetched from a web service:
abs=# \d data_importer_rawdata;
Table "public.data_importer_rawdata"
Column | Type | Collation | Nullable | Default
-----------------+--------------------------+-----------+----------+---------------------------------------------------
id | integer | | not null | nextval('data_importer_rawdata_id_seq'::regclass)
created | timestamp with time zone | | not null |
modified | timestamp with time zone | | not null |
entity_id | character varying(50)[] | | not null |
entity_id_key | character varying(50)[] | | not null |
service | character varying(100) | | not null |
data | jsonb | | not null |
data_hash | bigint | | not null |
content_type_id | integer | | not null |
last_update | timestamp with time zone | | |
Indexes:
"data_importer_rawdata_pkey" PRIMARY KEY, btree (id)
"data_importer_rawdata_entity_id_service_conten_5fcc60bd_uniq" UNIQUE CONSTRAINT, btree (entity_id, service, content_type_id)
"data_importer_rawdata_content_type_id_63138c35" btree (content_type_id)
"rawdata_data_idx" gin (data jsonb_path_ops)
"rawdata_entity_id_idx" btree (entity_id)
"rawdata_entity_id_key_idx" btree (entity_id_key)
"rawdata_service_idx" btree (service)
Foreign-key constraints:
"data_importer_rawdat_content_type_id_63138c35_fk_django_co" FOREIGN KEY (content_type_id) REFERENCES django_content_type(id) DEFERRABLE INITIALLY DEFERRED
There are more than 1M records.
However, despite various indexing strategies (following this blog post), performance is still poor:
abs=# EXPLAIN ANALYZE SELECT
"data_importer_rawdata"."id",
"data_importer_rawdata"."created",
"data_importer_rawdata"."modified",
"data_importer_rawdata"."entity_id",
"data_importer_rawdata"."entity_id_key",
"data_importer_rawdata"."service",
"data_importer_rawdata"."content_type_id",
"data_importer_rawdata"."data",
"data_importer_rawdata"."data_hash",
"data_importer_rawdata"."last_update"
FROM "data_importer_rawdata"
WHERE ("data_importer_rawdata"."data" -> 'object_id')
= '"b8a096da-ff83-47dc-8d22-289ddb46b1c1"';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Seq Scan on data_importer_rawdata (cost=0.00..142508.65 rows=5155 width=856) (actual time=933.902..8240.465 rows=2 loops=1)
Filter: ((data -> 'object_id'::text) = '"b8a096da-ff83-47dc-8d22-289ddb46b1c1"'::jsonb)
Rows Removed by Filter: 1030908
Planning time: 0.158 ms
Execution time: 8240.493 ms
I tried dropping "rawdata_data_idx"
and creating a BTree index on the single jsonb key object_id
instead, but performance was almost identical:
abs=# drop index "rawdata_data_idx";
abs=# CREATE INDEX "rawdata_data_object_ididx"
ON "data_importer_rawdata" USING BTREE ((data->>'object_id'));
abs=# EXPLAIN ANALYZE SELECT
"data_importer_rawdata"."id",
"data_importer_rawdata"."created",
"data_importer_rawdata"."modified",
"data_importer_rawdata"."entity_id",
"data_importer_rawdata"."entity_id_key",
"data_importer_rawdata"."service",
"data_importer_rawdata"."content_type_id",
"data_importer_rawdata"."data",
"data_importer_rawdata"."data_hash",
"data_importer_rawdata"."last_update"
FROM "data_importer_rawdata"
WHERE ("data_importer_rawdata"."data" -> 'object_id')
= '"b8a096da-ff83-47dc-8d22-289ddb46b1c1"';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Seq Scan on data_importer_rawdata (cost=0.00..142508.65 rows=5155 width=856) (actual time=951.522..8318.851 rows=2 loops=1)
Filter: ((data -> 'object_id'::text) = '"b8a096da-ff83-47dc-8d22-289ddb46b1c1"'::jsonb)
Rows Removed by Filter: 1030908
Planning time: 0.311 ms
Execution time: 8318.878 ms
Any suggestions on this? I'm not sure whether this is average performance for this kind of task.
Answer 0 (score: 2)
Your query runs slowly because it cannot use the index: the index was created on the expression data ->> 'object_id' (the ->> operator returns text), while the query filters on data -> 'object_id' (the -> operator returns jsonb).
For the index to be used, the expression in the WHERE condition must be identical to the expression in the index definition, i.e.
WHERE "data_importer_rawdata"."data" ->> 'object_id'
= 'b8a096da-ff83-47dc-8d22-289ddb46b1c1'
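As a sketch (using the identifiers from the question; actual plan output depends on your data), there are two ways to make this lookup indexable -- rewrite the query to match the existing ->> index, or create an index matching the original -> expression:

```sql
-- Option 1: rewrite the filter to use ->> (returns text), which matches
-- the expression index created above on (data->>'object_id'):
EXPLAIN ANALYZE SELECT id, data
FROM data_importer_rawdata
WHERE data ->> 'object_id' = 'b8a096da-ff83-47dc-8d22-289ddb46b1c1';

-- Option 2: keep the original -> comparison and create a BTree index
-- on the matching jsonb expression instead (index name is illustrative):
CREATE INDEX rawdata_data_object_id_jsonb_idx
    ON data_importer_rawdata USING BTREE ((data -> 'object_id'));

-- Option 3: the original GIN jsonb_path_ops index ("rawdata_data_idx")
-- does support containment queries with @>, so this form would have
-- been indexable without any new index:
SELECT id
FROM data_importer_rawdata
WHERE data @> '{"object_id": "b8a096da-ff83-47dc-8d22-289ddb46b1c1"}';
```

Note that a GIN index with jsonb_path_ops only supports the containment operators (@>, @@, @?), which is why the -> equality filter in the question fell back to a sequential scan even before that index was dropped.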