My function runs too slowly. I have isolated the slow part of the function: a small SELECT
statement:
SELECT image_group_id
FROM programs.image_family fam
JOIN programs.provider_file pf
ON (fam.provider_data_id = pf.provider_data_id
AND fam.family_id = $1 AND pf.image_group_id IS NOT NULL)
LIMIT 1
When I run the function, this SQL produces the following query plan:
Query Text: SELECT image_group_id FROM programs.image_family fam JOIN programs.provider_file pf ON (fam.provider_data_id = pf.provider_data_id AND fam.family_id = $1 AND pf.image_group_id IS NOT NULL) LIMIT 1
Limit (cost=0.56..6.75 rows=1 width=6) (actual time=3471.004..3471.004 rows=0 loops=1)
-> Nested Loop (cost=0.56..594054.42 rows=96017 width=6) (actual time=3471.002..3471.002 rows=0 loops=1)
-> Seq Scan on image_family fam (cost=0.00..391880.08 rows=96023 width=6) (actual time=3471.001..3471.001 rows=0 loops=1)
Filter: ((family_id)::numeric = '8419853'::numeric)
Rows Removed by Filter: 19204671
-> Index Scan using "IX_DBO_PROVIDER_FILE_1" on provider_file pf (cost=0.56..2.11 rows=1 width=12) (never executed)
Index Cond: (provider_data_id = fam.provider_data_id)
Filter: (image_group_id IS NOT NULL)
When I run the same SELECT in the query tool (outside the function), the query plan looks like this:
Limit (cost=1.12..3.81 rows=1 width=6) (actual time=0.043..0.043 rows=1 loops=1)
Output: pf.image_group_id
Buffers: shared hit=11
-> Nested Loop (cost=1.12..14.55 rows=5 width=6) (actual time=0.041..0.041 rows=1 loops=1)
Output: pf.image_group_id
Inner Unique: true
Buffers: shared hit=11
-> Index Only Scan using image_family_family_id_provider_data_id_idx on programs.image_family fam (cost=0.56..1.65 rows=5 width=6) (actual time=0.024..0.024 rows=1 loops=1)
Output: fam.family_id, fam.provider_data_id
Index Cond: (fam.family_id = 8419853)
Heap Fetches: 2
Buffers: shared hit=6
-> Index Scan using "IX_DBO_PROVIDER_FILE_1" on programs.provider_file pf (cost=0.56..2.58 rows=1 width=12) (actual time=0.013..0.013 rows=1 loops=1)
Output: pf.provider_data_id, pf.provider_file_path, pf.posted_dt, pf.file_repository_id, pf.restricted_size, pf.image_group_id, pf.is_master, pf.is_biggest
Index Cond: (pf.provider_data_id = fam.provider_data_id)
Filter: (pf.image_group_id IS NOT NULL)
Buffers: shared hit=5
Planning time: 0.809 ms
Execution time: 0.100 ms
If I disable sequential scans in the function, I get a similar-looking query plan:
Query Text: SELECT image_group_id FROM programs.image_family fam JOIN programs.provider_file pf ON (fam.provider_data_id = pf.provider_data_id AND fam.family_id = $1 AND pf.image_group_id IS NOT NULL) LIMIT 1
Limit (cost=1.12..8.00 rows=1 width=6) (actual time=3855.722..3855.722 rows=0 loops=1)
-> Nested Loop (cost=1.12..660217.34 rows=96017 width=6) (actual time=3855.721..3855.721 rows=0 loops=1)
-> Index Only Scan using image_family_family_id_provider_data_id_idx on image_family fam (cost=0.56..458043.00 rows=96023 width=6) (actual time=3855.720..3855.720 rows=0 loops=1)
Filter: ((family_id)::numeric = '8419853'::numeric)
Rows Removed by Filter: 19204671
Heap Fetches: 368
-> Index Scan using "IX_DBO_PROVIDER_FILE_1" on provider_file pf (cost=0.56..2.11 rows=1 width=12) (never executed)
Index Cond: (provider_data_id = fam.provider_data_id)
Filter: (image_group_id IS NOT NULL)
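The two query plans differ in the Index Only Scan: inside the function it uses a Filter instead of an Index Cond. The function's plan also has more Heap Fetches, and it seems to treat the argument as a string that is cast to numeric.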
Things I have tried:
SQL
EXECUTE ... INTO .. USING
Makeup of the two tables:
image_family:
provider_data_id: numeric(16)
family_id: int4
(rest omitted for brevity)
provider_data_id
family_id
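I recently added a unique index on (family_id, provider_data_id).
There are about 20 million rows in this table. A family has many provider_data_ids, but not every provider_data_id belongs to a family, so not all of them appear in this table.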
provider_file:
provider_data_id numeric(16)
image_group_id numeric(16)
(rest omitted for brevity)
provider_data_id
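This table has about 32 million rows. Most rows (> 95%) have a non-null image_group_id.
Postgres version 10
How can I get the query to perform the same whether it is called from the function or run as raw SQL in the query tool?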
Answer 0 (score: 1)
The problem is in this line:
Filter: ((family_id)::numeric = '8419853'::numeric)
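The index on family_id cannot be used because family_id is being compared with a numeric value. That comparison requires casting family_id to numeric, and there is no index on family_id::numeric.
Although integer and numeric are both types that represent numbers, their internal representations are quite different, so the index on the integer column is not compatible with a numeric comparison. In other words, to PostgreSQL the cast to numeric is like a function call, and because there is no index on that function expression, it has to fall back to scanning the whole table (or the whole index).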
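The effect can be reproduced outside the function with prepared statements that differ only in the declared parameter type. This is a minimal sketch: the statement names q_numeric and q_integer are made up for illustration, while the tables, columns, and the value 8419853 come from the question. The numeric variant should show the `Filter: ((family_id)::numeric = ...)` line from the slow plan, and the integer variant should show an `Index Cond` on family_id.

```sql
-- Hypothetical statement names; tables and columns are from the question.
-- Parameter declared as numeric: family_id must be cast for the comparison.
PREPARE q_numeric(numeric) AS
SELECT image_group_id
FROM programs.image_family fam
JOIN programs.provider_file pf
  ON (fam.provider_data_id = pf.provider_data_id
      AND fam.family_id = $1
      AND pf.image_group_id IS NOT NULL)
LIMIT 1;

-- Parameter declared as integer: matches the column type, no cast needed.
PREPARE q_integer(integer) AS
SELECT image_group_id
FROM programs.image_family fam
JOIN programs.provider_file pf
  ON (fam.provider_data_id = pf.provider_data_id
      AND fam.family_id = $1
      AND pf.image_group_id IS NOT NULL)
LIMIT 1;

-- Compare the two plans side by side.
EXPLAIN (ANALYZE, BUFFERS) EXECUTE q_numeric(8419853);
EXPLAIN (ANALYZE, BUFFERS) EXECUTE q_integer(8419853);

-- Clean up the session-local prepared statements.
DEALLOCATE q_numeric;
DEALLOCATE q_integer;
```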
The solution is simple: pass the query an integer parameter rather than a
numeric
one. If in doubt, use an explicit cast like
fam.family_id = $1::integer
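Since the question does not show the function itself, the following is only a sketch of how the fix might look in PL/pgSQL. The function name first_image_group_id, the parameter name p_family_id, and the numeric return type are assumptions (the return type is chosen to match image_group_id numeric(16)); the query body is the one from the question.

```sql
-- Hypothetical function; only the SELECT is taken from the question.
CREATE OR REPLACE FUNCTION programs.first_image_group_id(p_family_id integer)
RETURNS numeric
LANGUAGE plpgsql
STABLE
AS $$
DECLARE
    v_image_group_id numeric;
BEGIN
    -- p_family_id is an integer, so it is compared with family_id (int4)
    -- directly and the index on (family_id, provider_data_id) is usable.
    SELECT pf.image_group_id
    INTO v_image_group_id
    FROM programs.image_family fam
    JOIN programs.provider_file pf
      ON (fam.provider_data_id = pf.provider_data_id
          AND fam.family_id = p_family_id
          AND pf.image_group_id IS NOT NULL)
    LIMIT 1;

    -- NULL is returned when no matching row exists.
    RETURN v_image_group_id;
END;
$$;
```

If the existing signature has to keep a numeric parameter, casting it inside the query (fam.family_id = p_family_id::integer) should have the same effect. Keep in mind that a numeric-to-integer cast rounds fractional values and raises an error for values outside the integer range.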