我有一个查询让我感到困扰,这已经找到了解决方法,但是我也想了解为什么首先要从中学习到一些东西。
背景: 我有一个带有“对象”的数据库,这些对象具有与之相连的动态“属性”,用户可以在其中搜索和过滤值。我们知道,查询的最佳方法是运行一条语句,但是由于可以应用于每个客户端的静态和动态属性的各种过滤器的复杂性,我们选择在以下位置创建一个临时搜索表每个搜索查询的开头。我们从“对象”表中进行粗略选择,然后从中创建临时表。任何进一步的过滤器都会做出选择,并从搜索表中删除。当没有更多过滤器可应用时,最终的“搜索结果”会显示给用户。
这种特殊情况将在初始搜索表中添加〜120.000行,这是非常基本的,只带有一些ID。我们没有在搜索表上创建任何索引,因为在任何以前的情况下都没有必要这样做。该客户端的属性表由〜17.000.000行组成,重要列为 object_id,language_id,propertytype_id和stringvalue ,我们在其上具有组合的BTREE索引。
此查询返回的速度非常快,可以从属性表中选择包含我们要过滤的值的所有ID:
SELECT o."id"
FROM "object_table" o
INNER JOIN "propertyvalue_table" pt
ON o."id" = pt."object_id"
AND pt."propertytype_id" = 41
AND pt."language_id" = 1
GROUP BY o."id"
HAVING count(1) = <numberOfFilterValues>
AND array_agg(LOWER(pt."stringvalue")) @> string_to_array(lower('<filtervalue(s)>'), ',')
但是当我将该查询注入到delete语句中时,它突然变得非常糟糕:
DELETE FROM "temporary_search_table"
WHERE NOT EXISTS (
SELECT 1
FROM "object_table" o
INNER JOIN "propertyvalue_table" pt
ON o."id" = pt."object_id"
AND pt."propertytype_id" = 41
AND pt."language_id" = 1
GROUP BY o."id"
HAVING count(1) = <numberOfFilterValues>
AND array_agg(LOWER(pt."stringvalue")) @> string_to_array(lower('<filtervalue(s)>'), ',')
WHERE temporary_search_table."id" = o."id"
)
删除查询的EXPLAIN ANALYZE需要15分钟才能完成,我对解释的有限理解是告诉我子查询在temp表中的每一行都循环了一次,但是我不明白为什么! (该分析包含了一些额外的联接,我在示例中已省略这些联接以减少混乱)
Aggregate (cost=29796768.84..29796768.85 rows=1 width=8) (actual time=1234845.455..1234845.456 rows=1 loops=1)
-> Seq Scan on testsearch (cost=0.00..29796594.59 rows=69700 width=0) (actual time=74.967..1234814.642 rows=98515 loops=1)
Filter: (NOT (SubPlan 2))
Rows Removed by Filter: 30098
SubPlan 2
-> GroupAggregate (cost=195.19..213.73 rows=1 width=8) (actual time=9.600..9.600 rows=0 loops=128613)
Group Key: qo.id
Filter: ((count(1) = 1) AND (array_agg(lower((propertyvalue_1.stringvalue)::text)) @> '{inkognito}'::text[]))
Rows Removed by Filter: 0
InitPlan 1 (returns $0)
-> Index Scan using idx_propertyvalue_propertytypeid_languageid_stringvalue on propertyvalue (cost=0.81..188.51 rows=182 width=4) (actual time=0.029..43.752 rows=45356 loops=1)
Index Cond: ((propertytype_id = 41) AND (language_id = 1) AND (lower((stringvalue)::text) = ANY ('{inkognito}'::text[])))
-> Nested Loop (cost=6.69..25.21 rows=1 width=14) (actual time=9.423..9.597 rows=0 loops=128613)
-> Nested Loop (cost=6.13..8.44 rows=1 width=4) (actual time=0.009..0.011 rows=1 loops=128613)
Join Filter: (qo.type_id = qotp.type_id)
Rows Removed by Join Filter: 8
-> Index Scan using pk_object_id on object qo (cost=0.42..2.44 rows=1 width=8) (actual time=0.007..0.008 rows=1 loops=128613)
Index Cond: (id = testsearch.object_id)
-> HashAggregate (cost=5.71..5.80 rows=9 width=4) (actual time=0.000..0.001 rows=9 loops=128613)
Group Key: qotp.type_id
-> Nested Loop (cost=3.35..5.68 rows=9 width=4) (actual time=0.017..0.028 rows=9 loops=1)
-> Hash Join (cost=3.20..4.63 rows=3 width=8) (actual time=0.015..0.021 rows=3 loops=1)
Hash Cond: (qp.id = qpp.propertyset_id)
-> Seq Scan on propertyset qp (cost=0.00..1.25 rows=24 width=4) (actual time=0.002..0.005 rows=24 loops=1)
Filter: (NOT deleted)
Rows Removed by Filter: 1
-> Hash (cost=3.17..3.17 rows=3 width=4) (actual time=0.006..0.006 rows=3 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Index Scan using fki_propertyset_propertytype_propertytype_id on propertyset_propertytype qpp (cost=0.14..3.17 rows=3 width=4) (actual time=0.003..0.005 rows=3 loops=1)
Index Cond: (propertytype_id = 41)
-> Index Scan using fki_object_type_propertyset_propertyset_id on object_type_propertyset qotp (cost=0.14..0.32 rows=3 width=8) (actual time=0.001..0.002 rows=3 loops=3)
Index Cond: (propertyset_id = qp.id)
-> Index Scan using pk_propertyvalue on propertyvalue propertyvalue_1 (cost=0.56..16.76 rows=1 width=14) (actual time=1.269..1.444 rows=0 loops=128613)
Index Cond: ((object_id = ANY ($0)) AND (object_id = testsearch.object_id) AND (propertytype_id = 41) AND (language_id = 1))
我通常求助于CTE,以解决我对Postgresql不够了解的情况,而下面的工作就像我假设上面的陈述那样。
WITH relevant_object_ids AS (
SELECT o."id"
FROM "object_table" o
INNER JOIN "propertyvalue_table" pt
ON o."id" = pt."object_id"
AND pt."propertytype_id" = 41
AND pt."language_id" = 1
GROUP BY o."id"
HAVING count(1) = <numberOfFilterValues>
AND array_agg(LOWER(pt."stringvalue")) @> string_to_array(lower('<filtervalue(s)>'), ',')
)
DELETE FROM "temporary_search_table"
WHERE NOT EXISTS (
SELECT 1
FROM relevant_object_ids
WHERE temporary_search_table."id = relevant_object_ids."id"
)