我有一个不太复杂的(imho)过滤逻辑,它基于Django模型中的几种条件。有一个特定的查询需要很长时间才能完成。
查询是基于这两个查询集构建的:
extension Array where Element == FuncT { ... }
和
queryset = self.serializer_class.Meta.model.valid_pricelist_objects.filter(
Q(drug_prices__pricelist__price_destination__to_all_insurances=True
) |
# pylint: disable=line-too-long
Q(
drug_prices__pricelist__price_destination__to_organization_data__organization__uuid
=self.kwargs.get('i_uuid')))
return queryset
第二个查询集包装第一个查询集(通过return super().get_queryset().filter(
Q(active=True),
Q(drug_prices__pricelist__active=True), # Lista de precios activa
# Q(drug_pictures__is_main=True), # Que tenga una imagen
# TODO: Hacer filtros por pais PriceListCountries
Q(
Q(drug_prices__pricelist__expires=False) | # Que tenga precios que no caducan o
Q(
Q(drug_prices__pricelist__expires=True), # Que tenga precios que si caducan Y
Q(drug_prices__pricelist__datestart__date__lte=timezone.now()), # Fecha de inicio menor que hoy Y
Q(drug_prices__pricelist__dateend__date__gte=timezone.now()) # Fecha final mayor que hoy
)
)
).distinct().prefetch_related(
'categories__contexts',
'brands',
'drug_prices__pricelist',
'drug_pictures',
'drug_prices__pricelist__price_destination',
)
调用)。结果查询如下:
super()
使用SELECT DISTINCT
"phdrug_phdrug"."id",
"phdrug_phdrug"."uuid",
"phdrug_phdrug"."default_description",
"phdrug_phdrug"."ean",
"phdrug_phdrug"."parent_ean",
"phdrug_phdrug"."reg_num",
"phdrug_phdrug"."atc_iv",
"phdrug_phdrug"."product_type",
"phdrug_phdrug"."fraction",
"phdrug_phdrug"."active",
"phdrug_phdrug"."loyal",
"phdrug_phdrug"."patent",
"phdrug_phdrug"."chronics",
"phdrug_phdrug"."recipe",
"phdrug_phdrug"."deal",
"phdrug_phdrug"."specialized",
"phdrug_phdrug"."armored",
"phdrug_phdrug"."hight_speciality",
"phdrug_phdrug"."temp_8_15",
"phdrug_phdrug"."temp_15_25",
"phdrug_phdrug"."temp_2_8",
"phdrug_phdrug"."temp_less_15",
"phdrug_phdrug"."new",
"phdrug_phdrug"."mdk_internal_code",
"phdrug_phdrug"."mdk_single_id",
"phdrug_phdrug"."is_from_mdk_db",
"phdrug_phdrug"."top",
"phdrug_phdrug"."laboratory_id",
"phdrug_phdrug"."specialty_id"
FROM
"phdrug_phdrug"
INNER JOIN "monetary_drugprice" ON ( "phdrug_phdrug"."id" = "monetary_drugprice"."drug_id" )
INNER JOIN "monetary_pricelist" ON ( "monetary_drugprice"."pricelist_id" = "monetary_pricelist"."id" )
INNER JOIN "monetary_drugprice" T4 ON ( "phdrug_phdrug"."id" = T4."drug_id" )
INNER JOIN "monetary_pricelist" T5 ON ( T4."pricelist_id" = T5."id" )
INNER JOIN "monetary_pricelistdestinations" ON ( T5."id" = "monetary_pricelistdestinations"."pricelist_id" )
LEFT OUTER JOIN "organization_organizationdata" ON ( "monetary_pricelistdestinations"."to_organization_data_id" = "organization_organizationdata"."id" )
LEFT OUTER JOIN "organization_organization" ON ( "organization_organizationdata"."organization_id" = "organization_organization"."id" )
WHERE
(
"phdrug_phdrug"."active" = TRUE
AND "monetary_pricelist"."active" = TRUE
AND (
"monetary_pricelist"."expires" = FALSE
OR (
"monetary_pricelist"."expires" = TRUE
AND ( "monetary_pricelist"."datestart" AT TIME ZONE'UTC' ) :: DATE <= '2019-01-22'
AND ( "monetary_pricelist"."dateend" AT TIME ZONE'UTC' ) :: DATE >= '2019-01-22'
))
AND (
"monetary_pricelistdestinations"."to_all_insurances" = TRUE
OR "organization_organization"."uuid" = 'b51773d4-05f8-43a2-86ef-0098b31725d8'
))
ORDER BY
"phdrug_phdrug"."default_description" ASC
运行查询,我得到了:
EXPLAIN ANALYZE
如果将其粘贴到Unique (cost=10412.31..12666.32 rows=29084 width=143) (actual time=3373.496..3620.090 rows=6442 loops=1)
-> Sort (cost=10412.31..10485.02 rows=29084 width=143) (actual time=3373.494..3460.790 rows=228667 loops=1)
Sort Key: phdrug_phdrug.default_description, phdrug_phdrug.id, phdrug_phdrug.uuid, phdrug_phdrug.ean, phdrug_phdrug.parent_ean, phdrug_phdrug.reg_num, phdrug_phdrug.medika_code, phdrug_phdrug.atc_iv, phdrug_phdrug.product_type, phdrug_phdrug.fraction, phdrug_phdrug.active, phdrug_phdrug.loyal, phdrug_phdrug.patent, phdrug_phdrug.chronics, phdrug_phdrug.recipe, phdrug_phdrug.deal, phdrug_phdrug.specialized, phdrug_phdrug.armored, phdrug_phdrug.hight_speciality, phdrug_phdrug.temp_8_15, phdrug_phdrug.temp_15_25, phdrug_phdrug.temp_2_8, phdrug_phdrug.temp_less_15, phdrug_phdrug.new, phdrug_phdrug.mdk_internal_code, phdrug_phdrug.mdk_single_id, phdrug_phdrug.is_from_mdk_db, phdrug_phdrug.top, phdrug_phdrug.laboratory_id, phdrug_phdrug.specialty_id
Sort Method: external merge Disk: 31192kB
-> Hash Join (cost=704.51..6166.54 rows=29084 width=143) (actual time=23.648..507.099 rows=228667 loops=1)
Hash Cond: (monetary_drugprice.pricelist_id = monetary_pricelist.id)
-> Nested Loop (cost=696.92..5604.95 rows=44105 width=147) (actual time=22.881..416.630 rows=457692 loops=1)
Join Filter: (phdrug_phdrug.id = monetary_drugprice.drug_id)
-> Hash Join (cost=696.51..1177.21 rows=4583 width=147) (actual time=22.864..38.841 rows=23577 loops=1)
Hash Cond: (phdrug_phdrug.id = t4.drug_id)
-> Seq Scan on phdrug_phdrug (cost=0.00..359.94 rows=11992 width=143) (actual time=0.438..3.593 rows=11992 loops=1)
Filter: active
Rows Removed by Filter: 2
-> Hash (cost=639.21..639.21 rows=4584 width=4) (actual time=22.339..22.339 rows=23577 loops=1)
Buckets: 32768 (originally 8192) Batches: 1 (originally 1) Memory Usage: 1085kB
-> Nested Loop (cost=3.99..639.21 rows=4584 width=4) (actual time=1.785..16.702 rows=23577 loops=1)
-> Nested Loop (cost=3.58..9.11 rows=5 width=8) (actual time=1.756..1.874 rows=7 loops=1)
-> Hash Left Join (cost=3.43..7.57 rows=5 width=4) (actual time=1.733..1.797 rows=7 loops=1)
Hash Cond: (monetary_pricelistdestinations.to_organization_data_id = organization_organizationdata.id)
Filter: (monetary_pricelistdestinations.to_all_insurances OR (organization_organization.uuid = 'b51773d4-05f8-43a2-86ef-0098b31725d8'::uuid))
Rows Removed by Filter: 130
-> Seq Scan on monetary_pricelistdestinations (cost=0.00..3.37 rows=137 width=9) (actual time=0.626..0.643 rows=137 loops=1)
-> Hash (cost=3.12..3.12 rows=25 width=20) (actual time=1.076..1.076 rows=25 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
-> Hash Left Join (cost=1.56..3.12 rows=25 width=20) (actual time=1.040..1.053 rows=25 loops=1)
Hash Cond: (organization_organizationdata.organization_id = organization_organization.id)
-> Seq Scan on organization_organizationdata (cost=0.00..1.25 rows=25 width=8) (actual time=0.501..0.504 rows=25 loops=1)
-> Hash (cost=1.25..1.25 rows=25 width=20) (actual time=0.513..0.513 rows=25 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
-> Seq Scan on organization_organization (cost=0.00..1.25 rows=25 width=20) (actual time=0.484..0.501 rows=25 loops=1)
-> Index Only Scan using monetary_pricelist_pkey on monetary_pricelist t5 (cost=0.14..0.31 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=7)
Index Cond: (id = monetary_pricelistdestinations.pricelist_id)
Heap Fetches: 7
-> Index Scan using monetary_drugprice_pricelist_id_1ce160ce on monetary_drugprice t4 (cost=0.42..110.21 rows=1581 width=8) (actual time=0.010..1.236 rows=3368 loops=7)
Index Cond: (pricelist_id = t5.id)
-> Index Scan using monetary_drugprice_drug_id_c2f278e5 on monetary_drugprice (cost=0.42..0.78 rows=15 width=8) (actual time=0.002..0.009 rows=19 loops=23577)
Index Cond: (drug_id = t4.drug_id)
-> Hash (cost=6.45..6.45 rows=91 width=4) (actual time=0.757..0.757 rows=93 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 12kB
-> Seq Scan on monetary_pricelist (cost=0.00..6.45 rows=91 width=4) (actual time=0.655..0.713 rows=93 loops=1)
Filter: (active AND ((NOT expires) OR (expires AND ((timezone('UTC'::text, datestart))::date <= '2019-01-22'::date) AND ((timezone('UTC'::text, dateend))::date >= '2019-01-22'::date))))
Rows Removed by Filter: 45
Planning time: 25.871 ms
Execution time: 3638.544 ms
中,则会得到this result
查询所花费的几乎全部时间都花在了排序上,如您在此处看到的:
...这就是我完全迷失的地方。什么排序?不是查询末尾的explain.depesz.com
,我已经尝试删除它。
如何提高此查询的性能?
编辑:我将在删除ORDER BY
之后添加EXPLAIN ANALYZE
。该查询需要更长的时间才能一致地完成。这就给问题增加了另一层ORDER BY
。
wtf
答案 0 :(得分:1)
您对EXPLAIN ANALYZE
结果进行排序的原因是,在查询中您使用的是SELECT DISTINCT
,因此数据库对所有列的行进行排序,并删除重复项。
查看您的查询,也许您不需要对sql表达式中的所有列进行区分。
尝试将.distinct()
替换为.distinct('pk')
请参阅Django distinct()
文档
仅在PostgreSQL上,可以传递位置参数(*字段)以指定DISTINCT应当应用于的字段的名称。这将转换为SELECT DISTINCT ON SQL查询。这是区别。对于正常的distinct()调用,数据库在确定哪些行不同时会比较每行中的每个字段。 对于具有指定字段名称的distinct()调用,数据库将仅比较指定的字段名称。
您的查询计划表明行排序发生在磁盘而不是内存中。
Sort Method: external merge Disk: 31160kB
您可以尝试使用another answer
中的解决方案此外,您真的需要一批中的所有数据吗?您是否尝试添加分页?
我已经看到人们试图将他们的查询从DISTINCT
转换为GROUP BY
,但是我不确定您是否可以这样做。