Sequential scan instead of index scan

Date: 2021-05-23 20:31:16

Tags: sql django postgresql

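I can't figure out why this query is unusually slow on my production database. I see two sequential scans: the first comes from the filter on dockets_table.content_type_id = 2, and I can't even work out what the second one relates to. The query takes several minutes on the production machine, when it should take a few seconds at most.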

Query:

SELECT COUNT(*) 
  FROM (
        SELECT DISTINCT 
            ON (rank, "dockets_document"."id") "dockets_document"."id" AS Col1, 
               "dockets_document"."title" AS Col2, 
               "dockets_document"."url" AS Col3, 
               "dockets_document"."content" AS Col4, 
               "dockets_document"."num_tables" AS Col5, 
               "dockets_document"."num_pages" AS Col6, 
               "dockets_document"."tables_processed" AS Col7, 
               "dockets_document"."summary" AS Col8, 
               "dockets_document"."importance" AS Col9, 
               "dockets_document"."title_vector" AS Col10, 
               "dockets_document"."search_vector" AS Col11, 
               "dockets_document"."datetime_created" AS Col12, 
               "dockets_document"."datetime_modified" AS Col13, 
               "dockets_document"."docket_id" AS Col14, 
               "dockets_document"."docket_number" AS Col15, 
               "dockets_document"."date_filed" AS Col16, 
               "dockets_document"."pdf" AS Col17, 
               ((1 * ts_rank("dockets_document"."search_vector", plainto_tsquery('public.bb'::regconfig, 'knighthead'))) + (0.8 * ts_rank("dockets_attachment"."search_vector", plainto_tsquery('public.bb'::regconfig, 'knighthead')))) AS "rank", 
               ts_headline("dockets_document"."title", plainto_tsquery('public.bb'::regconfig, 'knighthead'), 'StartSel=''<mark>'', StopSel=''</mark>'', HighlightAll=true') AS "title_snippet", 
               ts_headline(CONCAT(array_to_string("dockets_document"."content", ' ', ''), CONCAT(array_to_string("dockets_attachment"."content", ' ', ''), CONCAT(array_to_string("dockets_table"."content", ' ', ''), array_to_string(T5."content", ' ', '')))), plainto_tsquery('public.bb'::regconfig, 'knighthead'), 'StartSel=''<mark>'', StopSel=''</mark>'', MaxFragments=5') AS "content_snippet" 
          FROM "dockets_document" 
          LEFT OUTER JOIN "dockets_attachment" 
            ON ("dockets_document"."id" = "dockets_attachment"."main_document_id") 
          LEFT OUTER JOIN "dockets_table" 
            ON ("dockets_document"."id" = "dockets_table"."object_id" AND ("dockets_table"."content_type_id" = 8)) 
          LEFT OUTER JOIN "dockets_table" T5 
            ON ("dockets_attachment"."id" = T5."object_id" AND (T5."content_type_id" = 24)) 
         WHERE ("dockets_document"."docket_id" = 'p_hertz' AND NOT ("dockets_document"."title_vector" @@ ((((((plainto_tsquery('pro hac') || plainto_tsquery('certificate of mailing')) || plainto_tsquery('request for service')) || plainto_tsquery('certification of counsel')) || plainto_tsquery('receipt of filing fee')) || plainto_tsquery('affidavit of service')) || plainto_tsquery('Transfer/Assignment of Claim')) AND "dockets_document"."title_vector" IS NOT NULL) AND ("dockets_document"."search_vector" @@ plainto_tsquery('public.bb'::regconfig, 'knighthead') OR "dockets_attachment"."search_vector" @@ plainto_tsquery('public.bb'::regconfig, 'knighthead') OR "dockets_table"."search_vector" @@ plainto_tsquery('public.bb'::regconfig, 'knighthead') OR T5."search_vector" @@ plainto_tsquery('public.bb'::regconfig, 'knighthead'))) 
         ORDER BY "rank" DESC, "dockets_document"."id" ASC
       ) subquery

EXPLAIN ANALYZE output:

Aggregate  (cost=140427.33..140427.34 rows=1 width=8) (actual time=1758.440..1758.635 rows=1 loops=1)
  ->  Unique  (cost=140425.97..140426.48 rows=68 width=1041) (actual time=1758.122..1758.625 rows=103 loops=1)
        ->  Sort  (cost=140425.97..140426.14 rows=68 width=1041) (actual time=1758.120..1758.397 rows=1734 loops=1)
              Sort Key: ((('1'::double precision * ts_rank(dockets_document.search_vector, '''knighthead'''::tsquery)) + ('0.8'::double precision * ts_rank(dockets_attachment.search_vector, '''knighthead'''::tsquery)))) DESC, dockets_document.id
              Sort Method: quicksort  Memory: 184kB
              ->  Hash Right Join  (cost=137938.16..140423.90 rows=68 width=1041) (actual time=221.737..1756.835 rows=1734 loops=1)
                    Hash Cond: ((t5.object_id)::text = (dockets_attachment.id)::text)
                    Filter: ((dockets_document.search_vector @@ '''knighthead'''::tsquery) OR (dockets_attachment.search_vector @@ '''knighthead'''::tsquery) OR (dockets_table.search_vector @@ '''knighthead'''::tsquery) OR (t5.search_vector @@ '''knighthead'''::tsquery))
                    Rows Removed by Filter: 45943
                    ->  Index Scan using dockets_table_content_type_id_c1e999b5 on dockets_table t5  (cost=0.43..2414.76 rows=16933 width=46) (actual time=0.040..8.865 rows=20302 loops=1)
                          Index Cond: (content_type_id = 24)
                    ->  Hash  (cost=137852.10..137852.10 rows=6851 width=306) (actual time=214.841..215.034 rows=16360 loops=1)
                          Buckets: 16384 (originally 8192)  Batches: 2 (originally 1)  Memory Usage: 3969kB
                          ->  Gather  (cost=1243.24..137852.10 rows=6851 width=306) (actual time=2.297..201.903 rows=16360 loops=1)
                                Workers Planned: 1
                                Workers Launched: 1
                                ->  Nested Loop Left Join  (cost=243.24..136167.00 rows=4030 width=306) (actual time=1.154..195.086 rows=8180 loops=2)
                                      ->  Nested Loop Left Join  (cost=242.81..43560.75 rows=1667 width=274) (actual time=1.134..147.731 rows=2794 loops=2)
                                            ->  Parallel Bitmap Heap Scan on dockets_document  (cost=242.39..13706.77 rows=1667 width=116) (actual time=1.113..113.645 rows=2401 loops=2)
                                                  Recheck Cond: ((docket_id)::text = 'p_hertz'::text)
                                                  Filter: ((NOT (title_vector @@ ((((((plainto_tsquery('pro hac'::text) || plainto_tsquery('certificate of mailing'::text)) || plainto_tsquery('request for service'::text)) || plainto_tsquery('certification of counsel'::text)) || plainto_tsquery('receipt of filing fee'::text)) || plainto_tsquery('affidavit of service'::text)) || plainto_tsquery('Transfer/Assignment of Claim'::text)))) OR (title_vector IS NULL))
                                                  Heap Blocks: exact=2244
                                                  ->  Bitmap Index Scan on dockets_document_company_id_6b3b6f6d  (cost=0.00..241.68 rows=2834 width=0) (actual time=1.105..1.105 rows=4943 loops=1)
                                                        Index Cond: ((docket_id)::text = 'p_hertz'::text)
                                            ->  Index Scan using dockets_attachment_main_document_id_1f6c050c on dockets_attachment  (cost=0.42..17.87 rows=4 width=171) (actual time=0.013..0.013 rows=0 loops=4802)
                                                  Index Cond: ((dockets_document.id)::text = (main_document_id)::text)
                                      ->  Index Scan using dockets_table_object_id_da06f22d on dockets_table  (cost=0.43..54.83 rows=72 width=46) (actual time=0.014..0.015 rows=2 loops=5587)
                                            Index Cond: ((dockets_document.id)::text = (object_id)::text)
                                            Filter: (content_type_id = 8)
Planning time: 3.283 ms
Execution time: 1759.062 ms

In case it helps, here is the Django ORM code that builds the query:

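    # Module-level imports this method relies on
    from django.contrib.postgres.search import SearchHeadline, SearchQuery, SearchRank
    from django.db import models
    from django.db.models import F, Func, Q, Value
    from django.db.models.functions import Concat

    # Method on the custom manager/queryset (it calls self.filter)
    def search(self, search_term):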
        def myStringAgg(field: str):
            return Func(
                F(field),
                Value(" "),
                Value(""),
                function="array_to_string",
                output_field=models.TextField(),
            )

        query = SearchQuery(search_term, config="public.bb")

        rank = (
            1*SearchRank(F("search_vector"), query)
            + 0.8*SearchRank(F("attachments__search_vector"), query)
        )

        qs = (
            self.filter(
                Q(search_vector=query)
                | Q(attachments__search_vector=query)
                | Q(tables__search_vector=query)
                | Q(attachments__tables__search_vector=query)
            )
            .annotate(rank=rank)
            .order_by("-rank", "pk")
            .annotate(
                title_snippet=SearchHeadline(
                    "title",
                    query,
                    highlight_all=True,
                    start_sel="<mark>",
                    stop_sel="</mark>",
                )
            )
            .annotate(
                content_snippet=SearchHeadline(
                    Concat(
                         myStringAgg("content"),
                         myStringAgg("attachments__content"),
                         myStringAgg("tables__content"),
                         myStringAgg("attachments__tables__content"),
                    ),
                    query,
                    max_fragments=5,
                    start_sel="<mark>",
                    stop_sel="</mark>",
                )
            )
            .distinct("rank", "pk")
        )


        return qs

1 answer:

Answer 0 (score: 1)

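Instead of combining the main search conditions with OR, rewrite the query as a UNION of several queries, none of which contains an OR. In the plan above, the OR condition shows up as a Filter on the top-level Hash Right Join (Rows Removed by Filter: 45943). It is evaluated only after all the joins have been done, because it references columns from four different tables.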
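To illustrate the shape of that rewrite, here is a minimal sketch in Python using the standard-library sqlite3 module. Everything in it is an assumption made for illustration: the toy `document` and `attachment` tables, the sample rows, and `LIKE` standing in for PostgreSQL's `@@ plainto_tsquery(...)`. It only shows that one branch per condition, combined with UNION, returns the same document ids as the single OR query. The real query would presumably need four branches (document, attachment, and the two dockets_table joins), with the rank and snippets computed afterwards for the matching ids.

```python
import sqlite3

# Toy in-memory schema (hypothetical; not the real dockets_* tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document (id TEXT PRIMARY KEY, docket_id TEXT, body TEXT);
CREATE TABLE attachment (id TEXT PRIMARY KEY, main_document_id TEXT, body TEXT);
INSERT INTO document VALUES
  ('d1', 'p_hertz', 'knighthead objection'),
  ('d2', 'p_hertz', 'notice of hearing'),
  ('d3', 'p_hertz', 'order'),
  ('d4', 'other',   'knighthead motion');
INSERT INTO attachment VALUES
  ('a1', 'd2', 'exhibit from knighthead'),
  ('a2', 'd3', 'blank page'),
  ('a3', 'd1', 'knighthead exhibit');
""")

# Original shape: one query with an OR spanning columns of two tables.
# LIKE is a stand-in for the full-text match operator.
or_sql = """
SELECT DISTINCT d.id
  FROM document d
  LEFT JOIN attachment a ON a.main_document_id = d.id
 WHERE d.docket_id = :docket
   AND (d.body LIKE :pat OR a.body LIKE :pat)
"""

# Rewritten shape: one branch per search condition, no OR in any branch.
# UNION (not UNION ALL) removes ids found by more than one branch.
# The second branch can use an inner join, since a condition on a.body
# can never be true for a NULL-extended row.
union_sql = """
SELECT d.id
  FROM document d
 WHERE d.docket_id = :docket AND d.body LIKE :pat
UNION
SELECT d.id
  FROM document d
  JOIN attachment a ON a.main_document_id = d.id
 WHERE d.docket_id = :docket AND a.body LIKE :pat
"""

params = {"docket": "p_hertz", "pat": "%knighthead%"}
or_ids = sorted(row[0] for row in conn.execute(or_sql, params))
union_ids = sorted(row[0] for row in conn.execute(union_sql, params))
print(or_ids, union_ids)
```

Each branch filters on a single table's column, so on PostgreSQL the planner is free to use whatever index exists on that column for that branch. Whether that actually happens on the production database would have to be confirmed with EXPLAIN ANALYZE.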
