Question

I inherited a large legacy codebase running on Django 1.5, and my current task is to speed up a part of the site that takes ~1 min to load.

I profiled the application and got this:

The main culprit is the following query (trimmed for brevity):

SELECT COUNT(*) FROM "entities_entity" WHERE (
  "entities_entity"."date_filed" <= '2016-01-21' AND (
    UPPER("entities_entity"."entity_city_state_zip"::text) LIKE UPPER('%Atherton%') OR
    UPPER("entities_entity"."entity_city_state_zip"::text) LIKE UPPER('%Berkeley%') OR
    -- 34 more of these
    UPPER("entities_entity"."agent_city_state_zip"::text) LIKE UPPER('%Atherton%') OR
    UPPER("entities_entity"."agent_city_state_zip"::text) LIKE UPPER('%Berkeley%') OR
    -- 34 more of these
  )
)

It is basically a large query on two fields, entity_city_state_zip and agent_city_state_zip, both of type character varying(200) | not null.
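To make the semantics of that WHERE clause explicit, here is a rough Python model of the predicate PostgreSQL evaluates for every row. The dict-based row, the function name `row_matches` and the sample data are hypothetical; this only models the logic, not how PostgreSQL executes it.

```python
from datetime import date

def row_matches(row, cities, cutoff):
    """Model of the WHERE clause: date filter AND a case-insensitive
    substring match of any city in either of the two columns."""
    if row["date_filed"] > cutoff:
        return False
    entity = row["entity_city_state_zip"].upper()
    agent = row["agent_city_state_zip"].upper()
    # UPPER(col) LIKE UPPER('%City%') behaves like a case-insensitive "contains"
    return any(city.upper() in entity or city.upper() in agent for city in cities)

cities = ["Atherton", "Berkeley"]  # hypothetical subset of the real list
row = {"date_filed": date(2015, 6, 1),
       "entity_city_state_zip": "Berkeley, CA 94704",
       "agent_city_state_zip": "Oakland, CA 94612"}
print(row_matches(row, cities, date(2016, 1, 21)))  # True
```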

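The query is executed twice (!), taking 18814.02 ms each time, and once more with the COUNT replaced by a SELECT, which takes an additional 20216.49 ms (I'm going to cache the COUNT result).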
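Adding up the three timings quoted above accounts for almost the entire ~1 minute page load:

```python
count_ms = 18814.02   # each of the two COUNT(*) executions
select_ms = 20216.49  # the same query with SELECT instead of COUNT
total_ms = 2 * count_ms + select_ms
print(round(total_ms, 2))         # 57844.53
print(round(total_ms / 1000, 1))  # 57.8 (seconds)
```

So caching the COUNT result would, on a cache hit, remove roughly 37.6 s of that, but the ~20 s SELECT would still remain, which is why the underlying scan itself needs to get faster.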

Here is the EXPLAIN ANALYZE output:

Aggregate  (cost=175867.33..175867.34 rows=1 width=0) (actual time=17841.502..17841.502 rows=1 loops=1)
  ->  Seq Scan on entities_entity  (cost=0.00..175858.95 rows=3351 width=0) (actual time=0.849..17818.551 rows=145075 loops=1)
        Filter: ((date_filed <= '2016-01-21'::date) AND ((upper((entity_city_state_zip)::text) ~~ '%ATHERTON%'::text) OR (upper((entity_city_state_zip)::text) ~~ '%BERKELEY%'::text) (..skipped..) OR (upper((agent_city_state_zip)::text) ~~ '%ATHERTON%'::text) OR (upper((agent_city_state_zip)::text) ~~ '%BERKELEY%'::text) OR (upper((agent_city_state_zip)::text) ~~ '%BURLINGAME%'::text) ))
        Rows Removed by Filter: 310249
Planning time: 2.110 ms
Execution time: 17841.944 ms
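A quick back-of-the-envelope check on the plan above; every number below is copied from it:

```python
kept = 145075        # actual rows output by the Seq Scan
removed = 310249     # "Rows Removed by Filter"
estimated = 3351     # planner's row estimate for the Seq Scan
scan_ms = 17818.551  # actual total time of the Seq Scan node

scanned = kept + removed
print(scanned)                             # 455324
print(round(kept / scanned, 3))            # 0.319 -> about a third of the rows match
print(round(kept / estimated, 1))          # 43.3  -> planner underestimated ~43x
print(round(scan_ms * 1000 / scanned, 1))  # 39.1  -> microseconds spent per row
```

Roughly a third of the scanned rows pass the filter, so for this particular COUNT an index may help less than one would hope; the per-row cost of evaluating all the UPPER(...) LIKE patterns is what dominates.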

I tried adding indexes on entity_city_state_zip and agent_city_state_zip, using various combinations of the following:

CREATE INDEX ON entities_entity (upper(entity_city_state_zip));
CREATE INDEX ON entities_entity (upper(agent_city_state_zip));

as well as varchar_pattern_ops, with no luck.
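A likely reason these indexes did not help: a B-tree index (with or without varchar_pattern_ops) keeps values in sorted order, which supports prefix patterns such as LIKE 'ATHERTON%' but not patterns that start with %. The sketch below models the index as a sorted Python list searched with bisect; it is a simplified model, not PostgreSQL's implementation, and the sample values are made up.

```python
from bisect import bisect_left

def prefix_search(sorted_keys, prefix):
    """Range scan on a sorted index: all keys starting with `prefix`
    sit in one contiguous run, so a binary search finds its start."""
    start = bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # left the contiguous run, stop early
        result.append(key)
    return result

def infix_search(sorted_keys, needle):
    """'%needle%' has no fixed prefix, so the sort order is useless:
    every key must be examined (the equivalent of a sequential scan)."""
    return [key for key in sorted_keys if needle in key]

index = sorted(["ATHERTON, CA 94027", "BERKELEY, CA 94704",
                "EAST PALO ALTO, CA 94303", "PALO ALTO, CA 94301"])
print(prefix_search(index, "PALO ALTO"))  # ['PALO ALTO, CA 94301']
print(infix_search(index, "PALO ALTO"))   # ['EAST PALO ALTO, CA 94303', 'PALO ALTO, CA 94301']
```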

The server generates that query with the following:

qs = queryset.filter(Q(entity_city_state_zip__icontains = all_city_list) |
                     Q(agent_city_state_zip__icontains = all_city_list))

I don't know what else to try.

Thanks!

Answer 1

I think the problem is in the "multiple LIKE" and UPPER("entities_entity ... parts.

You can use:

WHERE entities_entity.entity_city_state_zip SIMILAR TO '%Atherton%|%Berkeley%'

or something like this:

WHERE entities_entity.entity_city_state_zip LIKE ANY(ARRAY['%Atherton%', '%Berkeley%'])
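If you go the raw SQL route, the pattern array can be passed as a single query parameter instead of being spliced into the SQL string. This sketch assumes psycopg2, which adapts a Python list to a PostgreSQL ARRAY, and uses ILIKE to keep the original case-insensitive behaviour without UPPER(); the helper names `escape_like` and `build_city_filter` are hypothetical.

```python
def escape_like(text):
    """Escape LIKE/ILIKE wildcards so city names are matched literally
    (PostgreSQL's default LIKE escape character is the backslash)."""
    return text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")

def build_city_filter(cities, cutoff):
    """Return (sql, params) for the COUNT query using ILIKE ANY(array)."""
    patterns = ["%" + escape_like(city) + "%" for city in cities]
    sql = (
        "SELECT COUNT(*) FROM entities_entity "
        "WHERE date_filed <= %s "
        "AND (entity_city_state_zip ILIKE ANY(%s) "
        "OR agent_city_state_zip ILIKE ANY(%s))"
    )
    return sql, [cutoff, patterns, patterns]

sql, params = build_city_filter(["Atherton", "Berkeley"], "2016-01-21")
print(params[1])  # ['%Atherton%', '%Berkeley%']
```

In Django, the pair could then be run with `cursor.execute(sql, params)` on a cursor from `django.db.connection.cursor()`. This mainly makes the query shorter and safer to build; whether it is also faster is something to verify with EXPLAIN ANALYZE.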

EDITED

About raw SQL queries in Django:

Regards

Answer 2

I watched a Pluralsight course that dealt with a very similar problem. The course was "Postgres for .NET Developers", and this was covered in the sections "Fun With Simple SQL" and "Full Text Search."

Summarizing their solution using your example:

Create a new column in the table that represents your entity_city_state_zip as a tsvector:

create table entities_entity (
  date_filed date,
  entity_city_state_zip text,
  csz_search tsvector not null   -- add this column
);

Initially you will probably have to make it nullable, populate the data, and then make it non-nullable.

update entities_entity
set csz_search = to_tsvector('pg_catalog.english', entity_city_state_zip);  -- same config as the trigger below

Next, create a trigger that populates the new field every time a record is added or modified:

create trigger entities_insert_update
before insert or update on entities_entity
for each row execute procedure
tsvector_update_trigger(csz_search,'pg_catalog.english',entity_city_state_zip);

Your search queries can now query the tsvector field instead of the city/state/zip field:

select * from entities_entity
where csz_search @@ to_tsquery('Atherton')

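A few notes of interest:

to_tsquery, if you haven't used it before, is more sophisticated than the example above. It supports and conditions, partial matches, etc.
It is also case-insensitive, so the upper function in your query is no longer needed.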
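For your list of cities, the to_tsquery input string could be generated rather than written by hand. This sketch ORs the cities together with | and joins the words of multi-word names with & (so 'San Mateo' matches rows containing both words, not necessarily adjacent); the function name `cities_to_tsquery` and the choice of & are assumptions, not part of the course material.

```python
import re

def cities_to_tsquery(cities):
    """Build a to_tsquery() input string: one OR-ed term per city,
    with the words of multi-word city names AND-ed together."""
    terms = []
    for city in cities:
        # keep only letters/digits so tsquery operators can't sneak in
        words = re.findall(r"[A-Za-z0-9]+", city)
        if not words:
            continue
        if len(words) == 1:
            terms.append(words[0])
        else:
            terms.append("(" + " & ".join(words) + ")")
    return " | ".join(terms)

print(cities_to_tsquery(["Atherton", "Berkeley", "San Mateo"]))
# Atherton | Berkeley | (San & Mateo)
```

In Django 1.5 the resulting string could be passed as a query parameter, e.g. `queryset.extra(where=["csz_search @@ to_tsquery('pg_catalog.english', %s)"], params=[q])`, rather than being formatted into the SQL.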

As a final step, add a GIN index on the tsvector field:

create index entities_entity_ix1 on entities_entity
using gin(csz_search);

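If I understood the course correctly, this should make your query much faster, and it overcomes the problem that B-tree indexes cannot handle like '% (leading wildcard) queries.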
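The reason a GIN index helps where a B-tree cannot is that it is an inverted index: it maps each lexeme to the rows that contain it, so a lookup goes straight from the search term to the matching rows. The toy model below (a lowercase word tokenizer plus a dict of sets) is a deliberate simplification; the real to_tsvector with the english configuration also stems words and drops stop words.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Very rough stand-in for to_tsvector: lowercase word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build_inverted_index(rows):
    """Map each token to the set of row ids whose text contains it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for token in tokenize(text):
            index[token].add(row_id)
    return index

def search_any(index, words):
    """OR query: union of the posting sets, one dict lookup per word."""
    hits = set()
    for word in words:
        hits |= index.get(word.lower(), set())
    return hits

rows = {1: "Atherton, CA 94027", 2: "Berkeley, CA 94704", 3: "Oakland, CA 94612"}
index = build_inverted_index(rows)
print(sorted(search_any(index, ["Atherton", "BERKELEY"])))  # [1, 2]
```

One consequence is that full text search is not an exact drop-in for LIKE '%...%': terms match whole words, so a fragment such as 'Berk' finds nothing here unless a prefix query like to_tsquery('Berk:*') is used.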

Here is the explain plan for a query like that:

Bitmap Heap Scan on entities_entity  (cost=56.16..1204.78 rows=505 width=81)
  Recheck Cond: (csz_search @@ to_tsquery('Atherton'::text))
  ->  Bitmap Index Scan on entities_entity_ix1  (cost=0.00..56.04 rows=505 width=0)
        Index Cond: (csz_search @@ to_tsquery('Atherton'::text))

Speed up query: simple SELECT with LIKE