Question

I have a table with nearly 6 billion records in a PostgreSQL database. The table's email_id column is defined as

character varying(64)

I am trying to optimize searches on this column. For example, the query:

select count(1) from my_table where email_id = 'some@email.com';

takes about 190 seconds to complete and return a result. I tried creating an index on the column, like this:

CREATE INDEX my_table_idx_email_id
  ON my_table
  USING btree
  (email_id);

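but it brought no noticeable improvement, or indeed any improvement at all. I also analyzed the query with EXPLAIN ANALYZE and confirmed that the problem lies in the email column.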

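One possible improvement would be to create a customers table and have my_table reference it through an integer foreign key. That is currently impossible, or at least hard to implement, because the customers live in a different database. I am trying to find other options.
I can change the data type of the email_id column. Is there a more suitable type that would speed up the query?
Should I use LIKE 'email' together with some other kind of index, or some form of full-text search?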
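
To make the first idea concrete, here is a rough sketch of the integer-key layout, assuming the customers could live in the same database (which is not the case for me right now). The customers table, the customer_id column, and the index name are hypothetical and exist only for illustration:

```sql
-- Hypothetical lookup table: one row per distinct email address.
CREATE TABLE customers (
    customer_id bigserial PRIMARY KEY,
    email_id    character varying(64) NOT NULL UNIQUE  -- UNIQUE creates a btree index on email_id
);

-- Hypothetical integer reference from the large table to the lookup table.
ALTER TABLE my_table
    ADD COLUMN customer_id bigint REFERENCES customers (customer_id);

-- Index the integer key so matching rows in my_table can be found without a full scan.
CREATE INDEX my_table_idx_customer_id
    ON my_table
    USING btree
    (customer_id);

-- The search would then resolve the email once and count rows by integer key.
SELECT count(1)
FROM my_table t
JOIN customers c ON c.customer_id = t.customer_id
WHERE c.email_id = 'some@email.com';
```

The new customer_id column would still have to be populated for the existing rows, which this sketch does not cover.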

Example EXPLAIN ANALYZE output:

explain analyze select count(1) from my_table where email_id = 'test@unknown.email';
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=5211284.25..5211284.26 rows=1 width=0) (actual time=225424.749..225424.749 rows=1 loops=1)
   ->  Seq Scan on my_table  (cost=0.00..5211235.72 rows=19410 width=0) (actual time=225424.744..225424.744 rows=0 loops=1)
         Filter: ((email_id)::text = 'test@unknown.email'::text)
 Total runtime: 225426.646 ms

EXPLAIN ANALYZE output after setting enable_seqscan = off:

SET enable_seqscan = off;
explain analyze select count(1) from my_table where email_id = 'test@unknown.email';
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=10005215244.40..10005215244.41 rows=1 width=0) (actual time=282110.404..282110.405 rows=1 loops=1)
   ->  Seq Scan on my_table  (cost=10000000000.00..10005215195.84 rows=19425 width=0) (actual time=282110.393..282110.393 rows=0 loops=1)
         Filter: ((email_id)::text = 'test@unknown.email'::text)
 Total runtime: 282113.296 ms
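
Even with enable_seqscan = off the plan is still a Seq Scan, only with a penalty of 10000000000 added to its cost, which suggests the planner sees no usable index on email_id at all. One way to check this, sketched here under the assumption that my_table is visible on the current search_path, is to list the table's indexes along with their validity flags from the system catalogs:

```sql
-- List every index defined on my_table and whether the planner may use it.
-- indisvalid = false means the index exists but is not usable for queries
-- (for example after a failed CREATE INDEX CONCURRENTLY).
SELECT c.relname   AS index_name,
       i.indisvalid,
       i.indisready
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE i.indrelid = 'my_table'::regclass;
```

If my_table_idx_email_id is missing from the result, or is listed with indisvalid = false, that would explain why neither plan uses it.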

PostgreSQL: speeding up search by email column

0 answers: