My table lead has an index:
\d lead
...
Indexes:
"lead_pkey" PRIMARY KEY, btree (id)
"lead_account__c" btree (account__c)
...
"lead_email" btree (email)
"lead_id_prefix" btree (id text_pattern_ops)
Why doesn't PG (9.1) use the index for this straightforward equality selection? Emails are almost all unique...
db=> explain select * from lead where email = 'blah';
QUERY PLAN
------------------------------------------------------------
Seq Scan on lead (cost=0.00..319599.38 rows=1 width=5108)
Filter: (email = 'blah'::text)
(2 rows)
Other indexed queries seem fine (though I don't know why this one doesn't just use the pkey index):
db=> explain select * from lead where id = '';
QUERY PLAN
------------------------------------------------------------------------------
Index Scan using lead_id_prefix on lead (cost=0.00..8.57 rows=1 width=5108)
Index Cond: (id = ''::text)
(2 rows)
db=> explain select * from lead where account__c = '';
QUERY PLAN
----------------------------------------------------------------------------------
Index Scan using lead_account__c on lead (cost=0.00..201.05 rows=49 width=5108)
Index Cond: (account__c = ''::text)
(2 rows)
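At first I thought it might be because email doesn't have enough distinct values. For example, if the statistics claimed that email is blah for most of the table, a seq scan would be faster. But that's not the case: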
db=> select count(*), count(distinct email) from lead;
count | count
--------+--------
749148 | 733416
(1 row)
Even if I force seq scans off, the planner behaves as if it has no other choice:
db=> set enable_seqscan = off;
SET
db=> show enable_seqscan;
enable_seqscan
----------------
off
(1 row)
db=> explain select * from lead where email = 'foo@blah.com';
QUERY PLAN
---------------------------------------------------------------------------
Seq Scan on lead (cost=10000000000.00..10000319599.38 rows=1 width=5108)
Filter: (email = 'foo@blah.com'::text)
(2 rows)
I also tried EXPLAIN ANALYZE:
db=> explain analyze select * from lead where email = 'foo@blah.com';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Seq Scan on lead (cost=10000000000.00..10000319732.76 rows=1 width=5102) (actual time=77845.244..77845.244 rows=0 loops=1)
Filter: (email = 'foo@blah.com'::text)
Total runtime: 77857.215 ms
(3 rows)
Here is the \d output (sorry, I had to obscure the column names and trim it to fit SO's limits; see http://pastebin.com/ve3gzJpY for the untrimmed version):
Table "lead"
Column | Type | Modifiers
--------------------------------------------+-----------------------------+-----------
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | real |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | boolean |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
email | text |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | boolean |
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
account__c | text |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
id | text | not null
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | real |
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | timestamp without time zone |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | real |
Indexes:
"lead_pkey" PRIMARY KEY, btree (id)
"lead_account__c" btree (account__c)
"lead_XXXXXXXXXXXXXXXXXXXXXX" btree (XXXXXXXXXXXXXXXXXXXXXX)
"lead_XXXXXXXXXXXXXXXXXXXXXX" btree (XXXXXXXXXXXXXXXXXXXXXX)
"lead_XXXXXXXXXXXXXXXXXXXXXX" btree (XXXXXXXXXXXXXXXXXXXXXX)
"lead_email" btree (email)
"lead_id_prefix" btree (id text_pattern_ops)
Here is the output of pg_dump --schema-only -t lead (again, see http://pastebin.com/ve3gzJpY for the untrimmed version, which has unique column names in case that helps with reproducibility):
--
-- PostgreSQL database dump
--
SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET default_tablespace = '';
SET default_with_oids = false;
--
-- Name: lead; Type: TABLE; Schema: public; Owner: pod; Tablespace:
--
CREATE TABLE lead (
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX real,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX boolean,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX date,
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
account__c text,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
id text NOT NULL,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX real,
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX timestamp without time zone,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX real
);
ALTER TABLE lead OWNER TO pod;
--
-- Name: lead_pkey; Type: CONSTRAINT; Schema: public; Owner: pod; Tablespace:
--
ALTER TABLE ONLY lead
ADD CONSTRAINT lead_pkey PRIMARY KEY (id);
--
-- Name: lead_account__c; Type: INDEX; Schema: public; Owner: pod; Tablespace:
--
CREATE INDEX lead_account__c ON lead USING btree (account__c);
--
-- Name: lead_XXXXXXXXXXXXXXXXXXXX; Type: INDEX; Schema: public; Owner: pod; Tablespace:
--
CREATE INDEX lead_XXXXXXXXXXXXXXXXXXXX ON lead USING btree (XXXXXXXXXXXXXXXXXXXX);
--
-- Name: lead_XXXXXXXXXXXXXXXXXXXX; Type: INDEX; Schema: public; Owner: pod; Tablespace:
--
CREATE INDEX lead_XXXXXXXXXXXXXXXXXXXX ON lead USING btree (XXXXXXXXXXXXXXXXXXXX);
--
-- Name: lead_XXXXXXXXXXXXXXXXXXXX; Type: INDEX; Schema: public; Owner: pod; Tablespace:
--
CREATE INDEX lead_XXXXXXXXXXXXXXXXXXXX ON lead USING btree (XXXXXXXXXXXXXXXXXXXX);
--
-- Name: lead_email; Type: INDEX; Schema: public; Owner: pod; Tablespace:
--
CREATE INDEX lead_email ON lead USING btree (email);
--
-- Name: lead_id_prefix; Type: INDEX; Schema: public; Owner: pod; Tablespace:
--
CREATE INDEX lead_id_prefix ON lead USING btree (id text_pattern_ops);
--
-- PostgreSQL database dump complete
--
Some PG catalog incantations:
db=> select * from pg_index where indexrelid = 'lead_email'::regclass;
indexrelid | indrelid | indnatts | indisunique | indisprimary | indisexclusion | indimmediate | indisclustered | indisvalid | indcheckxmin | indisready | indkey | indcollation | indclass | indoption | indexprs | indpred
------------+-----------+----------+-------------+--------------+----------------+--------------+----------------+------------+--------------+------------+--------+--------------+----------+-----------+----------+---------
215251995 | 101034456 | 1 | f | f | f | t | f | t | t | t | 101 | 100 | 10043 | 0 | ¤ | ¤
(1 row)
Some locale info:
db=> show lc_collate;
lc_collate
-------------
en_US.UTF-8
(1 row)
db=> show lc_ctype;
lc_ctype
-------------
en_US.UTF-8
(1 row)
I've searched through a lot of past SO questions, but none were about a simple equality query like this one.
Answer 0 (score: 1)
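To troubleshoot problems like this, you must run VACUUM ANALYZE between troubleshooting steps to see which one works. Otherwise you may not know what actually changed. So try that first, then run the query again and see whether it fixes the problem.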
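A minimal sketch of one pass through that loop, assuming the table and test query from the question:

```sql
-- Reclaim dead rows and refresh the planner statistics for the table.
VACUUM ANALYZE lead;

-- Re-run the test case and compare the plan with the previous attempt.
EXPLAIN SELECT * FROM lead WHERE email = 'foo@blah.com';
```

Repeating exactly these two statements after every change keeps the comparison fair, since only the change under test differs between runs.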
The next step to try (running VACUUM ANALYZE and the test case between each step) is:
ALTER TABLE lead ALTER COLUMN email SET STATISTICS 1000;
Maybe that will fix it. Maybe not.
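Keep in mind that SET STATISTICS only changes the target used by later ANALYZE runs, so the new setting has no effect until the table is analyzed again. A sketch of the follow-up, assuming the ALTER TABLE statement has already been run on the question's table:

```sql
-- Rebuild the statistics so the new per-column target is actually used.
ANALYZE lead;

-- Confirm the target that is now recorded for the column.
SELECT attname, attstattarget
FROM pg_attribute
WHERE attrelid = 'lead'::regclass
  AND attname = 'email';

-- Re-run the test case.
EXPLAIN SELECT * FROM lead WHERE email = 'foo@blah.com';
```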
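If that doesn't fix it, take a close look at the pg_stats view:
SELECT * FROM pg_stats WHERE tablename = 'lead';
Read the following carefully and look for anything that seems wrong in what you see in pg_stats:
http://www.postgresql.org/docs/9.0/static/planner-stats.html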
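Since the table is very wide, it may be easier to narrow the output to the one column in question. A sketch, assuming the view is pg_stats (with a trailing s, filtered by its tablename and attname columns) and that only the email column matters here:

```sql
-- Show what the planner believes about lead.email.
SELECT attname,
       null_frac,          -- fraction of rows where the column is NULL
       n_distinct,         -- negative values are a fraction of the row count
       most_common_vals,
       most_common_freqs
FROM pg_stats
WHERE tablename = 'lead'
  AND attname = 'email';
```

With roughly 733,000 distinct values in 749,000 rows, n_distinct would be expected to be close to -1 and no single value should have a large frequency; anything far from that would be worth investigating.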
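Edit: to be very clear, VACUUM ANALYZE is not the whole of the troubleshooting. However, it must be run between troubleshooting steps, otherwise you can't be sure the planner is working from the right data.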
Answer 1 (score: 0)
CREATE INDEX lead_id_prefix ON lead USING btree (id text_pattern_ops);
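The use of text_pattern_ops looks odd here. If your IDs are some kind of integer, I'd try dropping this index as a test. (I wouldn't hesitate to drop this index on a development server.) Since you have another btree index on "lead.id", I'd expect that dropping this one would let the optimizer use the other index on "lead.id".
If that proves true, then I'd try digging into why.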
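One way to run that test without losing the index is to drop it inside a transaction and roll back afterwards, since DDL is transactional in PostgreSQL. This is only a sketch, assuming a development server: DROP INDEX holds an exclusive lock on the table until the transaction ends, so other sessions touching lead will block in the meantime.

```sql
BEGIN;

-- Temporarily remove the text_pattern_ops index.
DROP INDEX lead_id_prefix;

-- See which plan the planner picks for the id lookup now.
EXPLAIN SELECT * FROM lead WHERE id = '';

-- Undo the drop; the index is back exactly as it was.
ROLLBACK;
```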