Question

Here is the problem table and its indexes. This production table contains about 50 million rows.

CREATE TABLE "public"."audit_page_loads" (
    "id" int4 NOT NULL DEFAULT nextval('audit_page_loads_id_seq'::regclass),
    "dt" timestamp(6) NULL,
    "ip" varchar(255) COLLATE "default",
    "method" varchar(255) COLLATE "default",
    "action" varchar(255) COLLATE "default",
    "elapsed" numeric(8,2) DEFAULT 0,
    "views" numeric(8,2) DEFAULT 0,
    "db" numeric(8,2) DEFAULT 0
)
WITH (OIDS=FALSE);
ALTER TABLE "public"."audit_page_loads" ADD PRIMARY KEY ("id") NOT DEFERRABLE INITIALLY IMMEDIATE;

CREATE INDEX  "index_audit_page_loads_on_action" ON "public"."audit_page_loads" USING btree("action" COLLATE "default" ASC NULLS LAST);
CREATE INDEX  "index_audit_page_loads_on_action_and_dt" ON "public"."audit_page_loads" USING btree("action" COLLATE "default" ASC NULLS LAST, dt ASC NULLS LAST);

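When I run this select against the table, it takes nearly a minute to return results. I assumed that, because I have a composite index on action and dt, my select would use an index scan. It does not. Every query is a sequential scan.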

SELECT action, avg(elapsed), count(*) 
FROM audit_page_loads 
WHERE action != 'UsersController#login' and dt >= '2014-09-01'
GROUP BY action

Limit  (cost=1685321.43..1685321.68 rows=20 width=32) (actual time=15900.954..15900.968 rows=20 loops=1)
  ->  HashAggregate  (cost=1685321.43..1685321.80 rows=30 width=32) (actual time=15900.952..15900.965 rows=20 loops=1)
        ->  Seq Scan on audit_page_loads  (cost=0.00..1646329.70 rows=5198897 width=32) (actual time=7.075..11826.963 rows=5820401 loops=1)
              Filter: (((action)::text <> 'UsersController#login'::text) AND (dt >= '2014-09-01 00:00:00'::timestamp without time zone))
              Rows Removed by Filter: 52614815
Total runtime: 15901.013 ms

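When I add set enable_seqscan = false; to that select, the result is immediate: 0.13 seconds. I can see in the explain output that the "index_audit_page_loads_on_action_and_dt" index is used. Why do I have to force this? If I understand correctly, including set enable_seqscan = false; is not advisable. Can anyone help me?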

PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

Answer 1

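The index on action cannot be used here because you compare the action column with !=. An index cannot be used to look for every value other than a given one.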

Instead, you can try the following:

CREATE INDEX i_1 ON audit_page_loads(dt);

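If you use the predicate action != 'UsersController#login' often, you can try creating a partial index instead (pick a different index name if i_1 from above already exists):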

CREATE INDEX i_1 ON audit_page_loads(dt) WHERE action != 'UsersController#login';

Judging by the Rows Removed by Filter: 52614815 entry in your plan, this index should work very well.

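Because your query uses a range lookup on the dt column, the GROUP BY cannot be optimized to use the index, so you will still get a HashAgg node. With the index in place, though, the timings should be much better.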
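To check whether the planner actually picks up the new index, something like the following could be run after creating it. This is a minimal sketch that reuses the query from the question; whether the index is chosen still depends on your statistics and on how many rows fall in the date range.

-- Refresh planner statistics so the new index is costed with current data.
ANALYZE audit_page_loads;

-- The WHERE clause repeats the partial index's predicate; the planner only
-- considers a partial index when the query's conditions imply that predicate.
EXPLAIN (ANALYZE, BUFFERS)
SELECT action, avg(elapsed), count(*)
FROM audit_page_loads
WHERE action != 'UsersController#login'
  AND dt >= '2014-09-01'
GROUP BY action;

If the plan still shows a Seq Scan on audit_page_loads, compare the estimated row count with the actual one to see whether the estimate is what pushes the planner away from the index.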

Answer 2

Your WHERE clause is not sargable. But there may be an equivalent expression that is sargable.

I built your table and loaded it with about a million rows of random data.

explain analyze
select action, avg(elapsed), count(*) 
from audit_page_loads 
where action < 'UsersController#login' and dt >= '2014-09-01'
   or action > 'UsersController#login' and dt >= '2014-09-01'
group by action;

"HashAggregate  (cost=75.45..75.46 rows=1 width=24) (actual time=0.379..0.379 rows=1 loops=1)"
"  ->  Bitmap Heap Scan on audit_page_loads  (cost=9.20..75.32 rows=17 width=24) (actual time=0.264..0.276 rows=90 loops=1)"
"        Recheck Cond: ((((action)::text < 'UsersController#login'::text) AND (dt >= '2014-09-01 00:00:00'::timestamp without time zone)) OR (((action)::text > 'UsersController#login'::text) AND (dt >= '2014-09-01 00:00:00'::timestamp without time zone)))"
"        ->  BitmapOr  (cost=9.20..9.20 rows=17 width=0) (actual time=0.259..0.259 rows=0 loops=1)"
"              ->  Bitmap Index Scan on audit_page_loads_action_dt_idx  (cost=0.00..4.59 rows=8 width=0) (actual time=0.191..0.191 rows=90 loops=1)"
"                    Index Cond: (((action)::text < 'UsersController#login'::text) AND (dt >= '2014-09-01 00:00:00'::timestamp without time zone))"
"              ->  Bitmap Index Scan on audit_page_loads_action_dt_idx  (cost=0.00..4.59 rows=8 width=0) (actual time=0.067..0.067 rows=0 loops=1)"
"                    Index Cond: (((action)::text > 'UsersController#login'::text) AND (dt >= '2014-09-01 00:00:00'::timestamp without time zone))"
"Total runtime: 0.431 ms"

On my box, this cut the run time from about 186 ms to about 0.4 ms. Your mileage may vary; I'm sure my data doesn't look like yours.

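In the general case, whether you can replace a non-sargable operator in a WHERE clause with an equivalent sargable predicate depends on the data type, the collation order, and case sensitivity. Test with your own data.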
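One way to run that test is to count the rows on which the original predicate and its rewrite disagree. This is a minimal sketch, assuming the table and literal from the question; the mismatches alias is made up for illustration. It reads the whole table, so it is probably best run on a copy or at a quiet time.

-- Count rows where the original predicate and its rewrite disagree.
-- IS DISTINCT FROM treats NULL as a comparable value, so rows with a
-- NULL action are compared correctly too.
SELECT count(*) AS mismatches
FROM audit_page_loads
WHERE (action <> 'UsersController#login')
      IS DISTINCT FROM
      (action < 'UsersController#login' OR action > 'UsersController#login');

A count of 0 means the two forms select the same rows under the column's collation; anything else means the rewrite is not safe for this data.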

PostgreSQL is ignoring my composite index
