Can aggregate filter expressions not use indexes?

Date: 2018-01-12 20:35:03

Tags: sql postgresql

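A cool thing about filter expressions is that you can run several different filters and aggregations in a single query. The "where" part becomes part of the individual aggregate instead of the query-wide "where" clause.

For example: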

SELECT count('id') FILTER (WHERE account_type=1) as regular,
       count('id') FILTER (WHERE account_type=2) as gold,
       count('id') FILTER (WHERE account_type=3) as platinum
FROM clients;

(From the Django documentation)

Either this is a bug in PostgreSQL 9.5, or I am doing something wrong, or it is simply a limitation of PostgreSQL.

Consider these two queries:

select count(*)
from main_search
where created >= '2017-10-12T00:00:00.081739+00:00'::timestamptz
and created < '2017-10-13T00:00:00.081739+00:00'::timestamptz
and parent_id is null;

select
count('id') filter (
where created >= '2017-10-12T00:00:00.081739+00:00'::timestamptz
and created < '2017-10-13T00:00:00.081739+00:00'::timestamptz
and parent_id is null) as count
from main_search;

(The main_search table has a combined btree index on created and parent_id is null.)
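
For reference, an index of that shape might be declared roughly as below. This is only a sketch: the actual definition is not shown in the question, and the partial-index form is an assumption inferred from the index name and from the Index Cond in the plan, which mentions only created.

```sql
-- Assumed definition, not taken from the question:
-- a btree index on created that covers only rows where parent_id IS NULL.
CREATE INDEX main_search_created_parent_id_null_idx
    ON main_search USING btree (created)
    WHERE parent_id IS NULL;
```

With a partial index like this, the planner can only consider it when the query's WHERE clause implies parent_id IS NULL.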

Here is the output of the two queries:

 count
-------
  9682
(1 row)

 count
-------
  9682
(1 row)

If I prefix each query with explain analyze, this is the output:

    QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1174.04..1174.05 rows=1 width=0) (actual time=5.077..5.077 rows=1 loops=1)
   ->  Index Scan using main_search_created_parent_id_null_idx on main_search  (cost=0.43..1152.69 rows=8540 width=0) (actual time=0.026..4.384 rows=9682 loops=1)
         Index Cond: ((created >= '2017-10-11 20:00:00.081739-04'::timestamp with time zone) AND (created < '2017-10-12 20:00:00.081739-04'::timestamp with time zone))
 Planning time: 0.826 ms
 Execution time: 5.227 ms
(5 rows)

                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=178054.93..178054.94 rows=1 width=12) (actual time=1589.006..1589.007 rows=1 loops=1)
   ->  Seq Scan on main_search  (cost=0.00..146459.39 rows=4212739 width=12) (actual time=0.051..882.099 rows=4212818 loops=1)
 Planning time: 0.051 ms
 Execution time: 1589.070 ms
(4 rows)

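Note! The SELECT statement with the filter expression uses a Seq Scan instead of an Index Scan :<

I also tried this on another PostgreSQL 9.5 table in a different database. At first I thought the "Seq Scan" happened because the table had too few rows, but both tables are large enough that the index should kick in.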

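1 answer:

Answer 0: (score: 0)

You have misunderstood the use case. FILTER only affects the aggregation, which runs over the data set that has already been produced. It does not filter the records.

Consider this modified example: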

SELECT count(*) FILTER (WHERE account_type=1) as regular,
       count(*) FILTER (WHERE account_type=2) as gold,
       count(*) FILTER (WHERE account_type=3) as platinum,
       count(*) 
FROM clients;

What should the WHERE clause be in that case?

WHERE
(WHERE account_type=3)
or
(WHERE account_type=2)
or
(WHERE account_type=1)
or 1=1 ???

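Now consider more complex combinations of FILTER expressions and unfiltered columns. Deriving a WHERE clause from them would be a nightmare for the optimizer.

When you think of FILTER, treat it as just a shortcut for a longer expression such as CASE: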

SELECT SUM(CASE WHEN account_type=1 THEN 1 ELSE 0 END) as regular,
       SUM(CASE WHEN account_type=2 THEN 1 ELSE 0 END) as gold,
       SUM(CASE WHEN account_type=3 THEN 1 ELSE 0 END) as platinum
FROM clients;
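
If the goal is to let the planner use an index, one option is to state the row restriction in an ordinary WHERE clause and keep FILTER only for splitting the aggregates. The queries below are a minimal sketch that reuses the table and column names from the question. Whether the index is actually chosen still depends on the planner's statistics and the selectivity of the condition.

```sql
-- WHERE narrows the rows first (and can be matched against an index);
-- FILTER then only decides which of the remaining rows each aggregate counts.
SELECT count(*) FILTER (WHERE account_type = 1) AS regular,
       count(*) FILTER (WHERE account_type = 2) AS gold,
       count(*) FILTER (WHERE account_type = 3) AS platinum
FROM clients
WHERE account_type IN (1, 2, 3);

-- The query from the question, with the conditions repeated in WHERE.
-- With a single aggregate the FILTER becomes redundant, so this is
-- equivalent to the first query in the question.
SELECT count(*) AS count
FROM main_search
WHERE created >= '2017-10-12T00:00:00.081739+00:00'::timestamptz
  AND created <  '2017-10-13T00:00:00.081739+00:00'::timestamptz
  AND parent_id IS NULL;
```

Note that adding the WHERE clause to the first query drops any unfiltered aggregate over the whole table, such as a plain count(*), so it only fits when every aggregate shares the restriction.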