I created an index on the derived_tstamp column of the events table, which has over 4 million records:
CREATE INDEX derived_tstamp_date_index ON atomic.events ( date(derived_tstamp) );
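When I run the query with two different values of domain_userid, I get different EXPLAIN results. Query 1 uses the index, but Query 2 does not. How can I make sure the index is always used, so that I get faster results?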
Query 1:
EXPLAIN ANALYZE SELECT
SUM(duration) as "total_time_spent"
FROM (
SELECT
domain_sessionidx,
MIN(derived_tstamp) as "start_time",
MAX(derived_tstamp) as "finish_time",
MAX(derived_tstamp) - min(derived_tstamp) as "duration"
FROM "atomic".events
WHERE date(derived_tstamp) >= date('2017-07-01') AND date(derived_tstamp) <= date('2017-08-02') AND domain_userid = 'd01ee409-ebff-4f37-bc97-9bbda45a7225'
GROUP BY 1
) v;
Explain output for Query 1:
Aggregate (cost=1834.00..1834.01 rows=1 width=16) (actual time=138.619..138.619 rows=1 loops=1)
-> GroupAggregate (cost=1830.83..1832.93 rows=85 width=34) (actual time=137.096..138.563 rows=186 loops=1)
Group Key: events.domain_sessionidx
-> Sort (cost=1830.83..1831.09 rows=104 width=10) (actual time=137.063..137.681 rows=2726 loops=1)
Sort Key: events.domain_sessionidx
Sort Method: quicksort Memory: 224kB
-> Bitmap Heap Scan on events (cost=1412.95..1827.35 rows=104 width=10) (actual time=108.764..136.053 rows=2726 loops=1)
Recheck Cond: ((date(derived_tstamp) >= '2017-07-01'::date) AND (date(derived_tstamp) <= '2017-08-02'::date) AND ((domain_userid)::text = 'd01ee409-ebff-4f37-bc97-9bbda45a7225'::text))
Rows Removed by Index Recheck: 19704
Heap Blocks: exact=466 lossy=3331
-> BitmapAnd (cost=1412.95..1412.95 rows=104 width=0) (actual time=108.474..108.474 rows=0 loops=1)
-> Bitmap Index Scan on derived_tstamp_date_index (cost=0.00..448.34 rows=21191 width=0) (actual time=94.371..94.371 rows=818461 loops=1)
Index Cond: ((date(derived_tstamp) >= '2017-07-01'::date) AND (date(derived_tstamp) <= '2017-08-02'::date))
-> Bitmap Index Scan on events_domain_userid_index (cost=0.00..964.31 rows=20767 width=0) (actual time=3.044..3.044 rows=16834 loops=1)
Index Cond: ((domain_userid)::text = 'd01ee409-ebff-4f37-bc97-9bbda45a7225'::text)
Planning time: 0.166 ms
Query 2:
EXPLAIN ANALYZE SELECT
SUM(duration) as "total_time_spent"
FROM (
SELECT
domain_sessionidx,
MIN(derived_tstamp) as "start_time",
MAX(derived_tstamp) as "finish_time",
MAX(derived_tstamp) - min(derived_tstamp) as "duration"
FROM "atomic".events
WHERE date(derived_tstamp) >= date('2017-07-01') AND date(derived_tstamp) <= date('2017-08-02') AND domain_userid = 'e4c94f3e-9841-4b65-9031-ca4aa03809e7'
GROUP BY 1
) v;
Explain output for Query 2:
Aggregate (cost=226.12..226.13 rows=1 width=16) (actual time=0.402..0.402 rows=1 loops=1)
-> GroupAggregate (cost=226.08..226.10 rows=1 width=34) (actual time=0.394..0.397 rows=2 loops=1)
Group Key: events.domain_sessionidx
-> Sort (cost=226.08..226.08 rows=1 width=10) (actual time=0.381..0.386 rows=13 loops=1)
Sort Key: events.domain_sessionidx
Sort Method: quicksort Memory: 25kB
-> Index Scan using events_domain_userid_index on events (cost=0.56..226.07 rows=1 width=10) (actual time=0.030..0.368 rows=13 loops=1)
Index Cond: ((domain_userid)::text = 'e4c94f3e-9841-4b65-9031-ca4aa03809e7'::text)
Filter: ((date(derived_tstamp) >= '2017-07-01'::date) AND (date(derived_tstamp) <= '2017-08-02'::date))
Rows Removed by Filter: 184
Planning time: 0.162 ms
Execution time: 0.440 ms
Answer 0: (score: 1)
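In the second case the new index is not used because very few rows match the condition domain_userid = 'e4c94f3e-9841-4b65-9031-ca4aa03809e7' (only 197: 13 returned plus 184 removed by the filter). Fetching those rows through events_domain_userid_index and filtering them by date is cheaper than also running a bitmap index scan on the new index. The plan confirms this: Query 2 executes in 0.440 ms, so forcing the date index here would not make it faster.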
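If the aim is to speed up the slower first query rather than to force a particular index, one option is a single composite index covering both conditions. The following is only a sketch under assumptions: the index name events_userid_date_index is hypothetical, and whether the planner picks it depends on your data and PostgreSQL version. The ANALYZE is included because the plan for Query 1 estimated 21191 rows for the date range but found 818461, which suggests the statistics may be out of date.

```sql
-- Hypothetical index name. Puts the equality column first and the
-- date expression second, so both WHERE conditions can be satisfied
-- by one index scan instead of a BitmapAnd over two indexes.
CREATE INDEX events_userid_date_index
    ON atomic.events (domain_userid, date(derived_tstamp));

-- Refresh planner statistics for the table, including statistics
-- gathered for expression indexes such as date(derived_tstamp).
ANALYZE atomic.events;
```

After creating it, rerun both EXPLAIN ANALYZE statements and compare the plans and timings before dropping either of the existing indexes.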