Different EXPLAIN plans for the same query

Time: 2017-12-11 13:36:51

Tags: sql database postgresql indexing query-optimization

I created an index on the derived_tstamp column of the events table, which has more than 4 million records:

CREATE INDEX derived_tstamp_date_index ON atomic.events ( date(derived_tstamp) );

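When I run the query with two different values of domain_userid, I get different EXPLAIN results. Query 1 uses the index, but Query 2 does not. How can I make sure the index is always used to get faster results?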

Query 1:

EXPLAIN ANALYZE  SELECT
SUM(duration) as "total_time_spent"
FROM (
    SELECT
    domain_sessionidx,
    MIN(derived_tstamp) as "start_time",
    MAX(derived_tstamp) as "finish_time",
    MAX(derived_tstamp) - min(derived_tstamp) as "duration"
    FROM "atomic".events
    WHERE date(derived_tstamp) >= date('2017-07-01') AND date(derived_tstamp) <= date('2017-08-02') AND domain_userid = 'd01ee409-ebff-4f37-bc97-9bbda45a7225'
    GROUP BY 1
) v;

EXPLAIN output for Query 1:

Aggregate  (cost=1834.00..1834.01 rows=1 width=16) (actual time=138.619..138.619 rows=1 loops=1)
  ->  GroupAggregate  (cost=1830.83..1832.93 rows=85 width=34) (actual time=137.096..138.563 rows=186 loops=1)
        Group Key: events.domain_sessionidx
        ->  Sort  (cost=1830.83..1831.09 rows=104 width=10) (actual time=137.063..137.681 rows=2726 loops=1)
              Sort Key: events.domain_sessionidx
              Sort Method: quicksort  Memory: 224kB
              ->  Bitmap Heap Scan on events  (cost=1412.95..1827.35 rows=104 width=10) (actual time=108.764..136.053 rows=2726 loops=1)
                    Recheck Cond: ((date(derived_tstamp) >= '2017-07-01'::date) AND (date(derived_tstamp) <= '2017-08-02'::date) AND ((domain_userid)::text = 'd01ee409-ebff-4f37-bc97-9bbda45a7225'::text))
                    Rows Removed by Index Recheck: 19704
                    Heap Blocks: exact=466 lossy=3331
                    ->  BitmapAnd  (cost=1412.95..1412.95 rows=104 width=0) (actual time=108.474..108.474 rows=0 loops=1)
                          ->  Bitmap Index Scan on derived_tstamp_date_index  (cost=0.00..448.34 rows=21191 width=0) (actual time=94.371..94.371 rows=818461 loops=1)
                                Index Cond: ((date(derived_tstamp) >= '2017-07-01'::date) AND (date(derived_tstamp) <= '2017-08-02'::date))
                          ->  Bitmap Index Scan on events_domain_userid_index  (cost=0.00..964.31 rows=20767 width=0) (actual time=3.044..3.044 rows=16834 loops=1)
                                Index Cond: ((domain_userid)::text = 'd01ee409-ebff-4f37-bc97-9bbda45a7225'::text)
Planning time: 0.166 ms

Query 2:

EXPLAIN ANALYZE  SELECT
SUM(duration) as "total_time_spent"
FROM (
    SELECT
    domain_sessionidx,
    MIN(derived_tstamp) as "start_time",
    MAX(derived_tstamp) as "finish_time",
    MAX(derived_tstamp) - min(derived_tstamp) as "duration"
    FROM "atomic".events
    WHERE date(derived_tstamp) >= date('2017-07-01') AND date(derived_tstamp) <= date('2017-08-02') AND domain_userid = 'e4c94f3e-9841-4b65-9031-ca4aa03809e7'
    GROUP BY 1
) v;

EXPLAIN output for Query 2:

Aggregate  (cost=226.12..226.13 rows=1 width=16) (actual time=0.402..0.402 rows=1 loops=1)
  ->  GroupAggregate  (cost=226.08..226.10 rows=1 width=34) (actual time=0.394..0.397 rows=2 loops=1)
        Group Key: events.domain_sessionidx
        ->  Sort  (cost=226.08..226.08 rows=1 width=10) (actual time=0.381..0.386 rows=13 loops=1)
              Sort Key: events.domain_sessionidx
              Sort Method: quicksort  Memory: 25kB
              ->  Index Scan using events_domain_userid_index on events  (cost=0.56..226.07 rows=1 width=10) (actual time=0.030..0.368 rows=13 loops=1)
                    Index Cond: ((domain_userid)::text = 'e4c94f3e-9841-4b65-9031-ca4aa03809e7'::text)
                    Filter: ((date(derived_tstamp) >= '2017-07-01'::date) AND (date(derived_tstamp) <= '2017-08-02'::date))
                    Rows Removed by Filter: 184
Planning time: 0.162 ms
Execution time: 0.440 ms

1 answer:

Answer 0: (score: 1)

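In the second case the new index is not used because very few rows match the condition domain_userid = 'e4c94f3e-9841-4b65-9031-ca4aa03809e7' (only 197: 13 returned plus 184 removed by the filter). Filtering those rows is cheaper than performing a bitmap index scan with the new index. Note that Query 2 still uses an index, events_domain_userid_index; it only skips derived_tstamp_date_index, and it finishes in 0.440 ms.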
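
To check the selectivity argument yourself, a query along the following lines could count how many rows each of the two domain_userid values matches. This is only a sketch built from the table and column names in the question; the alias matching_rows is made up for illustration. Going by the plans above, the first value should match roughly 16,834 rows and the second 197, which would explain why the planner combines both indexes for the first value and settles for a plain filter for the second.

```sql
-- Count the rows that match each domain_userid used in the two queries.
-- matching_rows is an illustrative alias, not a column of atomic.events.
SELECT domain_userid,
       count(*) AS matching_rows
FROM atomic.events
WHERE domain_userid IN (
    'd01ee409-ebff-4f37-bc97-9bbda45a7225',
    'e4c94f3e-9841-4b65-9031-ca4aa03809e7'
)
GROUP BY domain_userid;
```

If the counts differ by orders of magnitude, different plans for the two values are what you would expect, and forcing the date index for the rarer value would most likely make that query slower rather than faster.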