PostgreSQL: improving performance of a distinct count

Date: 2018-09-07 00:50:53

Tags: postgresql performance count distinct

I am currently trying to improve the performance of a database and I need your help. I have a table and its index:

CREATE TABLE public.ar
(
    id integer NOT NULL DEFAULT nextval('id_seq'::regclass),
    user_id integer NOT NULL,
    duration double precision,
    is_idle boolean NOT NULL,
    activity_id integer NOT NULL,
    device_id integer NOT NULL,
    calendar_id integer,
    on_taskness integer,
    week_id integer,
    some_other_column_below,
    CONSTRAINT id_ PRIMARY KEY (id),
    CONSTRAINT a_unique_key UNIQUE (user_id, device_id, start_time_local, start_time_utc, end_time_local, end_time_utc)
);

CREATE INDEX ar_idx
    ON public.ar USING btree
    (week_id, calendar_id, user_id, activity_id, duration, on_taskness, is_idle)
    TABLESPACE pg_default;

Then I am trying to run a query like this:

EXPLAIN ANALYZE
SELECT COUNT(*)
FROM (
SELECT ar.user_id
    FROM ar
    WHERE ar.user_id = ANY(array[some_data]) -- data size is 352
        AND ROUND(ar.duration) >0 AND ar.is_idle = false
        AND ar.week_id = ANY(ARRAY[some_data]) -- data size is 37
        AND ar.calendar_id = ANY(array[some_data]) -- data size is 16716
    GROUP by ar.user_id
) tmp;

Below is the EXPLAIN ANALYZE result:

Aggregate  (cost=31389954.72..31389954.73 rows=1 width=8) (actual time=252020.695..252020.695 rows=1 loops=1)
  ->  Group  (cost=31389032.69..31389922.37 rows=2588 width=4) (actual time=251089.270..252020.659 rows=351 loops=1)
        Group Key: ar.user_id
        ->  Sort  (cost=31389032.69..31389477.53 rows=177935 width=4) (actual time=251089.268..251776.202 rows=6993358 loops=1)
              Sort Key: ar.user_id
              Sort Method: external merge  Disk: 95672kB
              ->  Bitmap Heap Scan on ar  (cost=609015.18..31371079.88 rows=177935 width=4) (actual time=1670.413..248939.440 rows=6993358 loops=1)
                    Recheck Cond: ((week_id = ANY ('{some_data}'::integer[])) AND (user_id = ANY ('{some_data}'::integer[])))
                    Rows Removed by Index Recheck: 2081028
                    Filter: ((NOT is_idle) AND (round(duration) > '0'::double precision) AND (calendar_id = ANY ('{some_data}'::integer[])))
                    Rows Removed by Filter: 534017
                    Heap Blocks: exact=29551 lossy=313127
                    ->  BitmapAnd  (cost=609015.18..609015.18 rows=1357521 width=0) (actual time=1666.334..1666.334 rows=0 loops=1)
                          ->  Bitmap Index Scan on test_index_only_scan_idx  (cost=0.00..272396.77 rows=6970353 width=0) (actual time=614.366..614.366 rows=7269830 loops=1)
                                Index Cond: ((week_id = ANY ('{some_data}'::integer[])) AND (is_idle = false))
                          ->  Bitmap Index Scan on unique_key  (cost=0.00..336529.20 rows=9948573 width=0) (actual time=1041.999..1041.999 rows=14959355 loops=1)
                                Index Cond: (user_id = ANY ('{some_data}'::integer[]))
Planning time: 25.563 ms
Execution time: 252029.237 ms

I also tried DISTINCT, and the result was the same.
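
Roughly, the DISTINCT variant looks like this (a sketch that keeps the some_data placeholders from the query above; I am not distinguishing here between COUNT(DISTINCT ...) and SELECT DISTINCT inside the subquery):

EXPLAIN ANALYZE
SELECT COUNT(DISTINCT ar.user_id)
FROM ar
WHERE ar.user_id = ANY(array[some_data])
    AND ROUND(ar.duration) > 0 AND ar.is_idle = false
    AND ar.week_id = ANY(ARRAY[some_data])
    AND ar.calendar_id = ANY(array[some_data]);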

So my questions are as follows.

  1. ar_idx contains user_id, but when searching for rows, why is unique_key used instead of the index I created?
  2. I thought GROUP BY would not sort (which is why I did not choose DISTINCT), so why does a sort show up in the EXPLAIN ANALYZE output? (see the sketch after this list)
  3. The run time is very long (more than 4 minutes). How can I make it faster? Is the index wrong? Or is there something else I can do?
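
Regarding question 2 (the sketch mentioned above): the plan shows "Sort Method: external merge  Disk: 95672kB", so the sort spilled to disk, and whether GROUP BY is implemented with a sort or a hash aggregate depends on the planner's row estimates and on work_mem. A minimal session-level experiment would be the following; the 256MB value is only an assumption for illustration, not a tuned recommendation:

SET work_mem = '256MB';  -- assumption: arbitrary value chosen only for illustration
-- re-run the EXPLAIN ANALYZE query shown above here
RESET work_mem;

Even if the sort then stays in memory (or the planner switches to a hash aggregate), the same ~7 million rows still have to be read, so this alone would not explain the 4-minute runtime from question 3.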

Note that ar contains 51585203 rows.

Any help would be appreciated. Thanks.

--------------------------- UPDATE ---------------------------

After creating the index below, everything runs fast now. I don't understand why; can anyone explain it to me?

CREATE INDEX ar_1_idx
    ON public.ar USING btree
    (calendar_id, user_id)
    TABLESPACE pg_default;

Then I changed the old index to:

CREATE INDEX ar_idx
    ON public.ar USING btree
    (week_id, calendar_id, user_id, activity_id, duration, on_taskness, start_time_local, end_time_local)
    TABLESPACE pg_default
    WHERE is_idle IS FALSE;
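
The updated plan below contains Buffers lines; those appear when EXPLAIN is run with the BUFFERS option, so the statement was presumably executed roughly like this (a sketch that reuses the predicates and some_data placeholders from the original query):

EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*)
FROM (
    SELECT ar.user_id
    FROM ar
    WHERE ar.user_id = ANY(array[some_data])
        AND ROUND(ar.duration) > 0 AND ar.is_idle = false
        AND ar.week_id = ANY(ARRAY[some_data])
        AND ar.calendar_id = ANY(array[some_data])
    GROUP BY ar.user_id
) tmp;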

----- Updated EXPLAIN ANALYZE result -----

 Aggregate  (cost=31216435.97..31216435.98 rows=1 width=8) (actual time=13206.941..13206.941 rows=1 loops=1)
      Buffers: shared hit=25940518 read=430315, temp read=31079 written=31079
      ->  Group  (cost=31215436.80..31216403.88 rows=2567 width=4) (actual time=12239.336..13206.894 rows=351 loops=1)
            Group Key: ar.user_id
            Buffers: shared hit=25940518 read=430315, temp read=31079 written=31079
            ->  Sort  (cost=31215436.80..31215920.34 rows=193417 width=4) (actual time=12239.334..12932.801 rows=6993358 loops=1)
                  Sort Key: ar.user_id
                  Sort Method: external merge  Disk: 95664kB
                  Buffers: shared hit=25940518 read=430315, temp read=31079 written=31079
                  ->  Index Scan using ar_1_idx on activity_report ar  (cost=0.56..31195807.48 rows=193417 width=4) (actual time=0.275..10387.051 rows=6993358 loops=1)
                        Index Cond: ((calendar_id = ANY ('{some_data}'::integer[])) AND (user_id = ANY ('{some_data}'::integer[])))
                        Filter: ((NOT is_idle) AND (round(duration) > '0'::double precision) AND (week_id = ANY ('{some_data}'::integer[])))
                        Rows Removed by Filter: 590705
                        Buffers: shared hit=25940518 read=430315
    Planning time: 25.577 ms
    Execution time: 13217.611 ms
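
One way to double-check which of the indexes the queries are actually hitting (and whether the old ar_idx is still used at all) is the statistics view pg_stat_user_indexes; a minimal sketch, assuming the table is named ar as in the definition above (the second plan calls it activity_report, so adjust relname accordingly):

SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relname = 'ar'   -- assumption: adjust if the actual table name differs
ORDER BY idx_scan DESC;

A growing idx_scan count for ar_1_idx together with an unchanged count for ar_idx would confirm that the new two-column index is the one doing the work.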

0 Answers:

No answers