Question

This is a follow-up to this issue, which I posted a while ago.

I have the following code:

SET work_mem = '16MB'; 
SELECT      s.start_date, s.end_date, s.resources, s.activity_index, r.resource_id, sa.usedresourceset 
FROM        rm_o_resource_usage_instance_splits_new s 
INNER JOIN  rm_o_resource_usage r ON s.usage_id = r.id 
INNER JOIN  scheduledactivities sa ON s.activity_index = sa.activity_index AND r.schedule_id = sa.solution_id and s.solution = sa.solution_id 
WHERE       r.schedule_id = 10 
ORDER BY    r.resource_id, s.start_date

Running EXPLAIN (ANALYZE, BUFFERS) gives me the following:

Sort  (cost=3724.02..3724.29 rows=105 width=89) (actual time=245.802..247.573 rows=22302 loops=1)
  Sort Key: r.resource_id, s.start_date
  Sort Method: quicksort  Memory: 6692kB
  Buffers: shared hit=198702 read=5993 written=612
  ->  Nested Loop  (cost=703.76..3720.50 rows=105 width=89) (actual time=1.898..164.741 rows=22302 loops=1)
        Buffers: shared hit=198702 read=5993 written=612
        ->  Hash Join  (cost=703.34..3558.54 rows=105 width=101) (actual time=1.815..11.259 rows=22302 loops=1)
              Hash Cond: (s.usage_id = r.id)
              Buffers: shared hit=3 read=397 written=2
              ->  Bitmap Heap Scan on rm_o_resource_usage_instance_splits_new s  (cost=690.61..3486.58 rows=22477 width=69) (actual time=1.782..5.820 rows=22302 loops=1)
                    Recheck Cond: (solution = 10)
                    Heap Blocks: exact=319
                    Buffers: shared hit=2 read=396 written=2
                    ->  Bitmap Index Scan on rm_o_resource_usage_instance_splits_new_solution_idx  (cost=0.00..685.00 rows=22477 width=0) (actual time=1.609..1.609 rows=22302 loops=1)
                          Index Cond: (solution = 10)
                          Buffers: shared hit=2 read=77
              ->  Hash  (cost=12.66..12.66 rows=5 width=48) (actual time=0.023..0.023 rows=1 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 9kB
                    Buffers: shared hit=1 read=1
                    ->  Bitmap Heap Scan on rm_o_resource_usage r  (cost=4.19..12.66 rows=5 width=48) (actual time=0.020..0.020 rows=1 loops=1)
                          Recheck Cond: (schedule_id = 10)
                          Heap Blocks: exact=1
                          Buffers: shared hit=1 read=1
                          ->  Bitmap Index Scan on rm_o_resource_usage_sched  (cost=0.00..4.19 rows=5 width=0) (actual time=0.017..0.017 rows=1 loops=1)
                                Index Cond: (schedule_id = 10)
                                Buffers: shared read=1
        ->  Index Scan using scheduledactivities_activity_index_idx on scheduledactivities sa  (cost=0.42..1.53 rows=1 width=16) (actual time=0.004..0.007 rows=1 loops=22302)
              Index Cond: (activity_index = s.activity_index)
              Filter: (solution_id = 10)
              Rows Removed by Filter: 5
              Buffers: shared hit=198699 read=5596 written=610
Planning time: 7.070 ms
Execution time: 248.691 ms

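Every time I run EXPLAIN, I get roughly the same result. The execution time is always between 170 ms and 250 ms, which is perfectly fine for me. However, when this query is run through a C++ project (using PQexec(conn, query), where conn is a dedicated connection and query is the query above), the time it takes seems to vary widely. Usually the query is very fast and you don't notice any delay. The problem is that sometimes this query takes 2 to 3 minutes to complete.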

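If I open pgAdmin and look at the "Server Activity" for the database, there are about 30 connections, most of them "idle". The connection running the query above is marked "active" and stays "active" for several minutes.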

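I have no idea why the same query randomly takes several minutes to complete when the data in the database has not changed. I tried increasing work_mem, which made no difference (nor did I expect it to). Any help or suggestions would be greatly appreciated.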

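There is no more specific tag, but I am currently using Postgres 10.11, and this has also been a problem on other 10.x versions. The system is a quad-core Xeon @ 3.4 GHz with an SSD and 24 GB of memory.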

At jjanes's suggestion, I enabled auto_explain. It eventually produced the following output:

duration: 128057.373 ms 
plan:
Query Text: SET work_mem = '32MB';SELECT        s.start_date, s.end_date, s.resources, s.activity_index, r.resource_id, sa.usedresourceset FROM     rm_o_resource_usage_instance_splits_new s INNER JOIN    rm_o_resource_usage r ON s.usage_id = r.id INNER JOIN  scheduledactivities sa ON s.activity_index = sa.activity_index AND r.schedule_id = sa.solution_id and s.solution = sa.solution_id WHERE      r.schedule_id = 12642 ORDER BY   r.resource_id, s.start_date
Sort  (cost=14.36..14.37 rows=1 width=98) (actual time=128042.083..128043.287 rows=21899 loops=1)
  Output: s.start_date, s.end_date, s.resources, s.activity_index, r.resource_id, sa.usedresourceset
  Sort Key: r.resource_id, s.start_date
  Sort Method: quicksort  Memory: 6585kB
  Buffers: shared hit=21198435 read=388 dirtied=119
  ->  Nested Loop  (cost=0.85..14.35 rows=1 width=98) (actual time=4.995..127958.935 rows=21899 loops=1)
        Output: s.start_date, s.end_date, s.resources, s.activity_index, r.resource_id, sa.usedresourceset
        Join Filter: (s.activity_index = sa.activity_index)
        Rows Removed by Join Filter: 705476285
        Buffers: shared hit=21198435 read=388 dirtied=119
        ->  Nested Loop  (cost=0.42..9.74 rows=1 width=110) (actual time=0.091..227.705 rows=21899 loops=1)
              Output: s.start_date, s.end_date, s.resources, s.activity_index, s.solution, r.resource_id, r.schedule_id
              Inner Unique: true
              Join Filter: (s.usage_id = r.id)
              Buffers: shared hit=22102 read=388 dirtied=119
              ->  Index Scan using rm_o_resource_usage_instance_splits_new_solution_idx on public.rm_o_resource_usage_instance_splits_new s  (cost=0.42..8.44 rows=1 width=69) (actual time=0.082..17.418 rows=21899 loops=1)
                    Output: s.start_time, s.end_time, s.resources, s.activity_index, s.usage_id, s.start_date, s.end_date, s.solution
                    Index Cond: (s.solution = 12642)
                    Buffers: shared hit=203 read=388 dirtied=119
              ->  Seq Scan on public.rm_o_resource_usage r  (cost=0.00..1.29 rows=1 width=57) (actual time=0.002..0.002 rows=1 loops=21899)
                    Output: r.id, r.schedule_id, r.resource_id
                    Filter: (r.schedule_id = 12642)
                    Rows Removed by Filter: 26
                    Buffers: shared hit=21899
        ->  Index Scan using scheduled_activities_idx on public.scheduledactivities sa  (cost=0.42..4.60 rows=1 width=16) (actual time=0.006..4.612 rows=32216 loops=21899)
              Output: sa.usedresourceset, sa.activity_index, sa.solution_id
              Index Cond: (sa.solution_id = 12642)
              Buffers: shared hit=21176333

Edit: The full definitions of the tables are below:

CREATE TABLE public.rm_o_resource_usage_instance_splits_new
(
    start_time integer NOT NULL,
    end_time integer NOT NULL,
    resources jsonb NOT NULL,
    activity_index integer NOT NULL,
    usage_id bigint NOT NULL,
    start_date text COLLATE pg_catalog."default" NOT NULL,
    end_date text COLLATE pg_catalog."default" NOT NULL,
    solution bigint NOT NULL,
    CONSTRAINT rm_o_resource_usage_instance_splits_new_pkey PRIMARY KEY (start_time, activity_index, usage_id),
    CONSTRAINT rm_o_resource_usage_instance_splits_new_solution_fkey FOREIGN KEY (solution)
        REFERENCES public.rm_o_schedule_stats (id) MATCH SIMPLE
        ON UPDATE CASCADE
        ON DELETE CASCADE,
    CONSTRAINT rm_o_resource_usage_instance_splits_new_usage_id_fkey FOREIGN KEY (usage_id)
        REFERENCES public.rm_o_resource_usage (id) MATCH SIMPLE
        ON UPDATE CASCADE
        ON DELETE CASCADE
)
WITH (
    OIDS = FALSE
)
TABLESPACE pg_default;

CREATE INDEX rm_o_resource_usage_instance_splits_new_activity_idx
    ON public.rm_o_resource_usage_instance_splits_new USING btree
    (activity_index ASC NULLS LAST)
    TABLESPACE pg_default;

CREATE INDEX rm_o_resource_usage_instance_splits_new_solution_idx
    ON public.rm_o_resource_usage_instance_splits_new USING btree
    (solution ASC NULLS LAST)
    TABLESPACE pg_default;

CREATE INDEX rm_o_resource_usage_instance_splits_new_usage_idx
    ON public.rm_o_resource_usage_instance_splits_new USING btree
    (usage_id ASC NULLS LAST)
    TABLESPACE pg_default;

CREATE TABLE public.rm_o_resource_usage
(
    id bigint NOT NULL DEFAULT nextval('rm_o_resource_usage_id_seq'::regclass),
    schedule_id bigint NOT NULL,
    resource_id text COLLATE pg_catalog."default" NOT NULL,
    CONSTRAINT rm_o_resource_usage_pkey PRIMARY KEY (id),
    CONSTRAINT rm_o_resource_usage_schedule_id_fkey FOREIGN KEY (schedule_id)
        REFERENCES public.rm_o_schedule_stats (id) MATCH SIMPLE
        ON UPDATE CASCADE
        ON DELETE CASCADE
)
WITH (
    OIDS = FALSE
)
TABLESPACE pg_default;

CREATE INDEX rm_o_resource_usage_idx
    ON public.rm_o_resource_usage USING btree
    (id ASC NULLS LAST)
    TABLESPACE pg_default;

CREATE INDEX rm_o_resource_usage_sched
    ON public.rm_o_resource_usage USING btree
    (schedule_id ASC NULLS LAST)
    TABLESPACE pg_default;

CREATE TABLE public.scheduledactivities
(
    id bigint NOT NULL DEFAULT nextval('scheduledactivities_id_seq'::regclass),
    solution_id bigint NOT NULL,
    activity_id text COLLATE pg_catalog."default" NOT NULL,
    sequence_index integer,
    startminute integer,
    finishminute integer,
    issue text COLLATE pg_catalog."default",
    activity_index integer NOT NULL,
    is_objective boolean NOT NULL,
    usedresourceset integer DEFAULT '-1'::integer,
    start timestamp without time zone,
    finish timestamp without time zone,
    is_ore boolean,
    is_ignored boolean,
    CONSTRAINT scheduled_activities_pkey PRIMARY KEY (id),
    CONSTRAINT scheduledactivities_solution_id_fkey FOREIGN KEY (solution_id)
        REFERENCES public.rm_o_schedule_stats (id) MATCH SIMPLE
        ON UPDATE CASCADE
        ON DELETE CASCADE
)
WITH (
    OIDS = FALSE
)
TABLESPACE pg_default;

CREATE INDEX scheduled_activities_activity_id_idx
    ON public.scheduledactivities USING btree
    (activity_id COLLATE pg_catalog."default" ASC NULLS LAST)
    TABLESPACE pg_default;

CREATE INDEX scheduled_activities_id_idx
    ON public.scheduledactivities USING btree
    (id ASC NULLS LAST)
    TABLESPACE pg_default;

CREATE INDEX scheduled_activities_idx
    ON public.scheduledactivities USING btree
    (solution_id ASC NULLS LAST)
    TABLESPACE pg_default;

CREATE INDEX scheduledactivities_activity_index_idx
    ON public.scheduledactivities USING btree
    (activity_index ASC NULLS LAST)
    TABLESPACE pg_default;

Edit: Additional output from auto_explain after adding an index on scheduledactivities (solution_id, activity_index):

  Output: s.start_date, s.end_date, s.resources, s.activity_index, r.resource_id, sa.usedresourceset
  Sort Key: r.resource_id, s.start_date
  Sort Method: quicksort  Memory: 6283kB
  Buffers: shared hit=20159117 read=375 dirtied=190
  ->  Nested Loop  (cost=0.85..10.76 rows=1 width=100) (actual time=5.518..122489.627 rows=20761 loops=1)
        Output: s.start_date, s.end_date, s.resources, s.activity_index, r.resource_id, sa.usedresourceset
        Join Filter: (s.activity_index = sa.activity_index)
        Rows Removed by Join Filter: 668815615
        Buffers: shared hit=20159117 read=375 dirtied=190
        ->  Nested Loop  (cost=0.42..5.80 rows=1 width=112) (actual time=0.057..217.563 rows=20761 loops=1)
              Output: s.start_date, s.end_date, s.resources, s.activity_index, s.solution, r.resource_id, r.schedule_id
              Inner Unique: true
              Join Filter: (s.usage_id = r.id)
              Buffers: shared hit=20947 read=375 dirtied=190
              ->  Index Scan using rm_o_resource_usage_instance_splits_new_solution_idx on public.rm_o_resource_usage_instance_splits_new s  (cost=0.42..4.44 rows=1 width=69) (actual time=0.049..17.622 rows=20761 loops=1)
                    Output: s.start_time, s.end_time, s.resources, s.activity_index, s.usage_id, s.start_date, s.end_date, s.solution
                    Index Cond: (s.solution = 12644)
                    Buffers: shared hit=186 read=375 dirtied=190
              ->  Seq Scan on public.rm_o_resource_usage r  (cost=0.00..1.35 rows=1 width=59) (actual time=0.002..0.002 rows=1 loops=20761)
                    Output: r.id, r.schedule_id, r.resource_id
                    Filter: (r.schedule_id = 12644)
                    Rows Removed by Filter: 22
                    Buffers: shared hit=20761
        ->  Index Scan using scheduled_activities_idx on public.scheduledactivities sa  (cost=0.42..4.94 rows=1 width=16) (actual time=0.007..4.654 rows=32216 loops=20761)
              Output: sa.usedresourceset, sa.activity_index, sa.solution_id
              Index Cond: (sa.solution_id = 12644)
              Buffers: shared hit=20138170

The easiest way to reproduce this problem is to add more values to the three tables. I did not delete anything; I only ran a few thousand INSERTs.

Answer 1

->  Index Scan using .. s  (cost=0.42..8.44 rows=1 width=69) (actual time=0.082..17.418 rows=21899 loops=1)
                    Index Cond: (s.solution = 12642)

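The planner thinks it will find 1 row, and instead it finds 21899. An error of that size can quite clearly lead to bad plans. A single equality condition should be estimated quite accurately, so I'd say the statistics on your table are way off. It could be that the autovacuum launcher is tuned poorly and does not run often enough, or it could be that select parts of your data change very rapidly (did you insert the 21899 rows with s.solution = 12642 immediately before running the query?), so the statistics cannot be kept accurate enough.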
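One way to test the stale-statistics theory is to look at when the tables were last analyzed and how many rows have changed since. The sketch below assumes the three tables from the question live in the current database; pg_stat_user_tables and ANALYZE are standard PostgreSQL, but whether a manual ANALYZE fits into the application's workflow is an assumption on my part.

```sql
-- When were statistics last gathered, and how many rows changed since then?
SELECT relname,
       last_analyze,
       last_autoanalyze,
       n_live_tup,
       n_mod_since_analyze
FROM   pg_stat_user_tables
WHERE  relname IN ('rm_o_resource_usage_instance_splits_new',
                   'rm_o_resource_usage',
                   'scheduledactivities');

-- Refresh the statistics by hand, e.g. right after the bulk INSERTs
-- and before the reporting query runs.
ANALYZE rm_o_resource_usage_instance_splits_new;
ANALYZE rm_o_resource_usage;
ANALYZE scheduledactivities;
```

If n_mod_since_analyze is large relative to n_live_tup just before a slow run, that would support the idea that the planner is working from outdated numbers.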

  ->  Nested Loop ...
        Join Filter: (s.activity_index = sa.activity_index)
        Rows Removed by Join Filter: 705476285
        ->  ...
        ->  Index Scan using scheduled_activities_idx on public.scheduledactivities sa  (cost=0.42..4.60 rows=1 width=16) (actual time=0.006..4.612 rows=32216 loops=21899)
              Output: sa.usedresourceset, sa.activity_index, sa.solution_id
              Index Cond: (sa.solution_id = 12642)
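
The numbers in this excerpt fit together: the inner index scan returns about 32216 rows on each of 21899 loops, and only one row per loop survives the join filter. A quick sanity check, assuming exactly one match per outer row (which the 21899 output rows suggest):

```sql
-- rows produced by the inner scan over all loops, minus the rows that were kept
SELECT 21899::bigint * 32216 - 21899 AS rows_removed_by_join_filter;
-- 705476285, the same figure the plan reports
```

In other words, nearly all of the two minutes is spent reading the same 32216 rows again and again, only to discard them.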

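If you can't get it to use the hash join, you can at least reduce the harm of the nested loop by building an index on scheduledactivities (solution_id, activity_index). That way the activity_index criterion can become part of the index condition rather than a join filter. You could then probably drop the index on solution_id alone, as there is little point in maintaining both indexes.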
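
A minimal sketch of that index change follows. The new index name is hypothetical; scheduled_activities_idx is the existing single-column index from the table definitions in the question. Note that the planner may keep choosing the old index for as long as its row estimates are wrong, so this is a mitigation and not a substitute for fixing the statistics.

```sql
-- Composite index: lets activity_index be checked as an index condition
-- instead of a join filter. The index name is made up for this example.
CREATE INDEX scheduledactivities_solution_activity_idx
    ON public.scheduledactivities (solution_id, activity_index);

-- The composite index also serves lookups on solution_id alone,
-- so the single-column index becomes redundant.
DROP INDEX public.scheduled_activities_idx;
```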

Answer 2

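The SQL statement with the fast plan uses WHERE r.schedule_id = 10 and returns about 22000 rows (estimated: 105). The SQL statement with the slow plan uses WHERE r.schedule_id = 12642 and returns about 21900 rows (estimated: only 1).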

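The slow plan uses nested loops instead of a hash join, probably because the join estimates are bad: the estimated row count is 1, but the actual row count is 21899. For example, in this step: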

Nested Loop  (cost=0.42..9.74 rows=1 width=110) (actual time=0.091..227.705 rows=21899 loops=1)

If the data does not change, there may be a statistics problem (skewed data) on some columns.
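
To see whether skew or thin statistics are involved, you can inspect what the planner currently believes about the filtered columns and, if needed, give it more to work with. The numeric values below are illustrative assumptions, not tuned recommendations.

```sql
-- What does the planner know about the columns used in the filters?
SELECT tablename, attname, n_distinct, most_common_vals, most_common_freqs
FROM   pg_stats
WHERE  (tablename = 'rm_o_resource_usage_instance_splits_new' AND attname = 'solution')
   OR  (tablename = 'scheduledactivities' AND attname = 'solution_id');

-- Collect a more detailed histogram for a skewed column (1000 is illustrative).
ALTER TABLE rm_o_resource_usage_instance_splits_new
    ALTER COLUMN solution SET STATISTICS 1000;
ANALYZE rm_o_resource_usage_instance_splits_new;

-- Let autoanalyze kick in after fewer changed rows on this table
-- (both values are illustrative).
ALTER TABLE rm_o_resource_usage_instance_splits_new
    SET (autovacuum_analyze_scale_factor = 0.01,
         autovacuum_analyze_threshold = 1000);
```

If the value being queried (for example 12642) does not appear in most_common_vals at all and lies beyond the histogram's range, an estimate of 1 row is what you would expect the planner to produce.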
