Question

I have this table:

CREATE TABLE public.prodhistory (
  curve_id           int4 NOT NULL,
  start_prod_date    date NOT NULL,
  prod_date          date NOT NULL,
  monthly_prod_rate  float4 NOT NULL,
  eff_date           timestamp NOT NULL,
  /* Keys */
  CONSTRAINT prodhistorypk
    PRIMARY KEY (curve_id, prod_date, start_prod_date, eff_date),
  /* Foreign keys */
  CONSTRAINT prodhistory2typecurves_fk
    FOREIGN KEY (curve_id)
    REFERENCES public.typecurves(curve_id)
) WITH (
    OIDS = FALSE
  );

CREATE INDEX prodhistory_idx_curve_id01
  ON public.prodhistory
  (curve_id);

It has about 42M rows.

I run this query:

SELECT DISTINCT curve_id FROM prodhistory

Given the index, I expected this to be very fast. But no: 270 seconds. So I ran EXPLAIN on it and got:

HashAggregate  (cost=824870.03..824873.08 rows=305 width=4) (actual time=211834.018..211834.097 rows=315 loops=1)   
  Output: curve_id  
  Group Key: prodhistory.curve_id   
  ->  Seq Scan on public.prodhistory  (cost=0.00..718003.22 rows=42746722 width=4) (actual time=12.751..200826.299 rows=43218808 loops=1)   
        Output: curve_id    
Planning time: 0.115 ms 
Execution time: 211848.137 ms

I have no experience reading these plans, but a Seq Scan on the table looks bad.

Any ideas? I'm a bit stumped.

Answer 1

This plan was chosen because PostgreSQL estimates it to be the cheaper one.

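You can compare the alternatives by setting

SET enable_seqscan=off;

and then re-running the EXPLAIN (ANALYZE) statement. Compare the cost and actual time in both cases and check whether PostgreSQL's estimate was correct.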
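
A minimal sketch of that comparison as one session, using the query from the question. The BUFFERS option and the final RESET are additions of mine, not part of the original suggestion; BUFFERS only adds page hit/read counts to the output, and RESET puts the setting back to its default afterwards.

```sql
-- Baseline: the plan PostgreSQL picks with default planner settings.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT curve_id FROM prodhistory;

-- Discourage sequential scans for this session only, so the planner
-- falls back to an index-based plan if one is possible.
SET enable_seqscan = off;

-- Same query again: compare "cost" and "actual time" with the baseline.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT curve_id FROM prodhistory;

-- Restore the default so later queries are planned normally.
RESET enable_seqscan;
```

Note that enable_seqscan=off does not forbid sequential scans outright; it only makes them look very expensive to the planner, so it is a diagnostic aid rather than a setting to leave on.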

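If you find that an Index Scan or Index Only Scan is actually cheaper, you can consider adjusting the cost parameters to better match your machine, for example by lowering random_page_cost or cpu_index_tuple_cost, or by raising cpu_tuple_cost.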
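
As a rough sketch of how such an experiment could look, the parameters can be changed per session before touching postgresql.conf. The value 1.1 below is only an illustrative assumption (it is a figure often tried on SSD-backed storage), not a recommendation derived from the plan in the question.

```sql
-- Inspect the current settings (stock defaults: 4, 0.005 and 0.01).
SHOW random_page_cost;
SHOW cpu_index_tuple_cost;
SHOW cpu_tuple_cost;

-- Try a lower random_page_cost for this session only.
-- 1.1 is an illustrative value, not a tuned one.
SET random_page_cost = 1.1;

-- Check whether the planner now prefers an index-based plan
-- and whether that plan is really faster.
EXPLAIN (ANALYZE)
SELECT DISTINCT curve_id FROM prodhistory;

-- Go back to the configured value.
RESET random_page_cost;
```

Only if the index plan wins consistently would it make sense to persist the change, since these parameters affect the planning of every query on the server.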
