Question

I have the following PostgreSQL query (simplified for readability):

select *
from a_view
where a in (select * from a_function(a_input))
      and b in (select * from b_function(b_input));

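This query runs too slowly.

If I run the two subqueries on their own, they are very fast. If I run the query with the subquery output written out as literals, i.e.: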

select *
from a_view
where a in (394990, 393762, 393748, 1)
      and b in (331142, 330946, 331228, 331325);

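this is also fast. I ran EXPLAIN ANALYZE and found that, in the original form above, the query cannot use the indexes and falls back to a sequential scan. To give more detail: the view (a_view) involves a large table (over 10 million rows) that is indexed on both (a, b) and (b).

Is there a way to help the query take advantage of the indexes?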

Answer 1

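There may be two problems:

By default, any SRF (set-returning function) has a ROWS clause of 1000, and the planner expects that many rows. In your case that estimate is wrong. Try setting this attribute to a more suitable value (for example 10; a value that is too small can be bad too):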

postgres=# explain select * from xx();
┌───────────────────────────────────────────────────────────┐
│                        QUERY PLAN                         │
╞═══════════════════════════════════════════════════════════╡
│ Function Scan on xx  (cost=0.25..10.25 rows=1000 width=4) │
└───────────────────────────────────────────────────────────┘
(1 row)
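
As a rough sketch of the ROWS adjustment described above, the estimate can be changed on an existing function with ALTER FUNCTION. The value 10 is only the example figure from the text, and xx() is the answer's demo function; substitute your own function name and argument types.

```sql
-- Lower the planner's row estimate for the set-returning function xx()
-- from the default of 1000 to 10 (example value, tune it to your data).
ALTER FUNCTION xx() ROWS 10;

-- Re-check the estimate; the Function Scan node should now report rows=10.
EXPLAIN SELECT * FROM xx();
```

The same ROWS clause can also be given directly in CREATE FUNCTION when the function is defined.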

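PL/pgSQL functions are a black box to the planner. With a plain list of constants, the planner has more information about the predicate than it has with a function call. With a function, the planner has to fall back on default rules, which may or may not suit your case.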

postgres=# explain select * from xx where a in (10,20);
┌────────────────────────────────────────────────────┐
│                     QUERY PLAN                     │
╞════════════════════════════════════════════════════╡
│ Seq Scan on xx  (cost=0.00..170.00 rows=2 width=4) │
│   Filter: (a = ANY ('{10,20}'::integer[]))         │
└────────────────────────────────────────────────────┘
(2 rows)

postgres=# explain select * from xx where a in (select * from xx());
┌──────────────────────────────────────────────────────────────────────────────────┐
│                                    QUERY PLAN                                    │
╞══════════════════════════════════════════════════════════════════════════════════╡
│ Hash Join  (cost=17.25..201.85 rows=5000 width=4)                                │
│   Hash Cond: (xx.a = xx_1.xx)                                                    │
│   ->  Seq Scan on xx  (cost=0.00..145.00 rows=10000 width=4)                     │
│   ->  Hash  (cost=14.75..14.75 rows=200 width=4)                                 │
│         ->  HashAggregate  (cost=12.75..14.75 rows=200 width=4)                  │
│               Group Key: xx_1.xx                                                 │
│               ->  Function Scan on xx xx_1  (cost=0.25..10.25 rows=1000 width=4) │
└──────────────────────────────────────────────────────────────────────────────────┘
(7 rows)

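These are two possibly equivalent queries with quite different plans, and probably a large difference in performance.

Possible solutions:

Don't do it: using plpgsql in critical places of an SQL query (mostly the WHERE clause) can have a very negative effect.

You can rewrite your function to return int[] instead of SETOF int. In that case the planner uses different rules, and performance can be better.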

postgres=# explain select * from xx where a = any( xx2());
┌──────────────────────────────────────────────────────┐
│                      QUERY PLAN                      │
╞══════════════════════════════════════════════════════╡
│ Seq Scan on xx  (cost=0.00..2770.00 rows=11 width=4) │
│   Filter: (a = ANY (xx2()))                          │
└──────────────────────────────────────────────────────┘
(2 rows)
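
The definition of xx2() is not shown here, so the following is only a sketch of what an array-returning variant could look like for the query in the question. The names a_function_arr and b_function_arr and the integer argument and element types are assumptions made for illustration.

```sql
-- Hypothetical wrapper: collects the rows of the existing set-returning
-- a_function into an integer[] (argument type integer is an assumption).
CREATE OR REPLACE FUNCTION a_function_arr(a_input integer)
RETURNS integer[]
LANGUAGE sql
AS $$
    SELECT ARRAY(SELECT x FROM a_function(a_input) AS t(x));
$$;

-- The original query rewritten with = ANY (array) instead of IN (subquery).
-- b_function_arr is assumed to be defined the same way as a_function_arr;
-- a_input and b_input are the placeholders from the question.
SELECT *
FROM a_view
WHERE a = ANY (a_function_arr(a_input))
  AND b = ANY (b_function_arr(b_input));
```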

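If the results of a_function and b_function do not depend on the contents of a and b, they can be evaluated before the query by marking these functions IMMUTABLE. The function is then evaluated at planning time and its result is used as a constant, so the planner has more information. Attention: if that precondition does not hold, the results can be wrong. Be careful.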

-- xx2 is IMMUTABLE now
postgres=# explain select * from xx where a = any( xx2());
┌───────────────────────────────────────────────────────┐
│                      QUERY PLAN                       │
╞═══════════════════════════════════════════════════════╡
│ Seq Scan on xx  (cost=0.00..182.50 rows=3 width=4)    │
│   Filter: (a = ANY ('{30314,3783,70448}'::integer[])) │
└───────────────────────────────────────────────────────┘
(2 rows)
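
A minimal sketch of how the IMMUTABLE flag mentioned above could be set on an existing function. It uses the answer's demo function xx2(), and it is only safe if the function really returns the same result for the same arguments.

```sql
-- Mark xx2() as IMMUTABLE so the planner may evaluate it at planning time.
-- Only do this if the result never depends on table contents or settings.
ALTER FUNCTION xx2() IMMUTABLE;

-- The Filter line should now show a constant array instead of the call xx2().
EXPLAIN SELECT * FROM xx WHERE a = ANY (xx2());
```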
