I am writing analytics software that generates complicated queries. When building the where clause, it may happen that multiple constraints are added on the same database field. I was wondering if PostgreSQL rewrites multiple constraints into simpler ones. I did some tests:
SELECT COUNT(id) FROM employee WHERE age BETWEEN 18 AND 40;
I ran this 10 times and the average time spent was 65ms. Now I make the query a bit longer but trivial to optimize:
SELECT COUNT(id) FROM employee WHERE
(age BETWEEN 18 AND 40) AND
(age BETWEEN 18 AND 40) AND
(age BETWEEN 18 AND 40);
This query takes 100ms on average, which is a lot slower. Also, the following query:
SELECT COUNT(id) FROM employee WHERE
(age BETWEEN 28 AND 70) AND
(age BETWEEN 25 AND 40) AND
(age BETWEEN 22 AND 33) AND
(age BETWEEN 18 AND 30);
takes 105ms on average, while it is equivalent to:
SELECT COUNT(id) FROM employee WHERE age BETWEEN 28 AND 30;
which runs twice as fast.
These queries are semantically equivalent, and I expected them to be rewritten into the very same query before the planner even touches them. This seems like low-hanging fruit for the query rewriter. Is there a hidden configuration option I'm missing? I'm using PostgreSQL 9.4.5.
Thank you!
Answer (score: 1):
The optimizer doesn't fold consecutive ranges together. It doesn't do that kind of datatype-level analysis.
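PostgreSQL doesn't care whether you're testing ranges of integers, floating-point values or text strings. Range folding like this only works for types with discrete, countable ordinals. If you tried it with floating-point values, you could get subtly wrong answers.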
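PostgreSQL doesn't know which types it would be safe to do this for, so it can't do it. Also, every possible optimization the query planner performs has a computational cost just to check whether it applies, so there's a trade-off between planning cost and execution cost.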
TL;DR: the planner does not optimize this case automatically.
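Since the planner won't fold these constraints, one option is to fold them in the query generator before emitting SQL. Below is a minimal sketch in Python. The asker's implementation language isn't stated, so Python and the helper names `fold_between` and `between_sql` are hypothetical. It assumes each constraint arrives as an inclusive (lo, hi) pair on a single integer column such as `age`. AND-ed BETWEEN ranges intersect to the largest lower bound and the smallest upper bound, and an empty intersection means no row can match.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical representation: an inclusive range, as in "col BETWEEN lo AND hi".
# Assumes an integer column (e.g. age), i.e. a discrete type.
Range = Tuple[int, int]


def fold_between(ranges: Sequence[Range]) -> Optional[Range]:
    """Intersect AND-ed inclusive integer ranges on one column.

    Returns the single equivalent (lo, hi) range, or None if the
    intersection is empty (no value satisfies every constraint).
    """
    if not ranges:
        raise ValueError("need at least one range")
    lo = max(r[0] for r in ranges)  # tightest lower bound
    hi = min(r[1] for r in ranges)  # tightest upper bound
    return (lo, hi) if lo <= hi else None


def between_sql(column: str, ranges: Sequence[Range]) -> str:
    """Render the folded constraint as a WHERE-clause fragment.

    `column` is assumed to be a trusted, already-quoted identifier.
    """
    folded = fold_between(ranges)
    if folded is None:
        # Contradictory constraints: the WHERE clause can match nothing.
        return "FALSE"
    lo, hi = folded
    return f"{column} BETWEEN {lo} AND {hi}"


# The four-range query from the question folds to a single range.
print(between_sql("age", [(28, 70), (25, 40), (22, 33), (18, 30)]))
# age BETWEEN 28 AND 30
```

The sketch is deliberately limited to integer columns, which keeps it within the discrete types the answer describes as safe. Inside a WHERE clause, emitting FALSE for an empty intersection excludes the same rows as the original conjunction, including rows where `age` is NULL.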
In the future, always include your PostgreSQL version and the explain (buffers, analyze) output for the query or queries in question.