Question

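I have to move the DB to an external database server for a licensed piece of software. The DB must be Postgres, and I cannot change the SELECT query issued by the application (I cannot change the source code).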

The table (it must be a single table) holds about 6.5M rows and has unique values in the main column (prefix).

All requests are reads, with no inserts/updates/deletes. There are ~200k SELECTs/day, with a peak of 15 TPS.

The SELECT query is:

SELECT prefix, changeprefix, deletelast, outgroup, tariff FROM table 
WHERE '00436641997142' LIKE prefix 
AND company = 0  and ((current_time between timefrom and timeto) or (timefrom is null and timeto is null)) and (strpos("Day", cast(to_char(now(), 'ID') as varchar)) > 0  or "Day" is null )  
ORDER BY position('%' in prefix) ASC, char_length(prefix) DESC 
LIMIT 1;
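To make the semantics of the reversed LIKE concrete, here is a minimal Python model of what this query selects. It is a sketch under simplifying assumptions: the table is an in-memory list of made-up prefix strings, the company, time and "Day" filters are left out, and LIKE's escape character is ignored. It shows why an ordinary index has nothing to look up: each stored prefix is itself the pattern, so every row must be tested against the constant number.

```python
from __future__ import annotations

import re


def like_to_regex(pattern: str) -> re.Pattern[str]:
    # Translate a SQL LIKE pattern: '%' = any run of characters, '_' = exactly one
    # character, everything else literal. Simplification: LIKE's escape character is ignored.
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def best_prefix(number: str, prefixes: list[str]) -> str | None:
    # WHERE 'number' LIKE prefix: each stored prefix is the pattern, so every row is tested.
    matches = [p for p in prefixes if like_to_regex(p).fullmatch(number)]
    # ORDER BY position('%' in prefix) ASC, char_length(prefix) DESC.
    # position() is 1-based and returns 0 when '%' is absent; str.find returns -1, hence +1.
    matches.sort(key=lambda p: (p.find("%") + 1, -len(p)))
    return matches[0] if matches else None  # LIMIT 1


rows = ["0043664%", "004366419971%", "00436641997142", "0049%"]  # made-up sample data
print(best_prefix("00436641997142", rows))      # 00436641997142 (no '%' sorts first)
print(best_prefix("00436641997142", rows[:2]))  # 0043664% ('%' at position 8 beats 13)
```

Note what the ORDER BY implies: a prefix without '%' sorts first, and among wildcard prefixes the one whose '%' appears earliest wins, with length only breaking ties.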

EXPLAIN ANALYZE shows the following:

Limit  (cost=406433.75..406433.75 rows=1 width=113) (actual time=1721.360..1721.361 rows=1 loops=1)
  ->  Sort  (cost=406433.75..406436.72 rows=1188 width=113) (actual time=1721.358..1721.358 rows=1 loops=1)
        Sort Key: ("position"((prefix)::text, '%'::text)), (char_length(prefix)) DESC
        Sort Method: quicksort  Memory: 25kB
        ->  Seq Scan on table  (cost=0.00..406427.81 rows=1188 width=113) (actual time=1621.159..1721.345 rows=1 loops=1)
              Filter: ((company = 0) AND ('00381691997142'::text ~~ (prefix)::text) AND ((strpos(("Day")::text, (to_char(now(), 'ID'::text))::text) > 0) OR ("Day" IS NULL)) AND (((('now'::cstring)::time with time zone >= (timefrom)::time with time zone) AN (...)
              Rows Removed by Filter: 6417130
Planning time: 0.165 ms
Execution time: 1721.404 ms

The slowest part of the query is:

 SELECT prefix, changeprefix, deletelast, outgroup, tariff FROM table 
 WHERE '00436641997142' LIKE prefix

which takes 1.6 s (testing only this part of the query).

EXPLAIN ANALYZE for that part of the query on its own:

Seq Scan on table  (cost=0.00..181819.07 rows=32086 width=113) (actual time=1488.359..1580.607 rows=1 loops=1)
  Filter: ('004366491997142'::text ~~ (prefix)::text)
  Rows Removed by Filter: 6417130
Planning time: 0.061 ms
Execution time: 1580.637 ms

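About the data itself: the values in the column "prefix" share the same leading digits (the first 5); the remaining digits differ, and each value is unique.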

The Postgres version is 9.5. I have changed the following Postgres settings:

random_page_cost = 40
effective_cache_size = 4GB
shared_buffers = 4GB
work_mem = 1GB

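I have tried several index types (unique, GIN, GiST, hash), but in every case the index is not used (as the EXPLAIN above shows) and the speed is the same. I also ran the following, with no noticeable improvement: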

vacuum analyze verbose table

Please recommend database and/or index settings that would speed up the execution of this query.

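The current hardware is an i5 with an SSD and 16 GB RAM on Win7, but I have the option to buy more powerful hardware. As far as I know, when reads dominate (no inserts/updates), faster CPU cores matter more than the number of cores or disk speed. Please confirm.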
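A back-of-envelope check on the hardware question, using only figures from this post. It assumes each query keeps one core busy for the full 1.72 s measured by EXPLAIN ANALYZE (in 9.5 a sequential scan runs serially in one backend process) and applies Little's law: average scans in flight = arrival rate × time per scan.

```python
# Back-of-envelope CPU demand from the figures quoted in this question.
QUERIES_PER_DAY = 200_000
PEAK_TPS = 15
SECONDS_PER_QUERY = 1.721404  # "Execution time" of the full query in EXPLAIN ANALYZE

avg_qps = QUERIES_PER_DAY / 86_400        # about 2.31 queries per second
avg_busy = avg_qps * SECONDS_PER_QUERY    # about 4 scans in flight on average
peak_busy = PEAK_TPS * SECONDS_PER_QUERY  # about 25.8 scans in flight at peak

print(f"average concurrent scans: {avg_busy:.1f}")
print(f"peak concurrent scans:    {peak_busy:.1f}")
```

On those assumptions, a faster core shortens each scan, but with roughly 26 scans wanted at once during the peak, the core count also limits throughput as long as every query remains a full table scan.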

Update 1: Even after adding 9 indexes, no index is used.

Update 2: 1) I found the reason the index is not used: the word order in the LIKE part of the query is the cause. If the query is:

SELECT prefix, changeprefix, deletelast, outgroup, tariff FROM table WHERE prefix like '00436641997142%'
AND company = 0  and 
((current_time between timefrom and timeto) or (timefrom is null and timeto is null)) and (strpos("Day", cast(to_char(now(), 'ID') as varchar)) > 0  or "Day" is null )
 ORDER BY position('%' in prefix) ASC, char_length(prefix) DESC LIMIT 1

it uses the index.

Note the difference:

... WHERE '00436641997142%' like prefix ...

The query that does use the index:

... WHERE prefix like '00436641997142%' ...
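Note that the two forms are not equivalent queries, so the index-using version would not be a drop-in replacement even if the SQL could be edited. A small Python sketch (made-up sample values, simplified LIKE without escape handling) shows that they select different rows: the original finds stored patterns that the number satisfies, while the swapped form finds stored values that start with the number.

```python
import re


def like(value: str, pattern: str) -> bool:
    # Minimal LIKE: '%' = any run, '_' = one character (escape character ignored).
    regex = "".join(".*" if c == "%" else "." if c == "_" else re.escape(c) for c in pattern)
    return re.fullmatch(regex, value, re.DOTALL) is not None


number = "00436641997142"
prefixes = ["0043664%", "00436641997142", "004366419971425"]  # made-up sample data

# Original predicate: the stored prefix is the pattern, the number is the value.
original = [p for p in prefixes if like(number, p)]
# Swapped predicate: the number plus '%' is the pattern, the stored prefix is the value.
swapped = [p for p in prefixes if like(p, number + "%")]

print(original)  # ['0043664%', '00436641997142']
print(swapped)   # ['00436641997142', '004366419971425']
```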

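Since I cannot change the query itself, does anyone have an idea how to get around this? I can change the data and the Postgres settings, but not the query.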

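2) Also, to get a parallel seq scan, I installed Postgres 9.6. With it, a parallel scan is used only when the last part of the query (the "Day" condition) is omitted. So the query: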

SELECT prefix, changeprefix, deletelast, outgroup, tariff FROM table WHERE '00436641997142' LIKE prefix 
AND company = 0  and 
((current_time between timefrom and timeto) or (timefrom is null and timeto is null))
 ORDER BY position('%' in prefix) ASC, char_length(prefix) DESC LIMIT 1

uses parallel mode.

Any ideas how to force the original query (which I cannot change):

SELECT prefix, changeprefix, deletelast, outgroup, tariff FROM erm_table WHERE '00436641997142' LIKE prefix 
AND company = 0  and 
((current_time between timefrom and timeto) or (timefrom is null and timeto is null)) and (strpos("Day", cast(to_char(now(), 'ID') as varchar)) > 0  or "Day" is null )
 ORDER BY position('%' in prefix) ASC, char_length(prefix) DESC LIMIT 1

to use a parallel seq scan?

Answer 1

If I understand your question correctly, a proxy server that rewrites the query could solve this.

Here is an example from another question.

You could then change "LIKE" to "=" in your query, and it would run faster.
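A minimal sketch of the rewrite step such a proxy might perform, written in Python as a plain string transformation. It is not tied to any proxy product, and rewrite_reverse_like and candidate_patterns are hypothetical names. It is a variant of the "=" idea that also covers wildcard rows, and it relies on an assumption about the data: every stored prefix is either a complete number or a run of leading digits followed by a single trailing '%' (no '_', no '%' elsewhere). Under that assumption the only rows that can satisfy '<number>' LIKE prefix are the number itself and its leading substrings with '%' appended, so the predicate can become an IN list, which a plain B-tree index on prefix should be able to serve.

```python
from __future__ import annotations

import re

# Matches the fragment  '<digits>' LIKE prefix  (LIKE keyword case-insensitive).
# Only digit literals are captured, so nothing but digits is copied into the new SQL.
REVERSE_LIKE = re.compile(r"'(\d+)'\s+LIKE\s+prefix\b", re.IGNORECASE)


def candidate_patterns(number: str) -> list[str]:
    # Every stored value that can match  'number' LIKE prefix  under the assumed
    # data shape: the full number, or a leading part of it followed by one '%'.
    return [number] + [number[:i] + "%" for i in range(len(number) + 1)]


def rewrite_reverse_like(sql: str) -> str:
    def to_in_list(m: re.Match) -> str:
        values = ", ".join(f"'{p}'" for p in candidate_patterns(m.group(1)))
        return f"prefix IN ({values})"

    return REVERSE_LIKE.sub(to_in_list, sql)


sql = ("SELECT prefix, changeprefix, deletelast, outgroup, tariff FROM erm_table "
       "WHERE '0043' LIKE prefix AND company = 0 LIMIT 1")
print(rewrite_reverse_like(sql))
```

For a 14-digit number this yields 16 values, i.e. a few index probes instead of a scan over 6.4M rows; if the data contains other wildcard shapes, the rewritten predicate no longer selects the same rows.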

Answer 2

According to the documentation, you should change the index by adding the appropriate operator class:

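The operator classes text_pattern_ops, varchar_pattern_ops, and bpchar_pattern_ops support B-tree indexes on the types text, varchar, and char respectively. The difference from the default operator classes is that the values are compared strictly character by character rather than according to the locale-specific collation rules. This makes these operator classes suitable for use by queries involving pattern matching expressions (LIKE or POSIX regular expressions) when the database does not use the standard "C" locale. As an example, you might index a varchar column like this: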

CREATE INDEX test_index ON test_table (col varchar_pattern_ops);
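For context, a pattern_ops index helps when the pattern is the constant side and is left-anchored, as in prefix LIKE '00436641997142%': the planner can then turn the pattern into a range condition on the index (typically shown in EXPLAIN as ~>=~ and ~<~ index conditions). The Python sketch below imitates that idea with a sorted list and bisect, comparing strings code point by code point in the way these operator classes compare character by character. like_prefix_range and range_scan are illustrative names, and they handle only a literal prefix followed by one trailing '%'. In the '00436641997142' LIKE prefix form the pattern comes from the column, so no such range can be derived.

```python
from __future__ import annotations

import bisect


def like_prefix_range(pattern: str) -> tuple[str, str]:
    # Only handles a literal prefix followed by a single trailing '%', e.g. '0043%'.
    literal = pattern[:-1]
    if not pattern.endswith("%") or not literal or "%" in literal or "_" in literal:
        raise ValueError("expected a literal prefix followed by one trailing '%'")
    # Upper bound: bump the last character by one code point ('0043' -> '0044').
    # Assumes the last character is not the maximum code point (true for digits).
    upper = literal[:-1] + chr(ord(literal[-1]) + 1)
    return literal, upper


def range_scan(sorted_values: list[str], pattern: str) -> list[str]:
    # Stand-in for a B-tree range scan: find the contiguous block literal <= v < upper.
    lo, hi = like_prefix_range(pattern)
    start = bisect.bisect_left(sorted_values, lo)
    end = bisect.bisect_left(sorted_values, hi)
    return sorted_values[start:end]


values = sorted(["0043", "00436641997142", "004366419971425", "0044", "0049"])  # made up
print(range_scan(values, "00436641997142%"))  # ['00436641997142', '004366419971425']
```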

Postgres large table SELECT optimization
