Question

Can someone explain the large performance difference between these SQL queries?

SELECT count(*) as cnt FROM table WHERE name ~ '\*{3}'; -- Total runtime 12.000 - 18.000 ms
SELECT count(*) as cnt FROM table WHERE name ~ '\*\*\*'; -- Total runtime 12.000 - 18.000 ms
SELECT count(*) as cnt FROM table WHERE name LIKE '%***%'; -- Total runtime 5.000 - 7.000 ms

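As you can see, the regex queries take more than twice as long as the LIKE query. I thought the LIKE operator would be converted to a regular expression internally, so there shouldn't be any difference.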

There are nearly 13000 rows, and the column "name" is of type "text". The table has no index defined on the "name" column.

Edit:

EXPLAIN ANALYZE output for each query:

EXPLAIN ANALYZE SELECT count(*) as cnt FROM datos WHERE nombre ~ '\*{3}';

Aggregate  (cost=894.32..894.33 rows=1 width=0) (actual time=18.279..18.280 rows=1 loops=1)
  ->  Seq Scan on datos (cost=0.00..894.31 rows=1 width=0) (actual time=0.620..18.266 rows=25 loops=1)
        Filter: (nombre ~ '\*{3}'::text)
Total runtime: 18.327 ms

EXPLAIN ANALYZE SELECT count(*) as cnt FROM datos WHERE nombre ~ '\*\*\*';
Aggregate  (cost=894.32..894.33 rows=1 width=0) (actual time=17.404..17.405 rows=1 loops=1)
  ->  Seq Scan on datos  (cost=0.00..894.31 rows=1 width=0) (actual time=0.608..17.396 rows=25 loops=1)
        Filter: (nombre ~ '\*\*\*'::text)
Total runtime: 17.451 ms

EXPLAIN ANALYZE SELECT count(*) as cnt  FROM datos WHERE nombre LIKE '%***%';
Aggregate  (cost=894.32..894.33 rows=1 width=0) (actual time=4.258..4.258 rows=1 loops=1)
  ->  Seq Scan on datos  (cost=0.00..894.31 rows=1 width=0) (actual time=0.138..4.249 rows=25 loops=1)
        Filter: (nombre ~~ '%***%'::text)
Total runtime: 4.295 ms

Answer 1

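The text LIKE text operator (~~) is implemented by dedicated C code in like_match.c. It is ad-hoc code, completely independent of the regular expression engine. Judging by its comments, it is specifically optimized to support only % and _ as wildcards and to short-circuit to an exit whenever possible, whereas the regular expression engine is several orders of magnitude more complex.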
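To illustrate what "only % and _ as wildcards, with early exit" buys, here is a minimal Python sketch of such a matcher. It is not a port of like_match.c: the function name like_match is hypothetical, it works on Python str rather than bytes, and it ignores the ESCAPE clause and ILIKE. It uses the classic greedy wildcard algorithm, which only remembers the most recent % and backtracks to it, so no general regex machinery (pattern compilation, state sets, capture groups) is involved.

```python
def like_match(text: str, pattern: str) -> bool:
    """Simplified LIKE: '%' matches any run of characters, '_' exactly one.
    Hypothetical sketch; escapes and ILIKE are deliberately not handled."""
    t = p = 0
    star_p = -1  # pattern index just after the most recent '%', or -1 if none yet
    star_t = 0   # text index where that '%' currently stops swallowing
    while t < len(text):
        if p < len(pattern) and pattern[p] == '%':
            # Record the '%' and first try letting it match zero characters.
            star_p = p + 1
            star_t = t
            p += 1
        elif p < len(pattern) and (pattern[p] == '_' or pattern[p] == text[t]):
            # '_' or an identical literal character: advance both cursors.
            t += 1
            p += 1
        elif star_p != -1:
            # Mismatch: let the last '%' absorb one more character and retry.
            star_t += 1
            t = star_t
            p = star_p
        else:
            # Mismatch with no '%' to fall back on: exit immediately.
            return False
    # Text fully consumed: whatever is left of the pattern must be all '%'.
    while p < len(pattern) and pattern[p] == '%':
        p += 1
    return p == len(pattern)
```

For the question's pattern, like_match('abc***def', '%***%') is true and like_match('ab**c', '%***%') is false.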

Note that in your test case, just as the regexp is suboptimal compared to LIKE, LIKE is probably suboptimal compared to strpos(name, '***') > 0.

strpos is implemented with the Boyer–Moore–Horspool algorithm, which is optimized for large substrings with few partial matches in the searched text.
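A minimal Python sketch of Boyer–Moore–Horspool search, the algorithm named above. The function name horspool_find is hypothetical and this is not PostgreSQL's code; it only shows the idea: compare the needle right to left, and on a mismatch skip ahead by an amount looked up from the haystack character aligned with the needle's last position.

```python
def horspool_find(haystack: str, needle: str) -> int:
    """Return the index of the first occurrence of needle in haystack, or -1."""
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    # Bad-character table: for each character of needle[:-1], the distance from
    # its last occurrence to the end of the needle (later entries overwrite
    # earlier ones). Characters not in the table allow a full shift of m.
    shift = {ch: m - 1 - i for i, ch in enumerate(needle[:-1])}
    pos = 0
    while pos <= n - m:
        # Compare from the last character of the needle backwards.
        j = m - 1
        while j >= 0 and haystack[pos + j] == needle[j]:
            j -= 1
        if j < 0:
            return pos
        # Skip based on the haystack character under the needle's last slot.
        pos += shift.get(haystack[pos + m - 1], m)
    return -1
```

The shift can never exceed the needle length, so with a three-character needle like '***' it skips at most 3 characters at a time; this is why the algorithm pays off more for longer substrings.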

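Internally these functions are reasonably optimized, but when there are several ways to reach the same goal, choosing the likely best one is still the caller's job. PostgreSQL won't analyze the pattern for us and, based on that analysis, turn a regexp into a LIKE or a LIKE into a strpos.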
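Since that choice falls to the caller, application code that builds queries or filters from LIKE-style patterns could do the analysis itself. The following Python sketch is one hypothetical way to do it (the name pick_matcher is made up, and escapes are ignored): it recognizes '%literal%' as a plain substring test, the same idea as strpos(name, '***') > 0, treats a wildcard-free pattern as equality, and only otherwise falls back to a general matcher. The fallback translates the pattern to a regular expression purely to keep the example self-contained.

```python
import re


def pick_matcher(like_pattern: str):
    """Return a predicate equivalent to `text LIKE like_pattern`, picking the
    cheapest strategy this sketch knows about. Escapes are not handled."""
    inner = like_pattern[1:-1]
    if (len(like_pattern) >= 2 and like_pattern[0] == '%' and like_pattern[-1] == '%'
            and '%' not in inner and '_' not in inner):
        # '%literal%' is just a substring test.
        return lambda text: inner in text
    if '%' not in like_pattern and '_' not in like_pattern:
        # No wildcards at all: plain equality.
        return lambda text: text == like_pattern
    # General case: translate % to .* and _ to ., escape everything else.
    regex = re.compile(''.join(
        '.*' if ch == '%' else '.' if ch == '_' else re.escape(ch)
        for ch in like_pattern), re.DOTALL)
    return lambda text: regex.fullmatch(text) is not None
```

For example, pick_matcher('%***%') returns a substring test, while pick_matcher('a_c') falls through to the regex path.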

Answer 2

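I'm not sure whether I should post this as an answer... I did a similar rough comparison in PHP: filtering a large array with a regex versus a simple strpos (as a substitute for LIKE). The code: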

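<?php
// $a: hypothetical test data (the post does not show how the array was built)
$a = array_map(fn($i) => (string) mt_rand(), range(1, 100000));
// regex filter
$t = microtime(true);
$filteredRegex = array_filter($a, function ($item) {
    return preg_match('/000/', $item);
});
$regexTime = microtime(true) - $t;
// substring search filter
$t = microtime(true);
$filteredStrpos = array_filter($a, function ($item) {
    return strpos($item, '000') !== false;
});
$strposTime = microtime(true) - $t;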

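Benchmarking this code showed the regex filter taking about twice as long as the strpos filter, so I'd estimate that the CPU cost of a regex is roughly twice that of a simple substring search.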
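A rough Python analog of the same experiment, with the same caveats: the test data is hypothetical (random numeric strings), and timings depend on the interpreter and machine, so no particular ratio is claimed here. Python's in operator plays the role of strpos and re.search the role of preg_match.

```python
import random
import re
import timeit

# Hypothetical test data: 100,000 random numeric strings.
random.seed(0)
data = [str(random.randint(0, 10**9)) for _ in range(100_000)]
pattern = re.compile('000')


def filter_regex(items):
    """Keep items where the regex '000' matches anywhere."""
    return [s for s in items if pattern.search(s)]


def filter_substring(items):
    """Keep items containing the literal substring '000'."""
    return [s for s in items if '000' in s]


if __name__ == '__main__':
    print('regex    :', timeit.timeit(lambda: filter_regex(data), number=10))
    print('substring:', timeit.timeit(lambda: filter_substring(data), number=10))
```

Both filters select exactly the same items, so any timing difference comes from the matching strategy alone.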

Looks like @zerkms had good reason :)
