Question

I use Postgres to match patterns in string columns, without a full-text search engine (because I don't want stemming, stop words, ranking, etc.).

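How can I retrieve the total number of pattern matches, even when several fields each contain the pattern more than once? Is this possible?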

Example: searching for dog in

text
----

The dog looked at the other dog.
The dog looked at the cat.

Result when searching for dog: 3 hits.

Answer 1

SELECT
SUM(
  (LENGTH(text) - LENGTH(REGEXP_REPLACE(text,'dog','','g'))) / LENGTH('dog')
) as hits
FROM
the_table
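
The query works because removing every match shortens the string by the length of the match, so the length difference divided by LENGTH('dog') is the number of matches in that row. Note that REGEXP_REPLACE treats 'dog' as a regular expression, so the division is only meaningful when every match has the same length as the search term. A minimal sketch for a purely literal search term, assuming the same the_table and text column, uses the plain REPLACE function instead and adds COALESCE so that an empty table yields 0 rather than NULL:

```sql
-- Sketch: count literal (non-regex) occurrences of 'dog' across all rows.
-- the_table and its "text" column are taken from the answer above.
SELECT
  COALESCE(SUM(
    -- characters removed in this row, divided by the length of the term
    (LENGTH(text) - LENGTH(REPLACE(text, 'dog', ''))) / LENGTH('dog')
  ), 0) AS hits
FROM
  the_table;
```

For the two sample rows in the question this gives 2 + 1 = 3.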

Answer 2

You can do this with full-text search without stemming or stop words by using the "simple" dictionary. See the documentation for more information about dictionaries.

Here is an example with a table "tst":

CREATE TABLE tst (t text);
INSERT INTO tst VALUES ('The dog looked at the other dog.');
INSERT INTO tst VALUES ('The dog looked at the cat.');

Example query using the function ts_stat():

postgres=# SELECT SUM(nentry) FROM ts_stat('SELECT to_tsvector(t) FROM tst') WHERE word = 'dog';
 sum 
-----
   3
(1 row)
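
The session above calls to_tsvector(t) without naming a configuration, so it uses whatever default_text_search_config is set to, which may apply stemming. A hedged sketch of the same query with the "simple" configuration named explicitly, assuming the tst table created above:

```sql
-- Sketch: same count, but with the 'simple' configuration named explicitly
-- so that no stemming or stop-word removal is applied.
-- Single quotes inside the query string passed to ts_stat() are doubled.
SELECT SUM(nentry) AS hits
FROM ts_stat('SELECT to_tsvector(''simple'', t) FROM tst')
WHERE word = 'dog';
```

Keep in mind that this counts whole tokens, not substrings: "dogs" would not be counted as a hit for 'dog', and since the simple dictionary lowercases tokens, "Dog" would be.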

I don't know how well ts_stat() performs. You can test it with an index.

Answer 3

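regexp_matches, when called with the g flag, returns one row for each match. If the table has a primary key column, you can use it to count the matches found per row.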

select id, count(*)
from the_table, regexp_matches(the_column, 'dog', 'g')
where the_column ~ 'dog'
group by id

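The condition where the_column ~ 'dog' reduces the number of rows that need to be processed, and therefore the number of rows that need to be grouped. If only a few rows contain the search term, this should improve performance.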
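
The grouped query returns one count per row, while the question asks for a single total. A minimal sketch of that variant, assuming the same the_table and the_column names, simply drops the grouping. The second statement is an alternative that assumes PostgreSQL 15 or later, where regexp_count() is available:

```sql
-- Sketch: total number of matches over the whole table (no GROUP BY).
-- Rows without a match produce no rows from regexp_matches and are skipped.
SELECT count(*) AS hits
FROM the_table, regexp_matches(the_column, 'dog', 'g')
WHERE the_column ~ 'dog';

-- Alternative, assuming PostgreSQL 15+ where regexp_count() exists.
-- COALESCE turns the NULL from an empty table into 0.
SELECT COALESCE(sum(regexp_count(the_column, 'dog')), 0) AS hits
FROM the_table;
```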