Question

I'm not sure I'm phrasing this question correctly, but how can we search for a string in Postgres so that the following result is achieved?

String to search for:

Google Pvt Ltd

Data in the table:

symbol, company name
GOOG, Google Ltd
FACEBOOK, Facebook Corp
APPLE, Apple Inc
DELL, Dell Ltd

Expected search result:

GOOG, Google Ltd

The logic is that the search returns the row with the largest number of matching words.

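I am looking into the full-text search options in Postgres, and I understand how to_tsvector tokenizes text, but I'm not sure how to proceed from there. Is this kind of search possible?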

Answer 1

You can use the pg_trgm extension.

create extension if not exists pg_trgm;

with my_table(symbol, company_name) as (
values
    ('GOOG', 'Google Ltd'),
    ('FACEBOOK', 'Facebook Corp'),
    ('APPLE', 'Apple Inc'),
    ('DELL', 'Dell Ltd')
)

select *, similarity(company_name, 'Google Pvt Ltd')
from my_table
order by similarity desc;

  symbol  | company_name  | similarity 
----------+---------------+------------
 GOOG     | Google Ltd    |   0.733333
 DELL     | Dell Ltd      |        0.2
 APPLE    | Apple Inc     |  0.0416667
 FACEBOOK | Facebook Corp |          0
(4 rows)

You can set the current similarity threshold and then simply use the % operator, for example:

select set_limit(0.6);

select *
from my_table
where company_name % 'Google Pvt Ltd';

 symbol | company_name 
--------+--------------
 GOOG   | Google Ltd
(1 row)
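
Note that the pg_trgm documentation marks set_limit() as deprecated in favor of the pg_trgm.similarity_threshold setting (available since PostgreSQL 9.6). A minimal sketch of the same filter using that setting follows. It assumes a temporary table named my_table, which is hypothetical here and simply stands in for the CTE above:

```sql
create extension if not exists pg_trgm;

-- Hypothetical temp table mirroring the sample data above.
create temp table my_table(symbol text, company_name text);
insert into my_table values
    ('GOOG', 'Google Ltd'),
    ('FACEBOOK', 'Facebook Corp'),
    ('APPLE', 'Apple Inc'),
    ('DELL', 'Dell Ltd');

-- Session-level threshold used by the % operator.
set pg_trgm.similarity_threshold = 0.6;

-- Keep only rows above the threshold and return the closest one.
select symbol, company_name
from my_table
where company_name % 'Google Pvt Ltd'
order by similarity(company_name, 'Google Pvt Ltd') desc
limit 1;
```

Ordering by similarity() and limiting to one row is a design choice for the "best match only" case; drop the limit to see every row that passes the threshold.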

Answer 2

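I'm not sure you need full-text search; that depends on your performance requirements. There are other approaches, such as splitting the column value and the input into words and matching them directly.

Here is one approach that uses regexp_matches():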

select v.*,
       (select count(*) from regexp_matches(symbol || ' ' || company, replace('Google Pvt Ltd', ' ', '|'), 'g')) as matches
from (values ('GOOG', 'Google Ltd'),
             ('FACEBOOK', 'Facebook Corp'),
             ('APPLE', 'Apple Inc'),
             ('DELL', 'Dell Ltd')
    ) v(symbol, company)
order by matches desc
fetch first 1 row only;
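
The regular expression above counts substring hits and is case-sensitive, so a search word such as "Ltd" would also match inside a longer word, and any regex metacharacters in the input would be interpreted as pattern syntax. As a possible variant (a sketch, not part of the original answer), the same idea can be written as a whole-word, case-insensitive comparison by splitting the company name on whitespace and checking each word against the words of the search string:

```sql
select v.symbol, v.company,
       -- Count the words of the company name that also occur in the search string.
       (select count(*)
        from regexp_split_to_table(lower(v.company), '\s+') as w(word)
        where w.word = any (string_to_array(lower('Google Pvt Ltd'), ' '))
       ) as matches
from (values ('GOOG', 'Google Ltd'),
             ('FACEBOOK', 'Facebook Corp'),
             ('APPLE', 'Apple Inc'),
             ('DELL', 'Dell Ltd')
     ) v(symbol, company)
order by matches desc
fetch first 1 row only;
```

This version assumes the search string is separated by single spaces; ties between rows with the same word count are broken arbitrarily unless a further sort key is added.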

Search for a super string in a list of strings - POSTGRES
