Question

Suppose my table test contains the following values:

id | value
----------------
1  | ABC 1-2-3
2  | AB 1-2-3-4-5
3  | ABC 1
4  | ABC 1-2
5  | ABC

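If my input string is ABC 1-2-3-4-5, the closest substring match (if I can call it that) should be ABC 1-2-3. Row 2 should not match, because it does not contain "ABC". I can only search when the input string is shorter than the stored value, not when it is longer. For example: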

select * from test where value ilike 'ABC 1-2%';

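But this does not give me a single exact record either, only the rows that start with ABC 1-2. How do I construct the right SQL statement to solve this?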

Answer 1

You may be interested in the pg_trgm extension:

create extension if not exists pg_trgm;

The standard similarity for your data looks like this:

select *, similarity(value, 'ABC 1-2-3-4-5')
from test
order by 3 desc;

 id |    value     | similarity 
----+--------------+------------
  2 | AB 1-2-3-4-5 |        0.8
  1 | ABC 1-2-3    |   0.714286
  4 | ABC 1-2      |   0.571429
  3 | ABC 1        |   0.428571
  5 | ABC          |   0.285714
(5 rows)
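
Row 2 ranks first here because pg_trgm compares sets of three-character sequences (trigrams), and 'AB 1-2-3-4-5' shares the entire numeric tail with the input. As a rough way to see this, the extension's show_trgm function lists the trigrams extracted from a string. This is only an inspection sketch; the column aliases are made up for illustration:

```sql
-- Requires: create extension if not exists pg_trgm;
-- Compare the trigram sets of the input and of row 2 side by side.
select show_trgm('ABC 1-2-3-4-5') as input_trigrams,
       show_trgm('AB 1-2-3-4-5')  as row2_trigrams;
```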

However, you can always add further conditions in the WHERE clause:

select *, similarity(value, 'ABC 1-2-3-4-5')
from test
where value ilike 'abc%'
order by 3 desc;

 id |   value   | similarity 
----+-----------+------------
  1 | ABC 1-2-3 |   0.714286
  4 | ABC 1-2   |   0.571429
  3 | ABC 1     |   0.428571
  5 | ABC       |   0.285714
(4 rows)
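
If only the single best row is wanted, the same query can be limited to one result. The sketch below also derives the prefix filter from the input instead of hard-coding 'abc%'. It assumes that the first space-separated token of the input is the part that must match; the names params, q and sim are hypothetical:

```sql
-- The search string is written once, in a CTE (hypothetical name: params).
with params(q) as (values ('ABC 1-2-3-4-5'))
select t.*, similarity(t.value, p.q) as sim
from test t
cross join params p
-- Assumption: the first token of the input ('ABC') is the required prefix.
where t.value ilike split_part(p.q, ' ', 1) || '%'
order by sim desc
limit 1;
```

With the sample data above, this should return only the row with id 1.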

Answer 2

Reverse the comparison:

select * from test
where 'ABC 1-2-3-4-5' ilike value || '%'
order by length(value) desc;

This returns the best (i.e. longest) match first.
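
One caveat with the reversed pattern: any % or _ stored in value is interpreted as a wildcard by ilike. A possible variant, sketched here under the assumption that a plain case-insensitive prefix test is what is wanted, compares the leading characters of the input with each stored value and keeps only the longest match:

```sql
-- Prefix test without LIKE patterns: take as many leading characters
-- of the input as the stored value has, then compare case-insensitively.
select *
from test
where lower(left('ABC 1-2-3-4-5', length(value))) = lower(value)
order by length(value) desc
limit 1;  -- keep only the longest matching prefix
```

With the sample data from the question, rows 1, 3, 4 and 5 pass the filter, and the row with id 1 (ABC 1-2-3) is the one returned.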

PostgreSQL: find the string with the closest substring match
