Question

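I am facing an issue with text search configuration in PostgreSQL. I have a table users with a column name. A user's name may be French, English, Spanish, or any other language, so I need to use PostgreSQL full-text search. The default text search configuration I use now is the simple configuration, but searching with it does not return suitable results. I tried combining different text search configurations, like this: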

(to_tsvector('english', document) || to_tsvector('french', document) || to_tsvector('spanish', document) || to_tsvector('russian', document)) @@
(to_tsquery('english', query) || to_tsquery('french', query) || to_tsquery('spanish', query) || to_tsquery('russian', query))

But this query does not give suitable results, as a quick test shows:

select (to_tsvector('english', 'adam and smith') || to_tsvector('french', 'adam and smith') || to_tsvector('spanish', 'adam and smith') || to_tsvector('russian', 'adam and smith')) 

tsvector: 'adam':1,4,7,10 'and':5,8 'smith':3,6,9,12

Using only the configuration for the text's actual language:

select (to_tsvector('english', 'adam and smith')) 
tsvector: 'adam':1 'smith':3

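The first thing to note is that stop words are not removed when different configurations are combined with the || operator: 'and' is still present in the combined tsvector. Is there a way to combine different text search configurations and use the appropriate language when a user searches for text?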

Answer 1

Maybe you think that || is the "or" operator, but it concatenates text search vectors.
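
As a minimal illustration of that concatenation behavior (the literal vectors here are arbitrary examples, not taken from the question), positions in the right-hand vector are shifted by the largest position in the left-hand one:

```sql
-- || on tsvector values concatenates; it is not a logical OR.
-- Positions from the second vector are offset by the largest
-- position found in the first one (2 in this example).
SELECT 'a:1 b:2'::tsvector || 'c:1 d:2 b:3'::tsvector;
-- Result: 'a':1 'b':2,5 'c':3 'd':4
```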

Take a look at what happens in your expression.

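Running \dF+ french in psql will show you that asciiword tokens are processed with the French Snowball stemmer. It removes stop words and reduces words to their stems. The same applies to English and Russian.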

You can use ts_debug to see this in action:

test=> SELECT * FROM ts_debug('english', 'adam and smith');
   alias   |   description   | token |  dictionaries  |  dictionary  | lexemes 
-----------+-----------------+-------+----------------+--------------+---------
 asciiword | Word, all ASCII | adam  | {english_stem} | english_stem | {adam}
 blank     | Space symbols   |       | {}             |              | 
 asciiword | Word, all ASCII | and   | {english_stem} | english_stem | {}
 blank     | Space symbols   |       | {}             |              | 
 asciiword | Word, all ASCII | smith | {english_stem} | english_stem | {smith}
(5 rows)

test=> SELECT * FROM ts_debug('french', 'adam and smith');
   alias   |   description   | token | dictionaries  | dictionary  | lexemes 
-----------+-----------------+-------+---------------+-------------+---------
 asciiword | Word, all ASCII | adam  | {french_stem} | french_stem | {adam}
 blank     | Space symbols   |       | {}            |             | 
 asciiword | Word, all ASCII | and   | {french_stem} | french_stem | {and}
 blank     | Space symbols   |       | {}            |             | 
 asciiword | Word, all ASCII | smith | {french_stem} | french_stem | {smith}
(5 rows)

Now, if you concatenate these four tsvectors, you end up with adam at positions 1, 4, 7, and 10.
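
To see where those positions come from, it may help to look at the individual vectors before concatenating them. This is a small sketch using the same sample text, assuming the stock english and french configurations behave as in the ts_debug output above:

```sql
-- Each configuration on its own.
SELECT to_tsvector('english', 'adam and smith') AS english_vec,
       to_tsvector('french',  'adam and smith') AS french_vec;
-- english_vec: 'adam':1 'smith':3          ("and" is dropped as an English stop word)
-- french_vec:  'adam':1 'and':2 'smith':3  ("and" is not a French stop word)

-- Concatenating shifts the French positions by 3, the largest English position.
SELECT to_tsvector('english', 'adam and smith')
    || to_tsvector('french',  'adam and smith') AS combined;
-- combined: 'adam':1,4 'and':5 'smith':3,6
```

The lexeme 'and' survives because at least one of the configurations does not treat it as a stop word, which is why it shows up in the combined vector from the question.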

There is no good way to use full-text search for several languages at once.

But if it really is personal names that you are searching, I would do the following:

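Create a text search configuration that uses a simple dictionary for asciiword and word tokens, and give that dictionary an empty stop word file, or one that contains only stop words acceptable in all languages.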
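
A minimal sketch of such a configuration might look like the following. The names names_dict and names_cfg are hypothetical, chosen only for illustration, and the sketch takes the "no stop words" route by omitting the STOPWORDS parameter rather than pointing at an empty file:

```sql
-- Hypothetical dictionary based on the built-in "simple" template.
-- No STOPWORDS parameter is given, so no word is treated as a stop word;
-- the dictionary only lowercases its input.
CREATE TEXT SEARCH DICTIONARY names_dict (
    TEMPLATE = pg_catalog.simple
);

-- Hypothetical configuration, started as a copy of the built-in "simple" one.
CREATE TEXT SEARCH CONFIGURATION names_cfg (COPY = pg_catalog.simple);

-- Route plain word tokens through the dictionary defined above.
ALTER TEXT SEARCH CONFIGURATION names_cfg
    ALTER MAPPING FOR asciiword, word
    WITH names_dict;

-- Nothing is stemmed or removed:
SELECT to_tsvector('names_cfg', 'Adam And Smith');
-- 'adam':1 'and':2 'smith':3
```

If a shared stop word list is wanted instead, the simple template also accepts a STOPWORDS parameter naming a .stop file in the server's tsearch_data directory; the contents of that file would be your own choice.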

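Personal names should normally not be stemmed, so you avoid that problem. And if you miss a stop word, that is no big deal: it only makes the resulting tsvector (and the index) larger, and with personal names there should not be many stop words in the first place.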
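
For completeness, here is a rough sketch of how the users.name column from the question could be indexed and searched without stemming. It uses the built-in simple configuration as a stand-in for whatever configuration you end up creating, and the index name is a hypothetical choice:

```sql
-- Expression index on the unstemmed name; the two-argument form of
-- to_tsvector is required here because it is immutable.
CREATE INDEX users_name_fts_idx
    ON users
    USING gin (to_tsvector('simple', name));

-- The query must use the same expression for the index to be usable.
SELECT *
FROM users
WHERE to_tsvector('simple', name) @@ plainto_tsquery('simple', 'adam smith');
```

plainto_tsquery combines the search terms with AND, so this matches names that contain both adam and smith.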

Full-text search configuration on PostgreSQL
