Question

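Over the past few days I've been working with full-text search in Postgres, and I'm a little confused about indexing when searching across multiple columns.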

The Postgres docs talk about creating a tsvector index on concatenated columns, like this:

CREATE INDEX pgweb_idx ON pgweb 
    USING gin(to_tsvector('english', title || ' ' || body));

I can then search it like this:

... WHERE 
      (to_tsvector('english', title||' '||body) @@ to_tsquery('english', 'foo'))
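One caveat with the documented expression: if either title or body is NULL, then title || ' ' || body is NULL too, so that row's tsvector is NULL and the row can never match. The same documentation chapter handles this with coalesce. A minimal sketch reusing the docs' pgweb table (the index name pgweb_coalesce_idx is made up for illustration). The WHERE clause has to repeat the index expression exactly, including the 'english' configuration, for the planner to consider the index:

```sql
-- A NULL title makes the whole concatenation NULL, so the match result is NULL, not true
SELECT to_tsvector('english', NULL::text || ' ' || 'foo in the body')
       @@ to_tsquery('english', 'foo') AS matches;

-- NULL-safe variant of the combined expression index (hypothetical index name)
CREATE INDEX pgweb_coalesce_idx ON pgweb
    USING gin (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, '')));

-- The query repeats the identical expression so the index can be used
SELECT title
FROM pgweb
WHERE to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))
      @@ to_tsquery('english', 'foo');
```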

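But if I sometimes want to search only the title, sometimes only the body, and sometimes both, I'd need 3 separate indexes. If I added a third column, that could be up to 7 indexes (one per non-empty combination of columns), and so on.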

Another approach, which I didn't see in the docs, is to index the two columns separately and just use a normal WHERE...AND query:

... WHERE
      (to_tsvector('english', title) @@ to_tsquery('english','foo'))
    AND
      (to_tsvector('english', body) @@ to_tsquery('english','foo'))
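For that WHERE clause to be served by indexes, each per-column expression needs its own expression index, written with the same two-argument to_tsvector call and the same configuration. A sketch, with made-up index names. GIN indexes are only used through bitmap scans, so the planner may combine the two with a BitmapAnd for this AND query (or a BitmapOr if the conditions were joined with OR); EXPLAIN shows which plan it actually picks:

```sql
-- One expression index per column (hypothetical index names)
CREATE INDEX pgweb_title_idx ON pgweb USING gin (to_tsvector('english', title));
CREATE INDEX pgweb_body_idx  ON pgweb USING gin (to_tsvector('english', body));

-- Inspect whether the planner uses (and combines) both indexes for the AND query
EXPLAIN
SELECT title
FROM pgweb
WHERE to_tsvector('english', title) @@ to_tsquery('english', 'foo')
  AND to_tsvector('english', body)  @@ to_tsquery('english', 'foo');
```

Note that the two forms are not equivalent: the AND query only returns rows where foo appears in both title and body, whereas the concatenated index matches foo in either column. For a single word, joining the per-column conditions with OR is the equivalent query.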

Benchmarking the two on ~…

So my question is:

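Why would I concatenate columns into a single index like this, instead of indexing each column separately? What are the advantages and disadvantages of each?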

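My best guess is that if I know in advance that I only ever want to search both columns together (never one at a time), concatenating lets me get away with a single index, which uses less memory.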
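The memory part of that guess can be checked on real data by comparing the on-disk size of each index. A sketch, assuming the combined pgweb_idx from the docs plus two hypothetical per-column indexes named pgweb_title_idx and pgweb_body_idx exist:

```sql
-- Compare on-disk index sizes (index names are assumptions; adjust to your schema)
SELECT c.relname                                AS index_name,
       pg_size_pretty(pg_relation_size(c.oid))  AS size
FROM pg_class AS c
WHERE c.relname IN ('pgweb_idx', 'pgweb_title_idx', 'pgweb_body_idx');
```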

Edit

Moved to: https://dba.stackexchange.com/questions/15412/postgres-full-text-search-with-multiple-columns-why-concat-in-index-and-not-at

Answer 1

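1. It's easier/faster for the database to use one index;
2. With two indexes, ranking the results correctly would be very difficult;
3. When creating a single index you can give the columns relative weights, so that a match in title ranks higher than a match in body;
4. You're searching for a single word here; what happens if you search for several words and they appear in different columns?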
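Point 4 can be seen with literal strings, no table needed. Take a hypothetical row whose title is 'foo' and whose body is 'bar', and search for both words. Checked column by column, neither side contains both words, so the per-column query fails whether its conditions are joined with AND or OR; the concatenated vector matches:

```sql
-- Hypothetical row: title = 'foo', body = 'bar'; query = 'foo & bar'
SELECT
    to_tsvector('english', 'foo') @@ to_tsquery('english', 'foo & bar')  AS title_matches,    -- false
    to_tsvector('english', 'bar') @@ to_tsquery('english', 'foo & bar')  AS body_matches,     -- false
    to_tsvector('english', 'foo' || ' ' || 'bar')
        @@ to_tsquery('english', 'foo & bar')                            AS combined_matches; -- true
```

With separate indexes the query would have to be split per word, e.g. (title has foo OR body has foo) AND (title has bar OR body has bar), which grows quickly with more words and columns.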

Answer 2

To answer the question of how to implement #3, see https://www.postgresql.org/docs/9.1/textsearch-controls.html:

The weight is one of the letters A, B, C, or D:

UPDATE tt SET ti =
    setweight(to_tsvector(coalesce(title,'')), 'A')    ||
    setweight(to_tsvector(coalesce(keyword,'')), 'B')  ||
    setweight(to_tsvector(coalesce(abstract,'')), 'C') ||
    setweight(to_tsvector(coalesce(body,'')), 'D');
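Putting the weighted vector to work: a minimal, self-contained sketch with a hypothetical articles table (two columns instead of four, with the 'english' configuration spelled out). One GIN index on the weighted column covers all three cases from the question: ts_rank, with its default weights, ranks title hits above body hits, and a weight label in the query (foo:A, foo:D) restricts the match to one column's lexemes. GIN stores lexemes but not their weights, so weight-restricted queries need a recheck against the table row and may be somewhat slower:

```sql
-- Hypothetical table standing in for the question's schema
CREATE TABLE articles (
    id    serial PRIMARY KEY,
    title text,
    body  text,
    tsv   tsvector
);

INSERT INTO articles (title, body) VALUES
    ('foo guide', 'nothing relevant here'),
    ('unrelated', 'a note that mentions foo');

-- Weighted vector: title lexemes get weight A, body lexemes weight D
UPDATE articles SET tsv =
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(body, '')),  'D');

CREATE INDEX articles_tsv_idx ON articles USING gin (tsv);

-- Both columns, ranked: with ts_rank's default weights the title hit ranks first
SELECT id, title, ts_rank(tsv, q) AS rank
FROM articles, to_tsquery('english', 'foo') AS q
WHERE tsv @@ q
ORDER BY rank DESC;

-- Title only: match 'foo' only where it carries weight A
SELECT id, title FROM articles WHERE tsv @@ to_tsquery('english', 'foo:A');

-- Body only: match 'foo' only where it carries weight D
SELECT id, title FROM articles WHERE tsv @@ to_tsquery('english', 'foo:D');
```

Keeping tsv current on later inserts and updates (for example with a trigger, or a stored generated column on PostgreSQL 12 and later) is left out of the sketch.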

Postgres full-text search across multiple columns: why concatenate in the index and not at runtime?