Adding multiple phrases together in phraseto_tsquery

Time: 2017-03-10 16:52:30

Tags: postgresql full-text-search postgresql-9.6 tsvector

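I have successfully joined arrays of single words into a string for to_tsquery, but phraseto_tsquery in Postgres 9.6 only accepts a single keyword phrase. Does anyone know a workaround for querying a tsvector (either in plain SQL or with the full-text search functions) so that I can OR/AND a dynamic number of phrases into the query? The columns being selected are all text arrays.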

First attempt:

SELECT to_tsvector('english','Try not to become a man of successful companies, but rather try to become a man of value')
   @@ (to_tsquery('english','man & become')
       && phraseto_tsquery('english','man of value')
       && phraseto_tsquery('english','company')
       || phraseto_tsquery('english', 'company | man of value')
   );

A real-world example of the problem, searching for animals:

-- with statements here of opp_tsv and tp
SELECT
  tp.id,
  tp.keywords, --['giraffes','lions', 'monkeys']
  tp.phrase_keywords, --['pygmy marmocet','African Lion']
  tp.neg_keywords, --['aliens', 'spaceships', 'space']
  tp.neg_phrase_keywords --['Andromedan Alien', 'Nibiru Reptilian']
FROM tp, opp_tsv,
  -- string logic for ts_query
      concat(array_to_string(tp.keywords, ' | ')) AS kws_concat,
      concat(array_to_string(tp.neg_keywords, ' | ')) AS neg_kws_concat,
      to_tsquery('english', kws_concat) query,
      to_tsquery('english', concat(neg_kws_concat)) neg_query
  -- Case logic for phrase queries

  -- .... -> phrase_query,
      phraseto_tsquery('phrase to search | Need this phrase too')
  -- .... -> phrase_neg_query,

WHERE
  (
    opp_tsv.doc @@ query --pos
    OR
    opp_tsv.doc @@ phrase_query --pos
  )
  AND NOT (
    opp_tsv.doc @@ neg_query --neg
    OR
    opp_tsv.doc @@ phrase_neg_query --neg
  )
ORDER BY rank_cd DESC;

Idea: dynamically generate, based on the array length, something like

opp_tsv.doc @@ (phrase_query || phrase_query2)

or somehow achieve this:

opp_tsv.doc @@ phraseto_tsquery('big messy phrase | more messy wordphrases')

Edit: SELECT phraseto_tsquery('phrase to search | Need this phrase too') returns

'phrase' <-> 'to' <-> 'search' <-> 'need' <-> 'this' <-> 'phrase' <-> 'too'

What I am looking for instead is

('phrase' <-> 'to' <-> 'search') | ('need' <-> 'this' <-> 'phrase' <-> 'too')
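For a fixed number of phrases, the idea above already works: build one phraseto_tsquery per phrase and join them with the tsquery || operator. A minimal sketch, assuming the 'simple' text search configuration so that stop words such as 'to' and 'this' are kept:

```sql
-- Assumes the 'simple' configuration, so stop words like 'to' and 'this' are kept.
SELECT
  -- One phraseto_tsquery call per phrase, OR-ed together with the tsquery || operator.
  phraseto_tsquery('simple', 'phrase to search')
    || phraseto_tsquery('simple', 'Need this phrase too') AS wanted,
  -- A single call ignores '|' and chains every word into one long phrase.
  phraseto_tsquery('simple', 'phrase to search | Need this phrase too') AS single_call;
```

This only covers a number of phrases known when the query is written; handling a whole text[] column needs an aggregate over ||.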

1 answer:

Answer 0 (score: 2)

You can define your own aggregate over tsquery's OR (||) operator:

CREATE AGGREGATE tsquery_or_agg(tsquery) (
  SFUNC = tsquery_or,
  STYPE = tsquery
);
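A quick way to see the aggregate at work is to feed it one phraseto_tsquery per array element, which is what the question's phrase arrays need. This sketch repeats the CREATE AGGREGATE so it runs on its own (DROP AGGREGATE IF EXISTS is used because CREATE AGGREGATE has no OR REPLACE form in 9.6) and assumes the 'simple' configuration:

```sql
-- Re-create the aggregate from above so this snippet is self-contained.
DROP AGGREGATE IF EXISTS tsquery_or_agg(tsquery);
CREATE AGGREGATE tsquery_or_agg(tsquery) (
  SFUNC = tsquery_or,
  STYPE = tsquery
);

-- One phrase query per array element, OR-ed together by the aggregate.
SELECT tsquery_or_agg(phraseto_tsquery('simple', p)) AS phrase_query
FROM unnest(ARRAY['phrase to search', 'Need this phrase too']) AS p;
```

No INITCOND is needed: because tsquery_or is strict, the first non-null input simply becomes the initial state.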

Note: the aggregate above relies on the fact that tsquery's || operator is backed by the tsquery_or(tsquery, tsquery) function. You can check this with:

SELECT * FROM pg_operator WHERE oprname = '||' AND oprleft = regtype 'tsquery' AND oprright = regtype 'tsquery';

If you don't want to rely on the name of this (undocumented) function, even though it is unlikely to change, you can create your own function to use as the aggregate's base function (SFUNC):

CREATE FUNCTION my_tsquery_or(tsquery, tsquery)
  RETURNS tsquery
  LANGUAGE sql
  IMMUTABLE
  STRICT
  AS 'SELECT $1 || $2';

After that, your query would be:

WITH tp(id, keywords, phrase_keywords, neg_keywords, neg_phrase_keywords) AS (
  VALUES (42, ARRAY['giraffes', 'lions', 'monkeys']::text[],
              ARRAY['pygmy marmocet', 'African Lion']::text[],
              ARRAY['aliens', 'spaceships', 'space']::text[],
              ARRAY['Andromedan Alien', 'Nibiru Reptilian']::text[])
),
tq(id, query) AS (
  SELECT tp.id,
         (((SELECT tsquery_or_agg(plainto_tsquery(kw)) FROM unnest(keywords) kw)
           || (SELECT tsquery_or_agg(phraseto_tsquery(pk)) FROM unnest(phrase_keywords) pk))
          && !!((SELECT tsquery_or_agg(plainto_tsquery(nk)) FROM unnest(neg_keywords) nk)
                || (SELECT tsquery_or_agg(phraseto_tsquery(np)) FROM unnest(neg_phrase_keywords) np)))
  FROM tp
),
opp_tsv(doc) AS (
  VALUES (to_tsvector('Earth''s African Lions')),
         (to_tsvector('Andromedan Alien''s space monkeys'))
)
SELECT tp.id, tp.keywords, tp.phrase_keywords, tp.neg_keywords, tp.neg_phrase_keywords, opp_tsv.doc
FROM opp_tsv, tp
JOIN tq USING (id)
WHERE opp_tsv.doc @@ tq.query
ORDER BY ts_rank_cd(opp_tsv.doc, tq.query) DESC;

Also, if the fields in tp can contain values like 'big messy phrase | more messy wordphrases', then you did not split your input correctly in the first place. You can split such phrases/keywords with the regexp_split_to_table() function. With this, the tq CTE should look like:
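The following is a sketch along those lines, written under its own assumptions rather than taken from the answer: it assumes '|' is the only separator inside the stored strings, splits on the pattern '\s*\|\s*' (a choice made here), uses the 'simple' configuration, and drops the negative arrays to stay short. The aggregate is re-created so the snippet runs on its own:

```sql
-- Re-create the aggregate so this snippet is self-contained.
DROP AGGREGATE IF EXISTS tsquery_or_agg(tsquery);
CREATE AGGREGATE tsquery_or_agg(tsquery) (
  SFUNC = tsquery_or,
  STYPE = tsquery
);

WITH tp(id, keywords, phrase_keywords) AS (
  VALUES (42,
          ARRAY['giraffes | lions', 'monkeys']::text[],
          ARRAY['big messy phrase | more messy wordphrases']::text[])
),
tq(id, query) AS (
  SELECT tp.id,
         -- Split every array element on '|' before turning each piece into a tsquery.
         (SELECT tsquery_or_agg(plainto_tsquery('simple', kw))
            FROM unnest(tp.keywords) AS k,
                 regexp_split_to_table(k, '\s*\|\s*') AS kw)
         || (SELECT tsquery_or_agg(phraseto_tsquery('simple', pk))
               FROM unnest(tp.phrase_keywords) AS p,
                    regexp_split_to_table(p, '\s*\|\s*') AS pk)
  FROM tp
)
SELECT id, query FROM tq;
```

Without the split, plainto_tsquery('simple', 'giraffes | lions') would AND the two words together instead of offering them as alternatives.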