Postgres reverse LIKE lookup: indexing and performance

Date: 2017-03-09 16:13:14

Tags: postgresql indexing

We have a table of musicians whose records contain several string fields, for example:

  • "Jimi", "Hendrix", "Guitar"
  • "Phil", "Collins", "Drums"
  • "Sting", "", "Bass"
  • "Ringo", "Starr", "Drums"
  • "Paul", "McCartney", "Bass"

I want to pass Postgres a long string, for example:

"It is known that Jimi liked to set light to his guitar and smash up all the drums while on stage."

I'd like it to return the rows that have any matches, ideally ordered by the number of matches:

  • "Jimi", "Hendrix", "Guitar"
  • "Phil", "Collins", "Drums"
  • "Ringo", "Starr", "Drums"

Because the search has to be case-insensitive, I build a query like this:

select * from musicians
where  lowercase_string like '%'||firstname||'%'
or     lowercase_string like '%'||lastname||'%'
or     lowercase_string like '%'||instrument||'%'

I then loop over the results (in Ruby, in my case) to pick out the rows with the most matches.
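For reference, that post-processing loop might look roughly like the Ruby sketch below. It is a hypothetical illustration, not the asker's actual code: it assumes each row arrives as a plain `[firstname, lastname, instrument]` array, and `rank_by_matches` is an invented name. It applies the same case-insensitive substring rule as the query, skips empty fields, and sorts by match count.

```ruby
# Hypothetical client-side scoring; each row is assumed to be
# [firstname, lastname, instrument] as plain Strings.
MUSICIANS = [
  ["Jimi",  "Hendrix",   "Guitar"],
  ["Phil",  "Collins",   "Drums"],
  ["Sting", "",          "Bass"],
  ["Ringo", "Starr",     "Drums"],
  ["Paul",  "McCartney", "Bass"]
].freeze

LONG_STRING = "It is known that Jimi liked to set light to his guitar " \
              "and smash up all the drums while on stage."

def rank_by_matches(rows, long_string)
  haystack = long_string.downcase
  scored = rows.map do |row|
    # Case-insensitive substring test, like "like '%' || field || '%'".
    # Empty fields are skipped: '%' || '' || '%' would match every string.
    matches = row.count { |field| !field.empty? && haystack.include?(field.downcase) }
    [row, matches]
  end
  # Keep rows with at least one match, most matches first. Ruby's sort_by is
  # not stable, so the original index is used as a tie-breaker.
  scored.select { |_, m| m.positive? }
        .each_with_index
        .sort_by { |(_, m), i| [-m, i] }
        .map(&:first)
end

rank_by_matches(MUSICIANS, LONG_STRING).each do |row, matches|
  puts "#{row.join(', ')}: #{matches}"
end
```

With the sample data this prints Jimi Hendrix with 2 matches, then Phil Collins and Ringo Starr with 1 each. The slow part, however, is the SQL stage, which this loop does not touch.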

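However, this is very slow at the SQL stage (over a minute).

I've already tried adding a lower-case GIN index using pg_trgm, but it didn't help, presumably because the LIKE query is back to front?

Thanks!

2 answers:

Answer 0 (score: 2)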

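From my tests, it seems that no trigram index can help your query at all, and no other index type is likely to speed up an (I)LIKE / FTS based search either.

I should mention that all of the queries below would use a trigram index if they were queried "in reverse", i.e. when the table contains the documents (which are indexed) and your parameter is the query. The (I)LIKE variant, for example, is 2-3 times faster with one.

The queries I tested: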

select *
from   musicians
where  :input_string ilike '%' || firstname  || '%'
or     :input_string ilike '%' || lastname   || '%'
or     :input_string ilike '%' || instrument || '%'

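At first, FTS seemed like a good idea, but my tests showed that, even without ranking, it is 60-100 times slower than the (I)LIKE variant. (So, even though these methods would spare you from post-processing the results, they are not worth it.)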

select *
from   musicians
where  to_tsvector(:input_string) @@ (plainto_tsquery(firstname) || plainto_tsquery(lastname) || plainto_tsquery(instrument))

ORDER BY rank did not slow it down much further: it is 70-120 times slower than the (I)LIKE variant.

select   *
from     musicians
where    to_tsvector(:input_string) @@ (plainto_tsquery(firstname) || plainto_tsquery(lastname) || plainto_tsquery(instrument))
order by ts_rank(to_tsvector(:input_string), plainto_tsquery(firstname) || plainto_tsquery(lastname) || plainto_tsquery(instrument)) desc

Then, as a last-ditch effort, I tried the (fairly new) "word similarity" operators of the trigram module, <% and %> (available since PostgreSQL 9.6):

select *
from   musicians
where  :input_string %> firstname
or     :input_string %> lastname
or     :input_string %> instrument

select *
from   musicians
where  firstname  <% :input_string
or     lastname   <% :input_string
or     instrument <% :input_string

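These are somewhat faster than FTS: about 50-70 times slower than the (I)LIKE variant.

(Partially working) rextester: it runs against PostgreSQL 9.5, so the 9.6 operators obviously won't run there.

Update: IF whole-word matches are enough, you can actually turn your query around so that it can use indexes. You will need to "parse" your query (a.k.a. the "long string") first, though: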

with long_string(ls) as (
  values (:input_string)
),
words(word) as (
  select s
  from   long_string, regexp_split_to_table(ls, '[^[:alnum:]]+') s
  where  s <> ''
)
select   musicians.*
from     musicians, words
where    firstname  ilike word
or       lastname   ilike word
or       instrument ilike word
group by musicians.id

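Note: I split the query into whole words. You can use other logic there, or even do the parsing on the client side.

Default btree indexes shine here, because they are much faster than trigram indexes with (I)LIKE (and we don't need trigram indexes anyway, since we are looking for whole-word matches). For the query below, that means btree indexes on the expressions lower(firstname), lower(lastname) and lower(instrument), so that they match the expressions in the WHERE clause: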
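If you would rather parse on the client side, a minimal Ruby sketch of the same tokenization could look like this. It mirrors `regexp_split_to_table(lower(ls), '[^[:alnum:]]+')` together with the `s <> ''` filter; `words_of` is a hypothetical name, and how the resulting words are passed to the query is left open. Ruby's and PostgreSQL's notions of `[[:alnum:]]` may differ for non-ASCII characters, depending on encoding and locale.

```ruby
# Hypothetical helper: lower-case the long string and split it on runs of
# non-alphanumeric characters, dropping empty pieces (e.g. a leading "").
def words_of(long_string)
  long_string.downcase.split(/[^[:alnum:]]+/).reject(&:empty?)
end

long_string = "It is known that Jimi liked to set light to his guitar " \
              "and smash up all the drums while on stage."
words = words_of(long_string)
p words.first(5) # => ["it", "is", "known", "that", "jimi"]
```

Duplicate words do not change which rows the `group by` query returns, so calling `.uniq` on the result would only save lookups. It would, however, change a per-word match count.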

with long_string(ls) as (
  values (:input_string)
),
words(word) as (
  select s
  from   long_string, regexp_split_to_table(lower(ls), '[^[:alnum:]]+') s
  where  s <> ''
)
select   musicians.*
from     musicians, words
where    lower(firstname)  = word
or       lower(lastname)   = word
or       lower(instrument) = word
group by musicians.id

http://rextester.com/PSABJ6745

You can even get a match count with something like:
sum((lower(firstname)  = word)::int
  + (lower(lastname)   = word)::int
  + (lower(instrument) = word)::int)
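For comparison, here is a Ruby sketch of the same full-word count computed client-side. It assumes rows are `[firstname, lastname, instrument]` arrays, and `full_word_matches` is a hypothetical helper. Like the SQL expression, it adds 1 for every (word, column) pair that is equal, so a word repeated in the long string counts repeatedly. Unlike the substring test, it does not match "Sting" inside "interesting".

```ruby
# Hypothetical helper mirroring
#   sum((lower(firstname) = word)::int + (lower(lastname) = word)::int + (lower(instrument) = word)::int)
# Assumption: row is [firstname, lastname, instrument]; words are already lower-case.
def full_word_matches(row, words)
  fields = row.map(&:downcase)
  # Each (word, column) pair that is equal adds 1.
  words.sum { |word| fields.count { |field| field == word } }
end

# Same tokenization as regexp_split_to_table(lower(ls), '[^[:alnum:]]+') with s <> ''.
words = "An interesting gig: the drums, the DRUMS!".downcase.split(/[^[:alnum:]]+/).reject(&:empty?)

p full_word_matches(["Sting", "", "Bass"], words)        # => 0 ("sting" only occurs inside "interesting")
p full_word_matches(["Phil", "Collins", "Drums"], words) # => 2 ("drums" appears twice)
```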

Answer 1 (score: 2)

An ilike option, ordered by number of matches:

with long_string (ls) as (values
    ('It is known that Jimi liked to set light to his guitar and smash up all the drums while on stage.')
)
select musicians.*, matches
from
    musicians
    cross join
    long_string
    cross join lateral
    (select
        (ls ilike format ('%%%s%%', first_name) and first_name != '')::int +
        (ls ilike format ('%%%s%%', last_name) and last_name != '')::int +
        (ls ilike format ('%%%s%%', instrument) and instrument != '')::int 
        as matches
    ) m
where matches > 0
order by matches desc
;
 first_name | last_name | instrument | matches 
------------+-----------+------------+---------
 Jimi       | Hendrix   | Guitar     |       2
 Phil       | Collins   | Drums      |       1
 Ringo      | Starr     | Drums      |       1