Optimizing an SQL query for relevance-based search

Date: 2015-03-02 10:52:48

Tags: mysql search relevance

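This query returns rows ordered by relevance when searching for species names. I use it for an autocomplete suggestion list, and the relevance calculation works fine, but the query is somewhat slow on large tables. I would appreciate any tips on how to optimize it (MySQL). My main questions are:

  • Are there any indexes I could create on the table that would help? Or am I stuck with this type of query, which apparently uses the filesort algorithm (possibly the reason it is slow)?
  • Edit: I am using InnoDB as the table type, so unfortunately I cannot use a full-text index in this case (only available for MyISAM tables).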
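
One way to confirm the filesort suspicion is to prefix the query with EXPLAIN and read the plan. The following is only a minimal sketch: the relevance expressions are left out and a literal search string is used for brevity.

```sql
-- Sketch: shows the execution plan instead of running the query.
-- "Using filesort" in the Extra column means MySQL sorts the result in an extra pass;
-- type = ALL means the whole table is scanned.
EXPLAIN
SELECT id, genus, species, fullname
FROM species
WHERE fullname LIKE '%Boletus a%'
ORDER BY genus, species
LIMIT 50;
```

Because the pattern starts with %, an ordinary B-tree index on fullname cannot be used to narrow down the rows, so an index on that column alone is unlikely to change the plan much.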

SQL Fiddle here: http://sqlfiddle.com/#!2/f03c4c/5

SELECT query:

    SET @search ='Boletus a';
    
    SELECT id, genus, species, fullname, 
        (CASE WHEN (CONCAT(genus, ' ', species)=@search) THEN 1 ELSE 0 END) # EXACT MATCH OF WHOLE NAME
      + (CASE WHEN (CONCAT(genus, ' ', species) LIKE CONCAT(@search,'%')) THEN 1 ELSE 0 END) # MATCH BEGINNING OF WHOLE NAME
      + (CASE WHEN (CONCAT(genus, ' ', species) LIKE CONCAT('%',@search,'%')) THEN 1 ELSE 0 END) # LIKE MATCH OF WHOLE NAME                             
      + (CASE WHEN (genus=@search) THEN 1 ELSE 0 END) #EXACT MATCH OF genus
      + (CASE WHEN (species=@search) THEN 1 ELSE 0 END) #EXACT MATCH OF species             
      + (CASE WHEN (genus LIKE CONCAT(@search,'%')) THEN 1 ELSE 0 END) # MATCH BEGINNING OF genus
      + (CASE WHEN (species LIKE CONCAT(@search,'%')) THEN 1 ELSE 0 END) #MATCH BEGINNING OF species
        AS relevans
    FROM species
    WHERE `fullname` LIKE CONCAT('%',@search,'%')
    ORDER BY relevans DESC, genus, species
    LIMIT 50;
    

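Background: a species name consists of at least two parts, the genus and the epithet (in my table the epithet column is named "species"). The table has three name columns: genus, species and fullname. The "fullname" column can also contain names of lower taxa (varieties and forms, see the "var." rows in the example data). I am open to any suggestions on how to make the search more efficient. Perhaps a regular expression on the search string, targeting only the "fullname" column instead of concatenating the two columns?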
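
As a rough illustration of that last idea, the scoring could be written against fullname alone. This is a simplified sketch, not a drop-in replacement: it drops the separate genus and species terms, and it assumes that fullname always starts with "genus species". It relies on MySQL evaluating a comparison to 1 or 0, so the CASE expressions are not needed.

```sql
-- Sketch only: scores against fullname instead of CONCAT(genus, ' ', species).
-- Assumption: fullname always begins with "genus species"; rows with a variety
-- suffix (e.g. 'Boletus luridus var. luridus') can then never be an exact match.
SET @search = 'Boletus a';

SELECT id, genus, species, fullname,
       (fullname = @search)                        -- exact match of the whole name
     + (fullname LIKE CONCAT(@search, '%'))        -- match at the beginning of the name
     + (fullname LIKE CONCAT('%', @search, '%'))   -- match anywhere in the name
       AS relevans
FROM species
WHERE fullname LIKE CONCAT('%', @search, '%')
ORDER BY relevans DESC, genus, species
LIMIT 50;
```

This saves the repeated CONCAT calls per row, but the WHERE pattern still begins with %, so the table is still scanned in full and the result still has to be sorted.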

Example database schema:

    CREATE TABLE species
        (`id` int, `genus` varchar(50), `species` varchar(50), `fullname` varchar(100))
    ;
    
    INSERT INTO species
        (`id`, `genus`, `species`, `fullname`)
    VALUES
        (360052, 'Afroboletus', 'azureotinctus', 'Afroboletus azureotinctus'),
        (360053, 'Afroboletus', 'costatisporus', 'Afroboletus costatisporus'),
        (464267, 'Afroboletus', 'elegans', 'Afroboletus elegans'),
        (360054, 'Afroboletus', 'lepidellus', 'Afroboletus lepidellus'),
        (112100, 'Afroboletus', 'luteolus', 'Afroboletus luteolus'),
        (464266, 'Afroboletus', 'multijugus', 'Afroboletus multijugus'),
        (112101, 'Afroboletus', 'pterosporus', 'Afroboletus pterosporus'),
        (326826, 'Aureoboletus', 'auriporus', 'Aureoboletus auriporus'),
        (326828, 'Aureoboletus', 'gentilis', 'Aureoboletus gentilis'),
        (309389, 'Aureoboletus', 'novoguineensis', 'Aureoboletus novoguineensis'),
        (326829, 'Aureoboletus', 'subacidus', 'Aureoboletus subacidus'),
        (113146, 'Aureoboletus', 'thibetanus', 'Aureoboletus thibetanus'),
        (118425, 'Austroboletus', 'cookei', 'Austroboletus cookei'),
        (118427, 'Austroboletus', 'dictyotus', 'Austroboletus dictyotus'),
        (412550, 'Austroboletus', 'lacunosus', 'Austroboletus lacunosus'),
        (159051, 'Boletus', 'aereus', 'Boletus aereus'),
        (171640, 'Boletus', 'appendiculatus', 'Boletus appendiculatus'),
        (161237, 'Boletus', 'armeniacus', 'Boletus armeniacus'),
        (563944, 'Boletus', 'australiensis', 'Boletus australiensis'),
        (444094, 'Boletus', 'badius', 'Boletus badius'),
        (215376, 'Boletus', 'brunneus', 'Boletus brunneus'),
        (129701, 'Boletus', 'bubalinus', 'Boletus bubalinus'),
        (203954, 'Boletus', 'byssinus', 'Boletus byssinus'),
        (162779, 'Boletus', 'calopus', 'Boletus calopus'),
        (129469, 'Boletus', 'caucasicus', 'Boletus caucasicus'),
        (208740, 'Boletus', 'chrysenteron', 'Boletus chrysenteron'),
        (486540, 'Boletus', 'cisalpinus', 'Boletus cisalpinus'),
        (368037, 'Boletus', 'declivitatum', 'Boletus declivitatum'),
        (104061, 'Boletus', 'depilatus', 'Boletus depilatus'),
        (356530, 'Boletus', 'edulis', 'Boletus edulis'),
        (356278, 'Boletus', 'erythropus', 'Boletus erythropus var. immutatus'),
        (417068, 'Boletus', 'erythropus', 'Boletus erythropus var. erythropus'),
        (563943, 'Boletus', 'eximius', 'Boletus eximius'),
        (264716, 'Boletus', 'fechtneri', 'Boletus fechtneri'),
        (372473, 'Boletus', 'ferrugineus', 'Boletus ferrugineus'),
        (141943, 'Boletus', 'flavus', 'Boletus flavus'),
        (247434, 'Boletus', 'fragrans', 'Boletus fragrans'),
        (302971, 'Boletus', 'fuligineus', 'Boletus fuligineus'),
        (218213, 'Boletus', 'impolitus', 'Boletus impolitus'),
        (327048, 'Boletus', 'legaliae', 'Boletus legaliae'),
        (327051, 'Boletus', 'leptospermi', 'Boletus leptospermi'),
        (235486, 'Boletus', 'lignatilis', 'Boletus lignatilis'),
        (354822, 'Boletus', 'luridiformis', 'Boletus luridiformis var. junquilleus'),
        (354845, 'Boletus', 'luridiformis', 'Boletus luridiformis var. discolor'),
        (430254, 'Boletus', 'luridiformis', 'Boletus luridiformis var. luridiformis'),
        (132915, 'Boletus', 'luridus', 'Boletus luridus var. rubriceps'),
        (417113, 'Boletus', 'luridus', 'Boletus luridus var. luridus'),
        (241417, 'Boletus', 'megalosporus', 'Boletus megalosporus'),
        (282394, 'Boletus', 'moravicus', 'Boletus moravicus'),
        (196024, 'Boletus', 'paluster', 'Boletus paluster')
    ;
    

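1 answer:

Answer 0 (score: 2)

My suggestion: forget about this query and create a fulltext index.

Create one index that covers the columns genus, species and fullname (all in a single index). Then query like this: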

    SELECT * FROM your_table WHERE MATCH(genus, species, fullname) AGAINST ('Boletus a');
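
The index itself could be created along these lines. This is a sketch under two assumptions: the table is the `species` table from the question, and the server is MySQL 5.6 or later, where InnoDB tables support FULLTEXT indexes (on older versions the table would have to be MyISAM). The index name is made up for the example.

```sql
-- Assumption: table name taken from the question; ft_species_names is a
-- hypothetical index name chosen for this example.
ALTER TABLE species
    ADD FULLTEXT INDEX ft_species_names (genus, species, fullname);
```

The column list in MATCH() has to be the same as the column list of the index, which is why all three columns go into one index here.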
    

You can also use MATCH(genus, species, fullname) AGAINST ('Boletus a') in other parts of the query:

    SELECT MATCH(genus, species, fullname) AGAINST ('Boletus a') # relevance score (a non-negative float, 0 means no match)
    FROM your_table
    WHERE
    MATCH(genus, species, fullname) AGAINST ('Boletus a') # keeps only the rows that match
    ORDER BY MATCH(genus, species, fullname) AGAINST ('Boletus a') DESC # most relevant rows first
    ;
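
One caveat for the autocomplete case: in the default natural language mode, very short words such as the trailing "a" are ignored (the minimum token length defaults to 3 for InnoDB and 4 for MyISAM), so 'Boletus a' effectively searches for "Boletus" only. A possible workaround is boolean mode with the * truncation operator. This is a sketch that assumes the `species` table from the question, the three-column fulltext index described above, and an application that rewrites the user input 'Boletus a' into the boolean search string itself.

```sql
-- Sketch: '+Boletus +a*' is a hypothetical rewrite of the input 'Boletus a',
-- done by the application. + makes a term mandatory, * matches any word
-- that starts with the given prefix.
SELECT id, genus, species, fullname,
       MATCH(genus, species, fullname)
           AGAINST ('+Boletus +a*' IN BOOLEAN MODE) AS score
FROM species
WHERE MATCH(genus, species, fullname)
          AGAINST ('+Boletus +a*' IN BOOLEAN MODE)
ORDER BY score DESC, genus, species
LIMIT 50;
```

Note that fulltext search matches whole words, so unlike the LIKE '%...%' query it will not find "Boletus" inside "Afroboletus". Whether that is acceptable depends on what the suggestion list should contain.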