Sphinx search: stopwords and hyphenated phrases

Time: 2015-12-01 15:36:57

Tags: indexing full-text-search sphinx

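I'm running into a strange issue with Sphinx and stopwords in hyphenated phrases. I have a large Sphinx index running the backend search for articles. Among the indexed attributes is the article's URL slug. I'm using infix searching on the slug to make it easier for users to find old items.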

Here's the problem. Given a slug like this:

an-optional-text-string

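A user who searches for the full slug gets no results. But if you remove the "an-" and search for just "optional-text-string", or just "optional-text" or "text-string", the document is returned as expected.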

I think this may be a stopword issue? Perhaps the Sphinx indexer is dropping the "an-" part, but the search query parser is not?

Has anyone else run into this?

Here is a simplified version of my source and index configuration:

source articleSource
{
    type            = mysql

    sql_host        = 
    sql_user        = 
    sql_pass        = 
    sql_db          = 

    sql_query_pre = SET NAMES utf8
    sql_query_pre = DELETE FROM foundry_registry where name='__index_articles'
    sql_query_pre = INSERT INTO foundry_registry (SELECT (SELECT MAX(uid)+1 from foundry_registry), '__index_articles', MAX(modified), 0, '__core' FROM gryphon_articles)

    sql_query       = \
        select a.uid as id, a.uid as item_id, a.headline as title, a.abstract as description, a.copy, \
            a.created, a.published, a.modified, a.status, 'article' as type, a.slug as url_slug, \
            group_concat(t.name) as tags, \
            group_concat(au.name) as authors, \
            a.workflow_id, a.section_id, a.issue_id, '0' as blog_id \
            from gryphon_articles as a \
            left join gryphon_articlesTags as at on at.article_id = a.uid \
            left join gryphon_tags as t on at.tag_id = t.uid \
            left join gryphon_articlesAuthors as aa on aa.article_id = a.uid \
            left join gryphon_authors as au on aa.author_id = au.uid \
            group by a.uid


    sql_attr_multi          = uint tag from query; SELECT article_id, tag_id as tag from gryphon_articlesTags
    sql_attr_multi          = uint author from query; SELECT article_id, author_id as author from gryphon_articlesAuthors

    sql_field_string = title
    sql_field_string = description
    sql_field_string = copy
    sql_field_string = type
    sql_field_string = url_slug
    sql_field_string = tags
    sql_field_string = authors
    sql_field_string = workflow_id
    sql_field_string = section_id
    sql_field_string = issue_id

    sql_attr_uint = item_id
    sql_attr_uint = created
    sql_attr_uint = published
    sql_attr_uint = modified
    sql_attr_uint = blog_id

    sql_attr_bool = status

    sql_query_info      = SELECT * FROM gryphon_articles WHERE uid=$id

}

index articleIndex
{
    source          = articleSource

    path            = /path/to/index

    docinfo         = extern

    mlock           = 0

    morphology      = stem_en

    min_word_len        = 1

    charset_type        = utf-8

    enable_star     = 1

    html_strip      = 0
    html_remove_elements    = style, script

    min_infix_len   = 3
    infix_fields    = url_slug
}
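
To narrow down the hypothesis, here is a rough Python sketch of how the slug and the four queries would break into tokens. It is not Sphinx's tokenizer. It assumes the hyphen acts as a word separator (no charset_table or blend_chars override appears in the index definition above) and reuses min_infix_len = 3 from that definition. The helper names split_slug and short_tokens are made up for illustration.

```python
import re

# Assumed value, copied from the index definition above.
MIN_INFIX_LEN = 3


def split_slug(text):
    """Split on anything that is not an ASCII letter or digit.

    This only approximates a tokenizer in which '-' is a separator.
    """
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def short_tokens(text, min_len=MIN_INFIX_LEN):
    """Return the tokens that are shorter than min_len characters."""
    return [tok for tok in split_slug(text) if len(tok) < min_len]


if __name__ == "__main__":
    queries = (
        "an-optional-text-string",   # returns nothing in the real index
        "optional-text-string",      # works
        "optional-text",             # works
        "text-string",               # works
    )
    for query in queries:
        print(query, split_slug(query), short_tokens(query))
```

Under these assumptions, the failing query is the only one containing a token ("an") shorter than min_infix_len. That is consistent with the short token being handled differently at index time and at query time, but it does not prove it. Comparing results with a different min_infix_len, or with the hyphen treated as part of the word, would help confirm or rule this out.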

0 answers:

No answers yet