Question

I have defined the following synonyms: facebook,fb,face book, face bk

Now, when I search for facebook, the parsed query is

<str name="parsedquery_toString">
    text:facebook text:fb text:face text:face text:book text:bk
</str>

But if I search for face book, the parsed query is

<str name="parsedquery_toString">
    text:face text:book
</str>

Shouldn't the parsed query be the same for both keywords?

Here is my configuration snippet:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>       
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>

  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

Here are the contents of synonyms.txt:

#some test synonym mappings unlikely to appear in real input text
aaafoo => aaabar
bbbfoo => bbbfoo bbbbar
cccfoo => cccbar cccbaz
fooaaa,baraaa,bazaaa

# Some synonym groups specific to this example
GB,gib,gigabyte,gigabytes
MB,mib,megabyte,megabytes
facebook,fb,face book, face bk
Television, Televisions, TV, TVs
#notice we use "gib" instead of "GiB" so any WordDelimiterFilter coming
#after us won't split it into two words.

# Synonym mappings can be used for spelling correction too
pixima => pixma

Answer 1

This is a well-known issue in Solr/Lucene. You can find more information in the following resources:

the Lucene ticket
this blog post; see the section on multi-word synonyms not being matched in queries

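If you want to work around this, you have a couple of options:

Apply one of the plugins/parsers mentioned in the two resources above. The downside is that you have to redo the work every time you upgrade Solr, and so on.
Move the synonyms to index time. This is the preferred approach anyway, although it has its own drawbacks.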
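
As a rough illustration of the second option, the field type from the question could be rearranged so that the synonym filter runs in the index analyzer instead of the query analyzer. This is only a sketch: it assumes the same text_en field type, the same synonyms.txt, and the same SynonymFilterFactory that the question already uses, and it has not been tested against any particular Solr version.

```xml
<!-- Sketch only: the question's text_en type with synonym expansion moved to index time. -->
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- At index time the filter sees the whole token stream, so the
         two-token entries "face book" and "face bk" can be matched. -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>

  <!-- The query analyzer is unchanged except that the synonym filter is gone. -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

With a setup like this, a document containing facebook would be indexed with the expanded terms as well, so a query for face book or fb should be able to match it. One of the drawbacks is that existing documents have to be reindexed after the change, and again whenever synonyms.txt is edited.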
