Solr SnowballPorterFilterFactory filter gives wrong suggestions

Time: 2011-06-03 10:56:52

Tags: solr spell-checking search-suggestion snowballanalyzer

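I use SnowballPorterFilterFactory in both the index and query analyzers. When I search for the word "apple", Solr finds the relevant articles but treats the word as misspelled and suggests "appl". If I search for "apples", it works correctly: no suggestion is given, and articles containing the word "apple" are found.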

In schema.xml:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords_en.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_en.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords_en.txt"/>
  </analyzer>
</fieldType>

Any ideas on how to get rid of the wrong suggestions?

1 answer:

Answer 0: (score: 2)

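You shouldn't use the same field for both searching and spell checking. The spellchecker builds its dictionary from the indexed terms, and in a stemmed field those terms are stems such as "appl". Add a separate field without stemming and use it for spell checking.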

Example:

<!-- Basic Text Field for use with Spell Correction -->
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<!-- TextSpell -->
<field name="textSpelling" type="textSpell" indexed="true" stored="false" multiValued="true"/>
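
The field above still has to be filled with text. A minimal sketch of one way to do that with copyField follows; it assumes the searchable content lives in fields named `title` and `body`, which are hypothetical names not taken from the question.

```xml
<!-- Assumption: "title" and "body" are placeholder source fields; use the fields you actually search on -->
<!-- copyField hands the original input text to textSpelling, so the textSpell analyzer (no stemming) is applied to it -->
<copyField source="title" dest="textSpelling"/>
<copyField source="body" dest="textSpelling"/>
```

Because copyField copies the original text and not the analyzed tokens, the spelling dictionary should end up with whole words such as "apple" instead of stems.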

Then in your solrconfig.xml:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">textSpelling</str>
      <str name="termSourceField">textSpelling</str>
      <str name="accuracy">0.7</str>
      <str name="spellcheckIndexDir">./spellchecker</str>
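      <!-- Assumption: "text" must name a field type defined in your schema; prefer a non-stemming type so query terms are not stemmed before lookup -->
      <str name="queryAnalyzerFieldType">text</str>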
      <str name="buildOnOptimize">true</str>
    </lst>
</searchComponent>
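
The component also has to be attached to a request handler before suggestions appear in responses. The following is a sketch under assumptions: the handler name `/spell` and the default values are illustrative and should be adapted to your setup.

```xml
<!-- Assumption: "/spell" is a hypothetical handler name; you can add the component to your existing search handler instead -->
<requestHandler name="/spell" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Turn spell checking on and select the "default" spellchecker defined above -->
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">default</str>
    <!-- Illustrative value: return at most 5 suggestions per term -->
    <str name="spellcheck.count">5</str>
  </lst>
  <arr name="last-components">
    <!-- Run the spellcheck component after the standard search components -->
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

After reindexing, the spelling index has to be rebuilt before the new terms are used, for example by sending one request with `spellcheck.build=true` or by running an optimize, since `buildOnOptimize` is enabled.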