如何实施solr拼写检查?

时间:2011-06-18 06:45:36

标签: lucene solr spell-checking

我想使用solr在我的搜索应用程序中实现一个拼写检查器组件。需要更改哪些配置?

1 个答案:

答案 0 :(得分:2)

将以下部分添加到solrconfig.xml

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

    <lst name="spellchecker">
      <!--
           Optional, it is required when more than one spellchecker is configured.
           Select non-default name with spellcheck.dictionary in request handler.
      -->
      <str name="name">default</str>
      <!-- The classname is optional, defaults to IndexBasedSpellChecker -->
      <str name="classname">solr.IndexBasedSpellChecker</str>
      <!--
               Load tokens from the following field for spell checking,
               analyzer for the field's type as defined in schema.xml are used
      -->
      <str name="field">spell</str>
      <!-- Optional, by default use in-memory index (RAMDirectory) -->
      <str name="spellcheckIndexDir">./spellchecker</str>
      <!-- Set the accuracy (float) to be used for the suggestions. Default is 0.5 -->
      <str name="accuracy">0.7</str>
      <!-- Require terms to occur in 1/100th of 1% of documents in order to be included in the dictionary -->
      <float name="thresholdTokenFrequency">.0001</float>
    </lst>
    <!-- Example of using different distance measure -->
    <lst name="spellchecker">
      <str name="name">jarowinkler</str>
      <str name="field">lowerfilt</str>
      <!-- Use a different Distance Measure -->
      <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
      <str name="spellcheckIndexDir">./spellchecker</str>

    </lst>

    <!-- This field type's analyzer is used by the QueryConverter to tokenize the value for "q" parameter -->
    <str name="queryAnalyzerFieldType">textSpell</str>
</searchComponent>
<!--
  The SpellingQueryConverter to convert raw (CommonParams.Q) queries into tokens.  Uses a simple regular expression
  to strip off field markup, boosts, ranges, etc. but it is not guaranteed to match an exact parse from the query parser.

  Optional, defaults to solr.SpellingQueryConverter
-->
<queryConverter name="queryConverter" class="solr.SpellingQueryConverter"/>

<!--  Add to a RequestHandler
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
NOTE:  YOU LIKELY DO NOT WANT A SEPARATE REQUEST HANDLER FOR THIS COMPONENT.  THIS IS DONE HERE SOLELY FOR
THE SIMPLICITY OF THE EXAMPLE.  YOU WILL LIKELY WANT TO BIND THE COMPONENT TO THE /select STANDARD REQUEST HANDLER.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
-->


<requestHandler name="/spellCheckCompRH" class="solr.SearchHandler">
    <lst name="defaults">
      <!-- Optional, must match spell checker's name as defined above, defaults to "default" -->
      <str name="spellcheck.dictionary">default</str>
      <!-- omp = Only More Popular -->
      <str name="spellcheck.onlyMorePopular">false</str>
      <!-- exr = Extended Results -->
      <str name="spellcheck.extendedResults">false</str>
      <!--  The number of suggestions to return -->
      <str name="spellcheck.count">1</str>
    </lst>
<!--  Add to a RequestHandler
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
REPEAT NOTE:  YOU LIKELY DO NOT WANT A SEPARATE REQUEST HANDLER FOR THIS COMPONENT.  THIS IS DONE HERE SOLELY FOR
THE SIMPLICITY OF THE EXAMPLE.  YOU WILL LIKELY WANT TO BIND THE COMPONENT TO THE /select STANDARD REQUEST HANDLER.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
-->
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>

来自Solr Wiki的配置示例, 添加此内容后,您可以请求构建拼写检查索引

http://localhost:8983/solr/spell?q=some query&spellcheck=true&spellcheck.collate=true&spellcheck.build=true 

请注意,不要在每个请求中包含查询的最后一部分,因为此woill会在您请求时构建拼写索引 前一个请求在第一个请求之后

http://localhost:8983/solr/spell?q=some query&spellcheck=true&spellcheck.collate=true

在之前的XML sextion中,别忘了用你想要构建拼写检查的字段替换字段拼写

现在你可以感受到拼写检查的力量