Solr: spell checking with whole sentences

Asked: 2011-08-18 15:28:56

Tags: solr full-text-search spell-checking

I am trying to configure a spellchecker so that it autocompletes the full sentence in a query.

So far I have been able to get results like this:

"american israel" :
-> "american something"
-> "israel something"

But what I want is:

"american israel" :
-> "american israel something"

Here is my solrconfig.xml:

<searchComponent name="suggest_full" class="solr.SpellCheckComponent">
 <str name="queryAnalyzerFieldType">suggestTextFull</str>
 <lst name="spellchecker">
  <str name="name">suggest_full</str>
  <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
  <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
  <str name="field">text_suggest_full</str>
  <str name="fieldType">suggestTextFull</str>
 </lst>
</searchComponent>

<requestHandler name="/suggest_full" class="org.apache.solr.handler.component.SearchHandler">
<lst name="defaults">
 <str name="echoParams">explicit</str>
 <str name="spellcheck">true</str>
 <str name="spellcheck.dictionary">suggest_full</str>
 <str name="spellcheck.count">10</str>
 <str name="spellcheck.onlyMorePopular">true</str>
</lst>
<arr name="last-components">
 <str>suggest_full</str>
</arr>
</requestHandler>

Here is my schema.xml:

<fieldType name="suggestTextFull" class="solr.TextField">
  <analyzer type="index">  
    <tokenizer class="solr.KeywordTokenizerFactory"/>  
    <filter class="solr.LowerCaseFilterFactory"/>  
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">  
    <tokenizer class="solr.KeywordTokenizerFactory"/>  
    <filter class="solr.LowerCaseFilterFactory"/>  
  </analyzer>
</fieldType>

...

<field name="text_suggest_full" type="suggestTextFull" indexed="true" stored="false" multiValued="true"/>

I read somewhere that I have to use spellcheck.q, because q uses the WhitespaceAnalyzer, but when I use spellcheck.q I get a java.lang.NullPointerException.

Any ideas?

3 answers:

Answer 0: (score: 1)

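If the spellcheck field (text_suggest_full) contains "american something" and "israel something", make sure there is also a document/entry whose value is "american israel something".

Solr will not merge "american something" and "israel something" into a single term, and it will not apply the result to the spell check of "american israel".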
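
As a minimal sketch of what that means in practice, assuming the documents are posted in Solr's XML update format and that the schema's unique key is a field named id (an assumption; the question does not show it), the extra entry could be indexed like this:

```xml
<add>
  <!-- Existing entries; KeywordTokenizerFactory keeps each value as one token -->
  <doc>
    <field name="id">1</field> <!-- "id" is an assumed unique key -->
    <field name="text_suggest_full">american something</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="text_suggest_full">israel something</field>
  </doc>
  <!-- The entry that must exist for "american israel" to complete as a phrase -->
  <doc>
    <field name="id">3</field>
    <field name="text_suggest_full">american israel something</field>
  </doc>
</add>
```

After committing, the suggester dictionary still has to be rebuilt (for example with spellcheck.build=true) before the new entry can show up.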

Answer 1: (score: 0)

Isn't there an approach better suited to autocomplete? See this article, for example.

Answer 2: (score: 0)

You can use the Suggester, a flexible "autocomplete" component; you need Solr version 3.x.

solrconfig.xml:

<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">name_autocomplete</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

schema.xml:

<field name="name_autocomplete" type="text" indexed="true" stored="true" multiValued="false" />

Add a copyField:

<copyField source="name" dest="name_autocomplete" />
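
One thing to watch: the Suggester builds its dictionary from the indexed terms of the field, so if text is a tokenized type (as in the stock example schema), each word would probably become its own suggestion. If whole phrases are wanted, a keyword-tokenized type like the one in the question may fit better. A sketch under that assumption, where the type name text_phrase is hypothetical:

```xml
<!-- Hypothetical type name; keeps each field value as a single lowercased token -->
<fieldType name="text_phrase" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Same field as above, switched to the phrase-preserving type -->
<field name="name_autocomplete" type="text_phrase" indexed="true" stored="true" multiValued="false" />
```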

Reload Solr, reindex everything, and test with http://localhost:8983/solr/suggest?q=ameri&spellcheck=true&spellcheck.collate=true&spellcheck.build=true
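
Sending spellcheck.build=true rebuilds the dictionary on that request, which is fine for a first test but wasteful on every query. As a sketch, assuming the buildOnCommit option of the Solr 3.x Suggester, the spellchecker could instead be rebuilt automatically after each commit:

```xml
<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">name_autocomplete</str>
    <!-- Rebuild the suggestion dictionary after every commit -->
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```

With that in place, spellcheck.build=true can be dropped from the request URL. On an index with frequent commits the rebuild cost may matter, so this is a trade-off rather than a default.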

You should get something like this:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="ameri">
        <int name="numFound">2</int>
        <int name="startOffset">0</int>
        <int name="endOffset">2</int>
        <arr name="suggestion">
          <str>american morocco</str>
          <str>american morocco something</str>
        </arr>
      </lst>
      <str name="collation">american morocco something</str>
    </lst>
  </lst>
</response>

Hope that helps.

Cheers