Norwegian letters in Solr autocomplete

Time: 2016-09-24 10:05:11

Tags: solr

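I have an autocomplete index that converts Norwegian letters to their international equivalents, so æ, for example, comes back as ae in the result set. How can I make it return the Norwegian letters instead?

You can test it by typing "eksot" at https://norecopa.no/search. The second result will be "eksotiske kjaeledyr", but it should be "eksotiske kjæledyr".

Here is the field type definition for the index: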

<fieldtype name="suggest_phrase" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="(^[^A-Za-z0-9]*|[^A-Za-z0-9]*$)" replacement=""  replace="all" />
        <filter class="solr.LengthFilterFactory" min="1" max="60" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="7" outputUnigrams="true" outputUnigramsIfNoShingles="true" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="99" outputUnigrams="false" outputUnigramsIfNoShingles="true" />
    </analyzer>
</fieldtype>

Here are the search component and the request handler:

<searchComponent class="solr.SpellCheckComponent" name="suggest">
    <lst name="spellchecker">
        <str name="name">suggest</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
        <str name="field">text_sug</str>  <!-- the indexed field to derive suggestions from -->
        <float name="threshold">0.005</float>
        <str name="buildOnCommit">true</str>
        <str name="buildOnOptimize">true</str>
    </lst>
</searchComponent>
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
    <lst name="defaults">
        <str name="spellcheck">true</str>
        <str name="spellcheck.dictionary">text_suggester</str>
        <str name="spellcheck.onlyMorePopular">true</str>
        <str name="spellcheck.count">5</str>
        <str name="spellcheck.collate">true</str>
    </lst>
    <arr name="components">
        <str>suggest</str>
    </arr>
</requestHandler>

1 answer:

Answer 0 (score: 0)

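You are using TSTLookupFactory, which is the most basic implementation. There are several more.

One of them, AnalyzingLookupFactory, adds an extra analysis step that is driven by a separate field type. I believe that if you move the character mapping into that step, you will match on the ASCII representation but get the original value back.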
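
A minimal sketch of what that could look like, assuming a Solr version that ships solr.SuggestComponent and that text_sug is a stored field holding the original, unmapped text. The names suggest_ascii, norwegianSuggester and mapping-norwegian.txt are hypothetical and not taken from the question's configuration. The idea is that the æ to ae mapping only happens inside the suggester's own analyzer, so it affects matching but not the text that is returned.

```xml
<!-- schema.xml: analysis chain used only by the suggester for matching.
     "suggest_ascii" and "mapping-norwegian.txt" are hypothetical names. -->
<fieldType name="suggest_ascii" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <!-- The mapping file would hold rules such as: "æ" => "ae" -->
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-norwegian.txt"/>
        <!-- Keep each stored value as a single token so whole phrases are suggested -->
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

<!-- solrconfig.xml: suggester built on AnalyzingLookupFactory.
     "norwegianSuggester" is a hypothetical name. -->
<searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
        <str name="name">norwegianSuggester</str>
        <str name="lookupImpl">AnalyzingLookupFactory</str>
        <str name="dictionaryImpl">DocumentDictionaryFactory</str>
        <!-- Assumed to be stored and to contain the text with æ, ø, å intact -->
        <str name="field">text_sug</str>
        <str name="suggestAnalyzerFieldType">suggest_ascii</str>
        <str name="buildOnCommit">true</str>
    </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler">
    <lst name="defaults">
        <str name="suggest">true</str>
        <str name="suggest.dictionary">norwegianSuggester</str>
        <str name="suggest.count">5</str>
    </lst>
    <arr name="components">
        <str>suggest</str>
    </arr>
</requestHandler>
```

With a setup along these lines, a request such as /suggest?suggest.q=eksot would be analyzed through suggest_ascii for matching, while the suggestions themselves come from the stored values of text_sug. Note that this suggests whole stored values rather than the shingles produced by suggest_phrase, so the source field may need to be populated differently.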