The output for "whs is" reports a frequency of 73 for the suggestion "who is", which differs from the actual original frequency of "who is" (94).
Two screenshots of the output were attached for reference: one for /spell?spellcheck.q="who is" and one for /spell?spellcheck.q="whs is".
Is there any way to make the frequencies match?
My schema.xml looks like this:
<field name="gram" type="textSpell" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="gram_ci" type="textSpellCi" indexed="true" stored="false" multiValued="false"/>
<copyField source="gram" dest="gram_ci"/>
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2" outputUnigrams="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2" outputUnigrams="true"/>
  </analyzer>
</fieldType>
<fieldType name="textSpellCi" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2" outputUnigrams="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2" outputUnigrams="true"/>
  </analyzer>
</fieldType>
And my solrconfig.xml looks like this:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpellCi</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">gram_ci</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">0</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">2</int>
    <float name="maxQueryFrequency">0.99</float>
    <str name="comparatorClass">freq</str>
    <float name="thresholdTokenFrequency">0.0</float>
  </lst>
</searchComponent>
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">gram_ci</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">15</str>
    <str name="spellcheck.alternativeTermCount">10</str>
    <str name="spellcheck.onlyMorePopular">false</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
Answer:
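I think these frequencies are not the frequencies of the term in the same index:
/spell?spellcheck.q="who is" -> the reported frequency is the term's frequency in the spellchecker index.
/spell?spellcheck.q="whs is" -> the reported frequency is the term's frequency in the general Lucene index.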
To get the same frequency, you would have to use solr.DirectSolrSpellChecker instead of solr.IndexBasedSpellChecker in your searchComponent (I guess):
http://wiki.apache.org/solr/DirectSolrSpellChecker
Edit: this depends on how you index your data.
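The suggestion above turns on the difference between the two spellchecker classes: solr.IndexBasedSpellChecker builds a separate spelling index from a field and stores it on disk, while solr.DirectSolrSpellChecker reads terms directly from the main index and needs no build step. For comparison, a dictionary declared with IndexBasedSpellChecker typically looks like the sketch below. It is adapted from the example in the Solr reference guide; reusing the `gram_ci` field and the `./spellchecker` directory here are assumptions for illustration only. Note that the dictionary in the question's solrconfig.xml already names solr.DirectSolrSpellChecker in its `classname`.

```xml
<!-- Sketch only: an IndexBasedSpellChecker dictionary, for comparison with the
     DirectSolrSpellChecker dictionary in the question. Values are assumptions. -->
<lst name="spellchecker">
  <str name="name">default</str>
  <!-- Builds a separate spelling index instead of reading the main index directly -->
  <str name="classname">solr.IndexBasedSpellChecker</str>
  <!-- Directory where that separate spelling index is stored (assumed path) -->
  <str name="spellcheckIndexDir">./spellchecker</str>
  <!-- Source field whose terms are copied into the spelling index (assumed) -->
  <str name="field">gram_ci</str>
  <!-- Rebuild the spelling index on every commit; otherwise use spellcheck.build=true -->
  <str name="buildOnCommit">true</str>
</lst>
```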