I have followed the spellcheck example in the Solr documentation.
The configuration I am using:
<!-- a spellchecker built from a field of the main index -->
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">name_spell</str>
<str name="classname">solr.DirectSolrSpellChecker</str>
<!-- the spellcheck distance measure used, the default is the internal levenshtein -->
<str name="distanceMeasure">internal</str>
<!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
<float name="accuracy">0.5</float>
<!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
<int name="maxEdits">2</int>
<!-- the minimum shared prefix when enumerating terms -->
<int name="minPrefix">1</int>
<!-- maximum number of inspections per result. -->
<int name="maxInspections">5</int>
<!-- minimum length of a query term to be considered for correction -->
<int name="minQueryLength">4</int>
<!-- maximum threshold of documents a query term can appear to be considered for correction -->
<float name="maxQueryFrequency">0.01</float>
<!-- uncomment this to require suggestions to occur in 1% of the documents -->
<!-- <float name="thresholdTokenFrequency">.01</float> -->
</lst>
<lst name="spellchecker">
<str name="name">wordbreak</str>
<str name="classname">solr.WordBreakSolrSpellChecker</str>
<str name="field">name_spell</str>
<str name="combineWords">true</str>
<str name="breakWords">true</str>
<int name="maxChanges">10</int>
</lst>
</searchComponent>
Request handler:
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.dictionary">wordbreak</str>
<str name="spellcheck">on</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.alternativeTermCount">5</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.maxCollationTries">10</str>
<str name="spellcheck.maxCollations">5</str>
</lst>
<arr name="last-components">
<str>spellcheck_new</str>
</arr>
</requestHandler>
Schema fields:
<field name="attribute_key" type="text" indexed="true" stored="true" multiValued="false" />
<field name="spell_check_field" type="text_spell" indexed="true" stored="false" multiValued="true"/>
<copyField source="attribute_key" dest="spell_check_field" />
<field name="name_spell" type="text_general" indexed="true" stored="false" multiValued="false"/>
<copyField source="attribute_key" dest="name_spell" />
<field name="attribute_key_tag" type="tag" stored="false" omitTermFreqAndPositions="true" omitNorms="true" multiValued="true"/>
<copyField source="attribute_key" dest="attribute_key_tag" multiValued="true"/>
<field name="attribute_value" type="string" indexed="false" stored="true" multiValued="false" />
<defaultSearchField>attribute_key</defaultSearchField>
The suggestions come back fine, but the collations array is always empty, for every query.
Example query:
http://localhost:8984/solr/spell_check/spell?spellcheck.q=nike%20shoes&spellcheck=true&spellcheck.collate=true&wt=json&spellcheck=true&spellcheck.extendedResults=true&spellcheck.collate=true
Result:
{
"responseHeader": {
"zkConnected": true,
"status": 0,
"QTime": 60
},
"response": {
"numFound": 0,
"start": 0,
"docs": []
},
"spellcheck": {
"suggestions": [
"nike",
{
"numFound": 6,
"startOffset": 0,
"endOffset": 4,
"origFreq": 2,
"suggestion": [
{
"word": "n i k e",
"freq": 19
},
{
"word": "nine",
"freq": 1
},
{
"word": "none",
"freq": 29
},
{
"word": "note",
"freq": 5
},
{
"word": "nicka",
"freq": 2
},
{
"word": "nino",
"freq": 2
}
]
},
"shoes",
{
"numFound": 10,
"startOffset": 5,
"endOffset": 10,
"origFreq": 0,
"suggestion": [
{
"word": "shoe",
"freq": 30
},
{
"word": "shoe s",
"freq": 30
},
{
"word": "short",
"freq": 30
},
{
"word": "s h o e s",
"freq": 4
},
{
"word": "sheer",
"freq": 15
},
{
"word": "sheen",
"freq": 4
},
{
"word": "sheet",
"freq": 3
},
{
"word": "shower",
"freq": 2
},
{
"word": "shock",
"freq": 1
},
{
"word": "shred",
"freq": 1
}
]
}
],
"correctlySpelled": false,
"collations": []
}
}
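How do I get collations to work?
Answer 0 (score: 0)
Let's first look at how the documentation defines spellcheck collate:
Causes Solr to build a new query based on the best suggestion for each term in the submitted query.
Long story short, when you specify spellcheck.collate=true you are asking Solr to recommend a new query that you can re-execute and that should give better results than the individual suggestions you receive. Let me show you a couple of examples.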
Take the query "initial audit", misspelled as:
initila audti
<lst name="suggestions">
<lst name="initila">
<int name="numFound">5</int>
<int name="startOffset">1</int>
<int name="endOffset">8</int>
<arr name="suggestion">
<str>initial</str>
<str>initi la</str>
<str>initiala</str>
<str>ini tila</str>
<str>initilal</str>
</arr>
</lst>
<lst name="audt">
<int name="numFound">4</int>
<int name="startOffset">9</int>
<int name="endOffset">13</int>
<arr name="suggestion">
<str>aud t</str>
<str>audit</str>
<str>au dt</str>
<str>audi</str>
</arr>
</lst>
</lst>
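This means you get several suggestions for each word.
But if you turn collation on, you get the most likely suggestion, if there is one, for the query that should be executed. It is not guaranteed to be the best one, but you can treat it as a good guess that may help you.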
<lst name="suggestions">
<lst name="initila">
<int name="numFound">5</int>
<int name="startOffset">1</int>
<int name="endOffset">8</int>
<arr name="suggestion">
<str>initial</str>
<str>initi la</str>
<str>initiala</str>
<str>ini tila</str>
<str>initilal</str>
</arr>
</lst>
<lst name="audti">
<int name="numFound">5</int>
<int name="startOffset">9</int>
<int name="endOffset">14</int>
<arr name="suggestion">
<str>audit</str>
<str>audt i</str>
<str>auditi</str>
<str>au dti</str>
<str>audtis</str>
</arr>
</lst>
<lst name="collation">
<str name="collationQuery">initial audit</str>
<int name="hits">1983</int>
<lst name="misspellingsAndCorrections">
<str name="initila">initial</str>
<str name="audti">audit</str>
</lst>
</lst>
</lst>
The recommended query would be
initial audit
which is taken from this element:
<str name="collationQuery">initial audit</str>
Collation only works if the recommended query actually returns hits in your index.
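The hit check mentioned above is controlled by a handful of request parameters. The fragment below is a minimal sketch that reuses the values from the question's handler. The comments reflect my reading of the Solr reference guide and should be checked against the Solr version in use.

```xml
<lst name="defaults">
  <!-- ask Solr to build a rewritten query from the best per-term suggestions -->
  <str name="spellcheck.collate">true</str>
  <!-- how many candidate collations Solr tests against the index before giving up;
       with the documented default of 0, candidates are not tested at all -->
  <str name="spellcheck.maxCollationTries">10</str>
  <!-- upper bound on the number of collations returned -->
  <str name="spellcheck.maxCollations">5</str>
  <!-- include hits and misspellingsAndCorrections with each collation -->
  <str name="spellcheck.collateExtendedResults">true</str>
</lst>
```

When spellcheck.maxCollationTries is greater than 0, a candidate that returns no hits is dropped, which is one way to end up with an empty collations list even though per-term suggestions are present.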
Answer 1 (score: 0)
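The following approach solved my problem:
Add a default field as a child of the defaults list in the requestHandler, i.e.
<str name="df">name_spell</str>
Executing the query now returns collations results. Either q or spellcheck.q can be used here.
Or
Use q instead of spellcheck.q and specify the field when using q. That is, rather than sending spellcheck.q=nike%20shoes, pass the terms through q with the field named explicitly, and it returns collations results.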
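For reference, here is a sketch of the first option applied to the /spell handler from the question. The handler name, the spellcheck_new component and the name_spell field are all taken from the question. Trimming the defaults to the collation-related ones is my own simplification, not part of the original fix. The explanation in the comment is an assumption: the collation test queries presumably need a default field to run against.

```xml
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <!-- default field for unqualified terms (assumed to be what lets the
         collation test queries match documents) -->
    <str name="df">name_spell</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.maxCollations">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck_new</str>
  </arr>
</requestHandler>
```

After reloading the core or collection, re-running the example query from the question should show whether the collations list is populated.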