How does Solr collation work?

Posted: 2017-02-18 18:57:28

Tags: solr spell-checking

I have followed the spellcheck example in the Solr documentation.

The configuration I'm using:

<!-- a spellchecker built from a field of the main index -->
<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">name_spell</str>
  <str name="classname">solr.DirectSolrSpellChecker</str>
  <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
  <str name="distanceMeasure">internal</str>
  <!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
  <float name="accuracy">0.5</float>
  <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
  <int name="maxEdits">2</int>
  <!-- the minimum shared prefix when enumerating terms -->
  <int name="minPrefix">1</int>
  <!-- maximum number of inspections per result. -->
  <int name="maxInspections">5</int>
  <!-- minimum length of a query term to be considered for correction -->
  <int name="minQueryLength">4</int>
  <!-- maximum threshold of documents a query term can appear to be considered for correction -->
  <float name="maxQueryFrequency">0.01</float>
  <!-- uncomment this to require suggestions to occur in 1% of the documents -->
    <!-- <float name="thresholdTokenFrequency">.01</float> -->

</lst>
<lst name="spellchecker">
  <str name="name">wordbreak</str>
  <str name="classname">solr.WordBreakSolrSpellChecker</str>      
  <str name="field">name_spell</str>
  <str name="combineWords">true</str>
  <str name="breakWords">true</str>
  <int name="maxChanges">10</int>
</lst>
</searchComponent>

Request handler:

  <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="spellcheck.dictionary">default</str>
      <str name="spellcheck.dictionary">wordbreak</str>
      <str name="spellcheck">on</str>
      <str name="spellcheck.extendedResults">true</str>       
      <str name="spellcheck.count">10</str>
      <str name="spellcheck.alternativeTermCount">5</str>
      <str name="spellcheck.maxResultsForSuggest">5</str>       
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.collateExtendedResults">true</str>  
      <str name="spellcheck.maxCollationTries">10</str>
      <str name="spellcheck.maxCollations">5</str>         
    </lst>
    <arr name="last-components">
      <str>spellcheck_new</str>
    </arr>
  </requestHandler>

Schema fields:

    <field name="attribute_key" type="text" indexed="true" stored="true" multiValued="false" />
    <field name="spell_check_field" type="text_spell" indexed="true" stored="false" multiValued="true"/>
    <copyField source="attribute_key" dest="spell_check_field" />
    <field name="name_spell" type="text_general" indexed="true" stored="false" multiValued="false"/>
    <copyField source="attribute_key" dest="name_spell" />
    <field name="attribute_key_tag" type="tag" stored="false" omitTermFreqAndPositions="true" omitNorms="true" multiValued="true"/>
    <copyField source="attribute_key" dest="attribute_key_tag" multiValued="true"/>
    <field name="attribute_value" type="string" indexed="false" stored="true" multiValued="false" />
    <defaultSearchField>attribute_key</defaultSearchField>

The suggestions work perfectly. But for every query, the collations array is always empty.

Example query:

http://localhost:8984/solr/spell_check/spell?spellcheck.q=nike%20shoes&spellcheck=true&spellcheck.collate=true&wt=json&spellcheck=true&spellcheck.extendedResults=true&spellcheck.collate=true

Result:

{
  "responseHeader": {
    "zkConnected": true,
    "status": 0,
    "QTime": 60
  },
  "response": {
    "numFound": 0,
    "start": 0,
    "docs": []
  },
  "spellcheck": {
    "suggestions": [
      "nike",
      {
        "numFound": 6,
        "startOffset": 0,
        "endOffset": 4,
        "origFreq": 2,
        "suggestion": [
          { "word": "n i k e", "freq": 19 },
          { "word": "nine", "freq": 1 },
          { "word": "none", "freq": 29 },
          { "word": "note", "freq": 5 },
          { "word": "nicka", "freq": 2 },
          { "word": "nino", "freq": 2 }
        ]
      },
      "shoes",
      {
        "numFound": 10,
        "startOffset": 5,
        "endOffset": 10,
        "origFreq": 0,
        "suggestion": [
          { "word": "shoe", "freq": 30 },
          { "word": "shoe s", "freq": 30 },
          { "word": "short", "freq": 30 },
          { "word": "s h o e s", "freq": 4 },
          { "word": "sheer", "freq": 15 },
          { "word": "sheen", "freq": 4 },
          { "word": "sheet", "freq": 3 },
          { "word": "shower", "freq": 2 },
          { "word": "shock", "freq": 1 },
          { "word": "shred", "freq": 1 }
        ]
      }
    ],
    "correctlySpelled": false,
    "collations": []
  }
}

How do I get collations to work?

2 answers:

Answer 0 (score: 0)

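Let's first look at what SpellCheck Collate does.

The documentation defines it as:

"Causes Solr to build a new query based on the best suggestion for each term in the submitted query."

Long story short, when you specify spellcheck.collate=true, you are asking Solr to recommend a new query that you can re-execute and that should work better than the one you submitted. Let me show you an example.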
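The parameters involved can be sketched as a request handler defaults fragment. The parameter names are standard Solr spellcheck parameters; the values are illustrative assumptions, not tuned recommendations.

```xml
<!-- Illustrative defaults for a spellcheck request handler; values are assumptions -->
<lst name="defaults">
  <str name="spellcheck">on</str>
  <!-- build a corrected query from the best suggestion for each term -->
  <str name="spellcheck.collate">true</str>
  <!-- return at most this many collations -->
  <str name="spellcheck.maxCollations">3</str>
  <!-- 0 (the default) returns collations without checking them; a value above 0
       makes Solr run up to this many candidates against the index and keep
       only those that return hits -->
  <str name="spellcheck.maxCollationTries">5</str>
  <!-- include the hit count and the per-term corrections for each collation -->
  <str name="spellcheck.collateExtendedResults">true</str>
</lst>
```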

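  • Let's say you want to search for

initial audit

  • For whatever reason, it gets typed as

initila audti

  • With collation off, you will receive the following spellcheck suggestions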

    <lst name="suggestions">
        <lst name="initila">
            <int name="numFound">5</int>
            <int name="startOffset">1</int>
            <int name="endOffset">8</int>
            <arr name="suggestion">
                <str>initial</str>
                <str>initi la</str>
                <str>initiala</str>
                <str>ini tila</str>
                <str>initilal</str>
            </arr>
        </lst>
        <lst name="audt">
            <int name="numFound">4</int>
            <int name="startOffset">9</int>
            <int name="endOffset">13</int>
            <arr name="suggestion">
                <str>aud t</str>
                <str>audit</str>
                <str>au dt</str>
                <str>audi</str>
            </arr>
        </lst>
    </lst>

This means you get several suggestions for each word.

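  • But if you turn collation on, you will most likely also get a suggestion (if one exists) for which query to execute. It is not guaranteed to be the best one, but you can treat it as a good guess that can help you.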

    <lst name="suggestions">
        <lst name="initila">
            <int name="numFound">5</int>
            <int name="startOffset">1</int>
            <int name="endOffset">8</int>
            <arr name="suggestion">
                <str>initial</str>
                <str>initi la</str>
                <str>initiala</str>
                <str>ini tila</str>
                <str>initilal</str>
            </arr>
        </lst>
        <lst name="audti">
            <int name="numFound">5</int>
            <int name="startOffset">9</int>
            <int name="endOffset">14</int>
            <arr name="suggestion">
                <str>audit</str>
                <str>audt i</str>
                <str>auditi</str>
                <str>au dti</str>
                <str>audtis</str>
            </arr>
        </lst>
        <lst name="collation">
            <str name="collationQuery">initial audit</str>
            <int name="hits">1983</int>
            <lst name="misspellingsAndCorrections">
                <str name="initila">initial</str>
                <str name="audti">audit</str>
            </lst>
        </lst>
    </lst>
    

This would be the recommended query:

initial audit

taken from here:

<str name="collationQuery">initial audit</str>

Collation only works if the recommended query actually returns results from your index.
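This check is governed by spellcheck.maxCollationTries: with a value above 0, Solr runs each candidate collation as a query and drops those with no hits, which is one reason the collations array can stay empty even when every term has suggestions. A hedged sketch of tuning that check follows. The spellcheck.collateParam.* prefix passes parameters only to those internal test queries; requiring every term to match with q.op=AND is an assumption about what you want, not a requirement.

```xml
<!-- Sketch only: values are illustrative assumptions -->
<lst name="defaults">
  <str name="spellcheck.collate">true</str>
  <!-- test up to 10 candidate collations against the index -->
  <str name="spellcheck.maxCollationTries">10</str>
  <!-- applied only to the internal collation test queries:
       every corrected term must match in the same document -->
  <str name="spellcheck.collateParam.q.op">AND</str>
</lst>
```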

Answer 1 (score: 0)

The following approach solved my problem:

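  1. Add the default field <str name="df">name_spell</str> as a child of the defaults list of the requestHandler. Executing the query now returns collations. Any suitable field can be used as df here.
  2. Use q instead of spellcheck.q, and specify the field when using q: sending the terms through q rather than spellcheck.q=nike%20shoes returns collation results.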
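Applied to the handler from the question, step 1 might look like the sketch below. Only the df line is new; everything else is copied from the question, and choosing name_spell assumes it is the field whose terms the collations should be tested against.

```xml
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <!-- new: default field for q; the collation test queries use it too -->
    <str name="df">name_spell</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.alternativeTermCount">5</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.maxCollations">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck_new</str>
  </arr>
</requestHandler>
```

With this in place, a request that sends the terms through q, for example http://localhost:8984/solr/spell_check/spell?q=nike%20shoes&wt=json, should return a non-empty collations array, provided at least one corrected query finds documents.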