Solr suggester returns duplicate suggestions

Date: 2015-05-08 08:20:45

Tags: solr autosuggest search-suggestion

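I'm trying to use the Solr (5) suggester. Suggestions work, but the same suggestion comes back several times. I tried adding grouping to the suggest request, but it has no effect. How can I prevent duplicate suggestions?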

Here are the relevant parts of schema.xml:

<field name="Name" type="suggest" indexed="true" stored="true" multiValued="false"/>  
...
<fieldType name="suggest" class="solr.TextField">
  <analyzer type="index">        
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>             
        <filter class="solr.LowerCaseFilterFactory"/>           
        <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>              
  </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>      
        <filter class="solr.LowerCaseFilterFactory"/>           
      </analyzer>
</fieldType>

My solrconfig.xml:

<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
  <str name="name">mySuggester</str>    
  <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
  <str name="suggestAnalyzerFieldType">suggest</str>      
  <str name="exactMatchFirst">true</str>
  <str name="dictionaryImpl">DocumentDictionaryFactory</str>      
  <str name="field">Name</str>
  <str name="weightField">Price</str>      
  <str name="buildOnCommit">true</str>        
  <str name="buildOnStartup">false</str>
  <str name="preserveSep">false</str>    
</lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">   
  <str name="suggest">true</str>
  <str name="suggest.count">5</str>
  <str name="suggest.dictionary">mySuggester</str>
  <str name="suggest.collate">true</str>     
</lst>
<arr name="components">
  <str>suggest</str>
  <str>query</str>    
</arr>
</requestHandler>

Sample suggestion output for "acer", requested with these parameters:

/suggest?&suggest.dictionary=mySuggester&suggest.q=acer

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">6</int>
</lst>
<lst name="suggest">
<lst name="mySuggester">
<lst name="acer">
<int name="numFound">5</int>
<arr name="suggestions">
<lst>
<str name="term">
<b>Acer</b> V3-772G-5421121TMAKK Intel Core i5 4210U 1.7GHz 12GB 1TB 17.3"
</str>
<long name="weight">2369</long>
<str name="payload"/>
</lst>
<lst>
<str name="term">
<b>Acer</b> V3-772G-5421121TMAKK Intel Core i5 4210U 1.7GHz 12GB 1TB 17.3"
</str>
<long name="weight">2369</long>
<str name="payload"/>
</lst>
<lst>
<str name="term">
<b>Acer</b> V3-772G-5421121TMAKK Intel Core i5 4210U 1.7GHz 12GB 1TB 17.3"
</str>
<long name="weight">2350</long>
<str name="payload"/>
</lst>
<lst>
<str name="term">
<b>Acer</b> V3-772G-542081TMamm Intel Core i5 4200M 2.5GHz / 3.1GHz 8GB 1TB 17.3"
</str>
<long name="weight">2099</long>
<str name="payload"/>
</lst>
<lst>
<str name="term">
<b>Acer</b> V3-772G-542081TMamm Intel Core i5 4200M 2.5GHz / 3.1GHz 8GB 1TB 17.3"
</str>
<long name="weight">2000</long>
<str name="payload"/>
</lst>
</arr>
</lst>
</lst>
</lst>
<result name="response" numFound="0" start="0"/>
</response>

As you can see, the suggestion Acer V3-772G-5421121TMAKK Intel Core i5 4210U 1.7GHz 12GB 1TB 17.3" appears three times.

Grouping doesn't help either:

/suggest?&suggest.dictionary=mySuggester&suggest.q=acer&group=true&group.field=Name

 <response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">90</int>
</lst>
<lst name="suggest">
<lst name="mySuggester">
<lst name="acer">
<int name="numFound">5</int>
<arr name="suggestions">
<lst>
<str name="term">
<b>Acer</b> V3-772G-5421121TMAKK Intel Core i5 4210U 1.7GHz 12GB 1TB 17.3"
</str>
<long name="weight">2369</long>
<str name="payload"/>
</lst>
<lst>
<str name="term">
<b>Acer</b> V3-772G-5421121TMAKK Intel Core i5 4210U 1.7GHz 12GB 1TB 17.3"
</str>
<long name="weight">2369</long>
<str name="payload"/>
</lst>
<lst>
<str name="term">
<b>Acer</b> V3-772G-5421121TMAKK Intel Core i5 4210U 1.7GHz 12GB 1TB 17.3"
</str>
<long name="weight">2350</long>
<str name="payload"/>
</lst>
<lst>
<str name="term">
<b>Acer</b> V3-772G-542081TMamm Intel Core i5 4200M 2.5GHz / 3.1GHz 8GB 1TB 17.3"
</str>
<long name="weight">2099</long>
<str name="payload"/>
</lst>
<lst>
<str name="term">
<b>Acer</b> V3-772G-542081TMamm Intel Core i5 4200M 2.5GHz / 3.1GHz 8GB 1TB 17.3"
</str>
<long name="weight">2000</long>
<str name="payload"/>
</lst>
</arr>
</lst>
</lst>
</lst>
<lst name="grouped">
<lst name="Name">
<int name="matches">0</int>
<arr name="groups"/>
</lst>
</lst>
</response>

1 answer:

Answer 0 (score: 2)

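You are using the DocumentDictionaryFactory dictionary implementation. It stores one suggestion entry per document, so if the same term occurs in several documents, every one of those entries is returned.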

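To prevent this, you can:

  1. Write an intercepting API that reads more suggestions from Solr than you need (e.g. 30 at a time) and deduplicates them before returning the data
  2. Use a different dictionary, such as FileDictionaryFactory or HighFrequencyDictionaryFactory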
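
Option 1 could look roughly like the sketch below. It is only an illustration, not part of Solr: the function name dedupe_suggestions and the shortened sample terms are made up here. It assumes you have already fetched an over-sized batch (for instance with suggest.count=30) and pulled out the list of suggestion entries, each with the "term" and "weight" keys seen in the response above. Since the output above is already ordered by weight, keeping the first occurrence of each term keeps the highest-weighted copy.

```python
from typing import Any, Dict, List


def dedupe_suggestions(suggestions: List[Dict[str, Any]], limit: int = 5) -> List[Dict[str, Any]]:
    """Keep the first occurrence of each term, preserving the incoming order."""
    seen = set()
    unique: List[Dict[str, Any]] = []
    for item in suggestions:
        if len(unique) >= limit:
            break
        # The sample response wraps each term in newlines, so trim before comparing.
        term = item["term"].strip()
        if term in seen:
            continue
        seen.add(term)
        unique.append(item)
    return unique


# Hypothetical over-fetched batch; terms are shortened versions of those in the question.
raw = [
    {"term": "<b>Acer</b> V3-772G-5421121TMAKK", "weight": 2369, "payload": ""},
    {"term": "<b>Acer</b> V3-772G-5421121TMAKK", "weight": 2369, "payload": ""},
    {"term": "<b>Acer</b> V3-772G-5421121TMAKK", "weight": 2350, "payload": ""},
    {"term": "<b>Acer</b> V3-772G-542081TMamm", "weight": 2099, "payload": ""},
    {"term": "<b>Acer</b> V3-772G-542081TMamm", "weight": 2000, "payload": ""},
]

top = dedupe_suggestions(raw, limit=5)
for s in top:
    print(s["weight"], s["term"])
```

Over-fetching matters because deduplication shrinks the list: if you request exactly 5 and three of them are the same term, the user sees only 3 distinct suggestions. How many extra to request depends on how often names repeat in your index.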