Searching with Solr ShingleFilterFactory

Date: 2014-03-11 16:39:53

Tags: search

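I have a collection of data in Solr, and I need to search it and find all the words the user entered.

For example, if the user enters the text "House Tree Spain", Solr should search for "House Tree Spain", "House Tree", "House Spain", "Tree Spain", "House", "Tree" and "Spain".

I am using "solr.ShingleFilterFactory", but only when analyzing the query.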

<fieldType name="generic" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
   <tokenizer class="solr.StandardTokenizerFactory"/>

   <!-- generic -->       
   <filter class="solr.ASCIIFoldingFilterFactory"/>
   <filter class="solr.LowerCaseFilterFactory"/>
   <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />

   <!-- spanish -->
   <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_es.txt" />

   <!-- english -->
   <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
  </analyzer>

  <analyzer type="query">
   <tokenizer class="solr.StandardTokenizerFactory"/>

   <!-- generic -->       
   <filter class="solr.ASCIIFoldingFilterFactory"/>
   <filter class="solr.LowerCaseFilterFactory"/>
   <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
   <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />

   <!-- spanish -->
   <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_es.txt" />

   <!-- english -->
   <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />

   <filter class="solr.ShingleFilterFactory" maxShingleSize="10" outputUnigramsIfNoShingles="true"/>
  </analyzer>
</fieldType>

How can I change the schema to get the results I am looking for?

1 answer:

Answer 0 (score: 0)

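You have to apply the shingle filter to both the query analyzer and the index analyzer. At index time, it creates the tokens "House Tree" and "Tree Spain" and puts them into the index. At query time, it creates the same tokens from the query and searches the index for them. If you omit either of these steps, "House Tree" will never match, see?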

PS. A shingle size of 10 is huge. For this particular example you only need 2. Set it as low as you can; otherwise your index will become very large.
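
As a rough sketch of the change described above, the index analyzer from the question could be given the same shingle filter that the query analyzer already has. The value `maxShingleSize="2"` follows the PS, and `outputUnigrams="true"` (the filter's default) is spelled out only for clarity. Both are assumptions to tune for your own data, not a definitive configuration.

```xml
<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>

  <!-- same filters as in the question's index analyzer -->
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_es.txt"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>

  <!-- added: emit single words plus two-word shingles at index time;
       the query analyzer should use the same shingle settings -->
  <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
</analyzer>
```

Existing documents have to be reindexed before the new tokens become searchable. Note also that shingles only combine adjacent tokens, so for "House Tree Spain" this filter would emit "house tree" and "tree spain" but not "house spain".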