Catenate on numerics

Time: 2017-03-23 15:50:55

Tags: solr

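I turned off splitOnNumerics and splitOnCaseChange in the WDFF (WordDelimiterFilterFactory) because I don't want 123ABC to match documents that contain only 123 or only abc. I still want the query "123 abc" to match documents that contain 123ABC.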

I want the following:

123 abc -> 123abc

Wi-Fi -> wifi

product12 -> product 12mm
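To make the first two mappings concrete: they amount to lowercasing the input and dropping everything that is not a letter or digit. Below is a minimal Python sketch of that intended behaviour. It is not Solr code, `catenate` is a hypothetical helper name, and it only handles ASCII (no accent folding).

```python
import re


def catenate(text: str) -> str:
    """Lowercase the text and drop every character that is not an ASCII letter or digit.

    Hypothetical helper that only illustrates the desired result;
    it is not part of Solr or of the field types in this post.
    """
    return re.sub(r"[^0-9a-z]+", "", text.lower())


print(catenate("123 abc"))  # 123abc
print(catenate("Wi-Fi"))    # wifi
print(catenate("123ABC"))   # 123abc
```

With this, the query "123 abc" and the indexed term 123ABC end up as the same string, while 123 or abc on their own do not.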

The current setup is:

<fieldType class="solr.TextField" name="title" omitNorms="false" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([mM]\d+)x?([\w]*)" replacement="$1 $2"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="," replace="all" replacement="."/>
    <filter class="solr.TrimFilterFactory"/>
    <filter catenateAll="1" catenateNumbers="1" catenateWords="1" class="solr.WordDelimiterFilterFactory" generateNumberParts="0" generateWordParts="0" preserveOriginal="1" splitOnCaseChange="0" splitOnNumerics="0" types="wdfftypes.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>   
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="compounds_nl.txt" maxSubwordSize="32" minSubwordSize="3" minWordSize="7"/> 
    <filter class="solr.SnowballPorterFilterFactory" language="Kp" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([mM]\d+)x?([\w]*)" replacement="$1 $2"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="," replace="all" replacement="."/>
    <filter class="solr.TrimFilterFactory"/>        
    <filter catenateAll="1" catenateNumbers="1" catenateWords="1" class="solr.WordDelimiterFilterFactory" generateNumberParts="0" generateWordParts="0" preserveOriginal="0" splitOnCaseChange="0" splitOnNumerics="0" types="wdfftypes.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>   
    <filter class="solr.SnowballPorterFilterFactory" language="Kp" protected="protwords.txt"/>
    <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>   
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType class="solr.TextField" name="prefix_match" omitNorms="true" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.EdgeNGramTokenizerFactory" maxGramSize="50" minGramSize="2"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter catenateAll="1" catenateNumbers="1" catenateWords="1" class="solr.WordDelimiterFilterFactory" generateNumberParts="0" generateWordParts="0" preserveOriginal="1" splitOnCaseChange="0" splitOnNumerics="0" types="wdfftypes.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter catenateAll="1" catenateNumbers="1" catenateWords="1" class="solr.WordDelimiterFilterFactory" generateNumberParts="0" generateWordParts="0" preserveOriginal="0" splitOnCaseChange="0" splitOnNumerics="0" types="wdfftypes.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Basically, I want to catenate words and numbers. Any ideas on how to achieve this?

I know of a website where the product "productname 12mm" can be found with productname12mm, productname 12mm, productname 12, and productname12. That is what I am looking for.
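One way to read this example (my own interpretation, not something that site documents): after lowercasing and removing separators, each of the four queries is a prefix of the catenated product title. A small Python sketch of that reading follows. `catenate` and `matches` are hypothetical names, and this is not how Solr evaluates anything.

```python
import re


def catenate(text: str) -> str:
    """Lowercase and keep only ASCII letters and digits (hypothetical helper)."""
    return re.sub(r"[^0-9a-z]+", "", text.lower())


def matches(query: str, title: str) -> bool:
    """True when the catenated query is a non-empty prefix of the catenated title."""
    q = catenate(query)
    return bool(q) and catenate(title).startswith(q)


title = "productname 12mm"
queries = ["productname12mm", "productname 12mm", "productname 12", "productname12"]
for query in queries:
    # Each of the four variants from the example is reported as a match.
    print(query, matches(query, title))
```

Under this reading a query such as 12mm alone would not match, because it is not a prefix of the catenated title. Whether that is acceptable depends on the use case.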

0 answers:

No answers yet