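I turned off splitOnNumerics and splitOnCaseChange in the WDFF (WordDelimiterFilterFactory) because I don't want 123ABC to match documents that contain 123 or documents that contain abc. I still want the query "123 abc" to match documents containing 123ABC.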
I want to achieve the following:
123 abc -> 123abc
Wi-Fi -> wifi
product 12 -> product 12mm
The current setup is:
<fieldType class="solr.TextField" name="title" omitNorms="false" positionIncrementGap="100">
<analyzer type="index">
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([mM]\d+)x?([\w]*)" replacement="$1 $2"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="," replace="all" replacement="."/>
<filter class="solr.TrimFilterFactory"/>
<filter catenateAll="1" catenateNumbers="1" catenateWords="1" class="solr.WordDelimiterFilterFactory" generateNumberParts="0" generateWordParts="0" preserveOriginal="1" splitOnCaseChange="0" splitOnNumerics="0" types="wdfftypes.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="compounds_nl.txt" maxSubwordSize="32" minSubwordSize="3" minWordSize="7"/>
<filter class="solr.SnowballPorterFilterFactory" language="Kp" protected="protwords.txt"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([mM]\d+)x?([\w]*)" replacement="$1 $2"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="," replace="all" replacement="."/>
<filter class="solr.TrimFilterFactory"/>
<filter catenateAll="1" catenateNumbers="1" catenateWords="1" class="solr.WordDelimiterFilterFactory" generateNumberParts="0" generateWordParts="0" preserveOriginal="0" splitOnCaseChange="0" splitOnNumerics="0" types="wdfftypes.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="Kp" protected="protwords.txt"/>
<filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
<fieldType class="solr.TextField" name="prefix_match" omitNorms="true" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.EdgeNGramTokenizerFactory" maxGramSize="50" minGramSize="2"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter catenateAll="1" catenateNumbers="1" catenateWords="1" class="solr.WordDelimiterFilterFactory" generateNumberParts="0" generateWordParts="0" preserveOriginal="1" splitOnCaseChange="0" splitOnNumerics="0" types="wdfftypes.txt"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter catenateAll="1" catenateNumbers="1" catenateWords="1" class="solr.WordDelimiterFilterFactory" generateNumberParts="0" generateWordParts="0" preserveOriginal="0" splitOnCaseChange="0" splitOnNumerics="0" types="wdfftypes.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
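Basically, I want to catenate words and numbers. Any ideas on how to achieve this?
I know of a website that can match the product "productname 12mm" using productname12mm, productname 12mm, productname 12 and productname12. That is what I am looking for.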
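To make the catenation case concrete, here is an untested sketch of one direction that might cover part of it: a shingle filter with an empty token separator, applied at index time, so that "productname 12mm" is indexed as productname, 12mm and productname12mm. The field type name title_shingled is hypothetical and not part of the setup above, and where such a filter would belong relative to the existing WDFF chain is an assumption.

```xml
<!-- Hypothetical field type: the name "title_shingled" is illustrative only -->
<fieldType class="solr.TextField" name="title_shingled" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emit every token plus every adjacent pair joined without a separator,
         e.g. "123 abc" yields 123, abc and 123abc -->
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="2" outputUnigrams="true" tokenSeparator=""/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

This only addresses joining neighbouring tokens. It does not handle Wi-Fi -> wifi, which still depends on the WDFF catenate options, and it does not make productname12 match productname 12mm. Shingling on the query side instead may not work as expected, because depending on the Solr version the query parser can split the input on whitespace before the analyzer sees it.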