I'm trying to track down a strange Solr SynonymFilterFactory problem. In Solr 4.6.1 and 4.7, with this configuration:
<fieldType name="text_buggy" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
and this synonyms.txt entry:
wbc,white blood count
the field analysis output for the string "due to elevated wbc patient was placed on medication" in the Solr admin shows the following (ST = StandardTokenizer, SF = SynonymFilter):
ST | due | to | elevated | wbc | patient | was | placed | on | medication
SF | due | to | elevated | wbc | white | patient | blood | was | count | placed | on | medication
Why isn't it like the following instead? I'm getting some strange search results because of the output above:
ST | due | to | elevated | wbc | patient | was | placed | on | medication
SF | due | to | elevated | wbc | white | blood | count | patient | was | placed | on | medication
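To see why the actual SF row can produce odd matches, here is a minimal Python sketch (a simplified model, not Lucene code) that treats each token as a (term, position) pair. The function names `tokenize`, `overlay_synonym`, `by_position` and `phrase_matches` are hypothetical. `tokenize` is a whitespace-and-lowercase stand-in for StandardTokenizer plus LowerCaseFilter, and `overlay_synonym` only reproduces the layout seen in the first SF row: each word of the multi-word synonym is stacked on the position of a following input token instead of getting a position of its own. Assuming an exact phrase query (slop 0) matches terms at consecutive positions, the model shows how phrases that never occur in the text can still match.

```python
def tokenize(text):
    """Rough stand-in for StandardTokenizer + LowerCaseFilter on simple input:
    split on whitespace, lowercase, and number positions from 0."""
    return [(word.lower(), pos) for pos, word in enumerate(text.split())]


def overlay_synonym(tokens, source, replacement):
    """Model of the observed layout (assumption based on the SF row above):
    the i-th word of `replacement` is stacked on the position of the i-th
    input token starting at the match, rather than inserted."""
    out = list(tokens)
    for term, pos in tokens:
        if term == source:
            for offset, word in enumerate(replacement.split()):
                out.append((word, pos + offset))
    # sorted() is stable, so original tokens stay ahead of stacked synonyms.
    return sorted(out, key=lambda t: t[1])


def by_position(tokens):
    """Group terms by position, like the columns of the analysis screen."""
    grouped = {}
    for term, pos in tokens:
        grouped.setdefault(pos, []).append(term)
    return grouped


def phrase_matches(tokens, phrase):
    """True if the words of `phrase` occur at consecutive positions."""
    positions = by_position(tokens)
    words = phrase.split()
    for start in positions:
        if all(words[i] in positions.get(start + i, []) for i in range(len(words))):
            return True
    return False


if __name__ == "__main__":
    text = "due to elevated wbc patient was placed on medication"
    toks = overlay_synonym(tokenize(text), "wbc", "white blood count")
    print(" | ".join(term for term, _ in toks))
    # due | to | elevated | wbc | white | patient | blood | was | count | placed | on | medication
    print(phrase_matches(toks, "white blood count"))  # True
    print(phrase_matches(toks, "white patient"))      # True (spurious)
    print(phrase_matches(toks, "count patient"))      # False
```

In this model "white patient" matches because white is stacked with wbc and blood is stacked with patient, so white and patient end up at consecutive positions; the query analyzer has no synonym filter, so such a query is just the two lowercased terms.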
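Update: After reading this SOLR bug, I found that changing my Lucene match version to LUCENE_33 gives better results (probably the results I remember getting in the past):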
ST | due | to | elevated | wbc | patient | was | placed | on | medication
SF | due | to | elevated | wbc | white | white | count | blood | count | patient | was | placed | on | medication