In Solr, SynonymFilterFactory with expansion + WordDelimiterFilterFactory => strange query results

Asked: 2012-04-18 20:01:27

Tags: solr lucene

Here is my query analyzer definition:

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="companysyns.txt" ignoreCase="true" expand="true"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
<filter class="solr.LowerCaseFilterFactory"/>

In "companysyns.txt" I have a few expansions for typical company words, for example:

inc, inc., incorporated

When I send a query like this:

test:"some company inc"

I see this unexpected result in the Solr debug output:

<str name="rawquerystring">test:"some company inc"</str>
<str name="querystring">test:"some company inc"</str>
<str name="parsedquery">
MultiPhraseQuery(test:"some company inc (inc incorporated)")
</str>
<str name="parsedquery_toString">test:"some company inc (inc incorporated)"</str>

This does not match "Some Company, Inc.". However, if I remove the WordDelimiterFilterFactory, then for the same query I see:

<str name="rawquerystring">test:"some company inc"</str>
<str name="querystring">test:"some company inc"</str>
<str name="parsedquery">
MultiPhraseQuery(test:"some company (inc inc. incorporated)")
</str>
<str name="parsedquery_toString">test:"some company (inc inc. incorporated)"</str>

This one does match.

If I keep the WordDelimiterFilterFactory but remove the "inc." entry (the one with the period) from the synonyms, then it also works:

<str name="rawquerystring">test:"some company inc"</str>
<str name="querystring">test:"some company inc"</str>
<str name="parsedquery">
MultiPhraseQuery(test:"some company (inc incorporated)")
</str>
<str name="parsedquery_toString">test:"some company (inc incorporated)"</str>

Any idea why WordDelimiterFilterFactory messes up the synonym-expanded query?

Thanks!

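1 answer:

Answer 0: (score: 0)

WordDelimiterFilterFactory removes the dot from "inc.", so the expanded synonym "inc." no longer survives as its own token. Change the order of the filters so that the synonym filter runs last, and it should work: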

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
<filter class="solr.SynonymFilterFactory" synonyms="companysyns.txt" ignoreCase="true" expand="true"/>
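
For context, the reordered chain above might sit inside a field type roughly like the one below. This is only a sketch under stated assumptions: the name `text_company` and the `positionIncrementGap` value are hypothetical, and only the query analyzer is shown because the question does not include the rest of the schema.

```xml
<!-- Hypothetical field type; the name and positionIncrementGap are assumptions,
     not taken from the question. -->
<fieldType name="text_company" class="solr.TextField" positionIncrementGap="100">
  <!-- The index-time analyzer is not shown in the question and is omitted here. -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Lowercase first, so later filters see normalized tokens. -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Strip delimiters such as the trailing dot before synonyms are applied. -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <!-- Synonym expansion runs last, so its output is not re-split. -->
    <filter class="solr.SynonymFilterFactory" synonyms="companysyns.txt" ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```

After a change like this, re-running the same query with debug output enabled should show whether `parsedquery` once again places all the synonyms at a single position.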