Is Solr synonym replacement failing?

Time: 2011-11-17 10:19:45

Tags: solr

I am using a synonyms file with SynonymFilterFactory. From the Solr documentation:

#Explicit mappings match any token sequence on the LHS of "=>"
#and replace with all alternatives on the RHS.  These types of mappings
#ignore the expand parameter in the schema.
#Examples:
i-pod, i pod => ipod,
sea biscuit, sea biscit => seabiscuit

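However, when querying sea biscuit, I end up with results related to sea biscuit as well as seabiscuit.

It is as if I had the following configuration (with expand="true"):

sea biscuit, sea biscit, seabiscuit

I don't understand this behavior, because in the Solr analysis tool the query sea biscuit is correctly replaced by seabiscuit alone.

In other words: the explicit synonym mappings with => are not working.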


Edit: field configuration

Tokenized: true

Class Name: org.apache.solr.schema.TextField

Index Analyzer: org.apache.solr.analysis.TokenizerChain

  • Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory

Filters:

org.apache.solr.analysis.StopFilterFactory args:{enablePositionIncrements: true words: stopwords.txt ignoreCase: true }
org.apache.solr.analysis.WordDelimiterFilterFactory args:{preserveOriginal: 1 catenateWords: 1 catenateNumbers: 1 splitOnCaseChange: 1 catenateAll: 0 generateNumberParts: 1 generateWordParts: 1 }
org.apache.solr.analysis.LowerCaseFilterFactory args:{}
org.apache.solr.analysis.SnowballPorterFilterFactory args:{protected: protwords.txt }
org.apache.solr.analysis.LengthFilterFactory args:{min: 2 max: 500 }
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}
org.apache.solr.analysis.ASCIIFoldingFilterFactory args:{}

Query Analyzer: org.apache.solr.analysis.TokenizerChain

  • Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory

Filters:

org.apache.solr.analysis.LowerCaseFilterFactory args:{}
org.apache.solr.analysis.SynonymFilterFactory args:{expand: true ignoreCase: true synonyms: synonyms.txt }
org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt ignoreCase: true }
org.apache.solr.analysis.WordDelimiterFilterFactory args:{preserveOriginal: 1 catenateWords: 0 catenateNumbers: 0 splitOnCaseChange: 1 catenateAll: 0 generateNumberParts: 1 generateWordParts: 1 }
org.apache.solr.analysis.SnowballPorterFilterFactory args:{protected: protwords.txt }
org.apache.solr.analysis.LengthFilterFactory args:{min: 2 max: 500 }
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}
org.apache.solr.analysis.ASCIIFoldingFilterFactory args:{}

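2 answers:

Answer 0: (score: 1)

SynonymFilterFactory is deprecated and should now be replaced with SynonymGraphFilterFactory, which correctly handles multiple tokens at the same position and so fixes the problems with multi-word synonyms.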
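
As a rough sketch of what that replacement could look like in schema.xml, assuming Solr 6.4 or later (where SynonymGraphFilterFactory is available), the query analyzer from the question might be rewritten as below. Only the first filters are shown; the rest of the chain is abbreviated and would need to be checked against the graph filter's requirements.

```xml
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- Graph-aware replacement for the deprecated SynonymFilterFactory -->
  <filter class="solr.SynonymGraphFilterFactory"
          synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <!-- remaining query-time filters from the question go here (abbreviated) -->
</analyzer>
```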

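Answer 1: (score: 0)

Are you doing a phrase query (with double quotes)? If not, the query parser splits the input on whitespace before analysis, so the SynonymFilter receives two separate tokens (sea and biscuit) rather than the sequence sea biscuit. In that case no matching synonym is found.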
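
On Solr 6.5 and later, the standard and edismax query parsers accept a sow ("split on whitespace") parameter; with sow=false, whitespace-separated terms are passed to the field's query analyzer together, so a multi-word rule can match without quoting. A minimal sketch of setting it as a handler default in solrconfig.xml follows. The /select handler is only the conventional name, and the parameter does not exist in the Solr releases that were current when the question was asked, so treat this as an option for newer installations only.

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- sow = "split on whitespace"; false hands the whole query text to the analyzer at once -->
    <str name="sow">false</str>
  </lst>
</requestHandler>
```

The same parameter can also be passed per request (sow=false) instead of being set as a default.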

By the way, handling synonyms at index time is almost always the better idea. See here: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
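
A minimal sketch of index-time synonym handling, assuming the same synonyms.txt as in the question: the synonym filter moves from the query analyzer to the index analyzer. The other filters are abbreviated, and placing the synonym filter directly after the tokenizer is an assumption made for illustration, not something the question's configuration prescribes.

```xml
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- Synonyms are applied while indexing, where the analyzer sees the full field text.
       On Solr 6.4+, SynonymGraphFilterFactory followed by FlattenGraphFilterFactory
       is the index-time equivalent. -->
  <filter class="solr.SynonymFilterFactory"
          synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- remaining index-time filters from the question go here (abbreviated) -->
</analyzer>
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- no synonym filter at query time -->
</analyzer>
```

Changes to the index analyzer only affect documents indexed afterwards, so existing documents have to be reindexed before the new behavior shows up in search results.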