当我在solr中查询“优雅”时,我也得到了“优雅”的结果。
我使用这些过滤器进行索引分析
WhitespaceTokenizerFactory
StopFilterFactory
WordDelimiterFilterFactory
LowerCaseFilterFactory
SynonymFilterFactory
EnglishPorterFilterFactory
RemoveDuplicatesTokenFilterFactory
ReversedWildcardFilterFactory
和查询分析:
WhitespaceTokenizerFactory
SynonymFilterFactory
StopFilterFactory
WordDelimiterFilterFactory
LowerCaseFilterFactory
EnglishPorterFilterFactory
RemoveDuplicatesTokenFilterFactory
我想知道哪个过滤器会影响我的搜索结果。
答案 0 :(得分:0)
<强> EnglishPorterFilterFactory 强>
简短的回答;)
更多信息:
英语波特语是指英语搬运工干扰词汇算法。根据词干(这是一个启发式词根构建者)的优雅和优雅,同样的词干。
您可以在线验证此内容,例如Here。基本上你会看到“eleg ant ”和“eleg ance ”源于同一个词干&gt;的 ELEG 强>
来自Solr来源:
public void inform(ResourceLoader loader) {
String wordFiles = args.get(PROTECTED_TOKENS);
if (wordFiles != null) {
try {
这里正好出现了protwords文件:
File protectedWordFiles = new File(wordFiles);
if (protectedWordFiles.exists()) {
List<String> wlist = loader.getLines(wordFiles);
//This cast is safe in Lucene
protectedWords = new CharArraySet(wlist, false);//No need to go through StopFilter as before, since it just uses a List internally
} else {
List<String> files = StrUtils
.splitFileNames(wordFiles);
for (String file : files) {
List<String> wlist = loader.getLines(file
.trim());
if (protectedWords == null)
protectedWords = new CharArraySet(wlist,
false);
else
protectedWords.addAll(wlist);
}
}
} catch (IOException e) {
throw new RuntimeException(e);
}
}
}
这是影响堵塞的部分。在那里你看到了雪球库的调用
public EnglishPorterFilter create(TokenStream input) {
return new EnglishPorterFilter(input, protectedWords);
}
}
/**
* English Porter2 filter that doesn't use reflection to
* adapt lucene to the snowball stemmer code.
*/
@Deprecated
class EnglishPorterFilter extends SnowballPorterFilter {
public EnglishPorterFilter(TokenStream source,
CharArraySet protWords) {
super (source, new org.tartarus.snowball.ext.EnglishStemmer(),
protWords);
}
}