Question

When I query Solr for "elegant", I also get results for "elegance".

I use these filters for index analysis:

WhitespaceTokenizerFactory
StopFilterFactory
WordDelimiterFilterFactory
LowerCaseFilterFactory
SynonymFilterFactory
EnglishPorterFilterFactory
RemoveDuplicatesTokenFilterFactory
ReversedWildcardFilterFactory

and these for query analysis:

WhitespaceTokenizerFactory
SynonymFilterFactory
StopFilterFactory
WordDelimiterFilterFactory
LowerCaseFilterFactory
EnglishPorterFilterFactory
RemoveDuplicatesTokenFilterFactory

I would like to know which filter is affecting my search results.

Answer 1

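EnglishPorterFilterFactory

That's the short answer ;)

More information:

"English Porter" refers to the Porter stemming algorithm for English. A stemmer is a heuristic that reduces a word to a root form, and according to it "elegant" and "elegance" have the same stem.

You can verify this with an online stemmer. Basically, you will see that "elegant" and "elegance" are both reduced to the same stem: eleg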
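To make the effect concrete, here is a minimal sketch of why the two words collide. It is not the Porter or Snowball algorithm: `ToySuffixStemmer` and its two-entry suffix list are hypothetical, chosen only so that this one pair behaves the way the real stemmer does. Since the same stemming step runs at index time and at query time, both forms end up as the same term.

```java
import java.util.List;
import java.util.Locale;

// Toy illustration only: NOT the Porter/Snowball algorithm.
// It strips a few hard-coded suffixes to show how two different
// surface forms can end up as the same indexed term.
class ToySuffixStemmer {
    // Hypothetical suffix list, invented for this example.
    private static final List<String> SUFFIXES = List.of("ance", "ant");

    static String stem(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        for (String suffix : SUFFIXES) {
            int stemLength = lower.length() - suffix.length();
            // Keep at least three characters so short words are left alone.
            if (lower.endsWith(suffix) && stemLength >= 3) {
                return lower.substring(0, stemLength);
            }
        }
        return lower;
    }

    public static void main(String[] args) {
        System.out.println(stem("elegant"));  // prints "eleg"
        System.out.println(stem("elegance")); // prints "eleg"
    }
}
```

A query for "elegant" is therefore really a query for the term "eleg", which is also what a document containing "elegance" was indexed under.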

From the Solr source:

    public void inform(ResourceLoader loader) {
        String wordFiles = args.get(PROTECTED_TOKENS);
        if (wordFiles != null) {
            try {

This is exactly where the protwords file comes in:

                File protectedWordFiles = new File(wordFiles);
                if (protectedWordFiles.exists()) {
                    List<String> wlist = loader.getLines(wordFiles);
                    // This cast is safe in Lucene
                    // No need to go through StopFilter as before, since it just uses a List internally
                    protectedWords = new CharArraySet(wlist, false);
                } else {
                    List<String> files = StrUtils.splitFileNames(wordFiles);
                    for (String file : files) {
                        List<String> wlist = loader.getLines(file.trim());
                        if (protectedWords == null)
                            protectedWords = new CharArraySet(wlist, false);
                        else
                            protectedWords.addAll(wlist);
                    }
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
    }

This is the part that affects stemming. Here you can see the call into the Snowball library:

    public EnglishPorterFilter create(TokenStream input) {
        return new EnglishPorterFilter(input, protectedWords);
    }
    }

    /**
     * English Porter2 filter that doesn't use reflection to
     * adapt lucene to the snowball stemmer code.
     */
    @Deprecated
    class EnglishPorterFilter extends SnowballPorterFilter {
        public EnglishPorterFilter(TokenStream source, CharArraySet protWords) {
            super(source, new org.tartarus.snowball.ext.EnglishStemmer(), protWords);
        }
    }
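If you want "elegance" to stay distinct from "elegant", the protected words passed into the filter are the hook: a token found in that set is left unstemmed. The sketch below only illustrates that idea. `ProtectedWordsSketch`, `process`, and the regex-based stand-in stemmer are hypothetical names made up for this example, not Solr or Lucene API; in Solr itself the list is loaded from the protwords file, as in the `inform` method above.

```java
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Sketch of the protected-words idea; names are illustrative, not Solr's API.
class ProtectedWordsSketch {
    private final Set<String> protectedWords;
    private final UnaryOperator<String> stemmer;

    ProtectedWordsSketch(Set<String> protectedWords, UnaryOperator<String> stemmer) {
        this.protectedWords = protectedWords;
        this.stemmer = stemmer;
    }

    // Returns the term that would be indexed or searched for this token.
    String process(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        // Protected tokens skip stemming and keep their full form.
        if (protectedWords.contains(lower)) {
            return lower;
        }
        return stemmer.apply(lower);
    }

    public static void main(String[] args) {
        // Stand-in stemmer (assumption): drops a trailing "ance" or "ant".
        UnaryOperator<String> toyStemmer = w -> w.replaceAll("(ance|ant)$", "");
        ProtectedWordsSketch filter =
                new ProtectedWordsSketch(Set.of("elegance"), toyStemmer);
        System.out.println(filter.process("elegant"));  // prints "eleg"
        System.out.println(filter.process("elegance")); // prints "elegance"
    }
}
```

With "elegance" protected, a query for "elegant" becomes the term "eleg" and no longer matches documents indexed under "elegance". Keep in mind that the same protected list has to apply at both index and query time, and that existing documents need to be reindexed before the change shows up in results.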
