Question

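I'm trying to figure out what the default stop word list of the standard analyzer in Elasticsearch is. I'm running version 1.3.1, and it looks to me like the English list is being used, because running a wildcard query like this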

{
      "wildcard" : {
        "name" : {
          "wildcard" : "*in*"
        }
      }
}

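returns no results (I'm sure there are names containing "in", and they are returned when I use a not_analyzed mapping). However, the 1.0 breaking changes say the default is now empty, and the Standard Analyzer documentation for the latest version says the same. On the other hand, when I follow the link given there for more details, I end up at the Stop Analyzer documentation, which says the default is still English.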
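The reasoning above relies on the fact that a wildcard query is matched against the terms stored in the index, not against the original text. The plain-Java sketch below is a simplified model, not Elasticsearch or Lucene code: `analyze` is a rough, hypothetical stand-in for the standard analyzer (lowercase, split on non-alphanumerics), and `DEMO_STOP_WORDS` is a hypothetical three-word stop set chosen only for illustration. It shows that `*in*` stops matching a value like "Made in Italy" once the term `in` is dropped, while the whole value kept as a single term (as with not_analyzed) still matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

/** Simplified model of a wildcard query evaluated against indexed terms (not ES/Lucene code). */
public class WildcardOverTerms {

    // Hypothetical, deliberately tiny stop set used only for this illustration.
    static final Set<String> DEMO_STOP_WORDS = Set.of("a", "in", "the");

    /** Rough stand-in for an analyzer: lowercase, split on non-alphanumerics, optionally drop stop words. */
    static List<String> analyze(String text, boolean removeStopWords) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) continue; // leading delimiter yields an empty token
            if (removeStopWords && DEMO_STOP_WORDS.contains(token)) continue;
            terms.add(token);
        }
        return terms;
    }

    /** Converts a wildcard pattern ('*' = any run of chars, '?' = one char) into a regex. */
    static Pattern wildcardToRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') sb.append(".*");
            else if (c == '?') sb.append('.');
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString());
    }

    /** True if any single indexed term matches the whole wildcard pattern. */
    static boolean matches(List<String> terms, String wildcard) {
        Pattern p = wildcardToRegex(wildcard);
        return terms.stream().anyMatch(t -> p.matcher(t).matches());
    }

    public static void main(String[] args) {
        String name = "Made in Italy";
        System.out.println(matches(analyze(name, false), "*in*")); // true: term "in" is indexed
        System.out.println(matches(analyze(name, true), "*in*"));  // false: only "made", "italy" remain
        System.out.println(matches(List.of(name), "*in*"));        // true: whole value is one term
    }
}
```

Note that in this model a stop list only explains the missing hits when no other term in the value contains "in"; a name such as "Rain in Spain" would still match via "rain" and "spain".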

Any help would be appreciated. Thanks.

Answer 1

This is the stop word list used by the standard analyzer: http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-analyzers-common/4.9.0/org/apache/lucene/analysis/core/StopAnalyzer.java?av=f#50

static {
  final List<String> stopWords = Arrays.asList(
    "a", "an", "and", "are", "as", "at", "be", "but", "by",
    "for", "if", "in", "into", "is", "it",
    "no", "not", "of", "on", "or", "such",
    "that", "the", "their", "then", "there", "these",
    "they", "this", "to", "was", "will", "with"
  );
  final CharArraySet stopSet = new CharArraySet(Version.LUCENE_CURRENT,
      stopWords, false);
  ENGLISH_STOP_WORDS_SET = CharArraySet.unmodifiableSet(stopSet);
}
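For quick experiments without pulling in Lucene, the list above can be mirrored in plain Java. This is a sketch, not Lucene's API: the class `EnglishStopWords` and its `filter` method are hypothetical, the set is copied by hand from the 4.9.0 source quoted above, and lookups are case-sensitive, matching the `false` (ignoreCase) argument passed to `CharArraySet`. In the real analyzer chain, tokens are lowercased before the stop filter runs, which is why case-sensitive lookup is enough there.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hand-copied mirror of Lucene 4.9.0's StopAnalyzer.ENGLISH_STOP_WORDS_SET (sketch, not Lucene code). */
public class EnglishStopWords {

    // Same 33 words as the static initializer quoted above.
    static final Set<String> ENGLISH_STOP_WORDS = Set.of(
        "a", "an", "and", "are", "as", "at", "be", "but", "by",
        "for", "if", "in", "into", "is", "it",
        "no", "not", "of", "on", "or", "such",
        "that", "the", "their", "then", "there", "these",
        "they", "this", "to", "was", "will", "with");

    /** Drops stop words from an already-lowercased token list, roughly what a stop filter does. */
    static List<String> filter(List<String> tokens) {
        return tokens.stream()
                .filter(t -> !ENGLISH_STOP_WORDS.contains(t)) // case-sensitive, like ignoreCase=false
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(ENGLISH_STOP_WORDS.size());                          // 33
        System.out.println(filter(List.of("report", "in", "the", "archive"))); // [report, archive]
    }
}
```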

Elasticsearch's source for the standard analyzer: https://github.com/elastic/elasticsearch/blob/v1.3.1/src/main/java/org/elasticsearch/index/analysis/StandardAnalyzerProvider.java#L47

It points to Lucene's StandardAnalyzer, which in turn references StopAnalyzer's stop word list: http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-analyzers-common/4.9.0/org/apache/lucene/analysis/standard/StandardAnalyzer.java?av=f#63
