Question

I would like to add more words, such as "inc", "incorporated", "ltd" and "limited", to the default "_english_" stop word list. How can I do that?

My current code for creating the index is below. Thanks.

PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stop": {
          "type": "stop",
          "stopwords": "_english_"
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding",
            "my_stop"
          ]
        }
      }
    }
  }
}

My test code:

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "House of Dickson<br> corp"
}

Answer 1

I was able to combine custom stop words with the standard English ones by chaining two stop filters:

{
    "analysis": {
        "analyzer": {
            "my_analyzer": {
                "tokenizer": "standard",
                "filter": [
                    "custom_stop",
                    "english_stop"
                ]
            }
        },
        "filter": {
            "custom_stop": {
                "type":       "stop",
                "stopwords": ["custom1","custom2","custom3"]
            },
            "english_stop": {
                "type":       "stop",
                "stopwords":  "_english_"
            }
        }
    }
}
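
Applied to the index from the question, the same idea might look like the sketch below. It keeps the question's tokenizer and char filter and adds a second stop filter holding the four words from the question; the filter name company_stop is made up for this example. Both stop filters are placed after lowercase, because a stop filter matches case-sensitively unless ignore_case is set, so a token like "Ltd" would otherwise pass through.

```
PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "company_stop": {
          "type": "stop",
          "stopwords": ["inc", "incorporated", "ltd", "limited"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "char_filter": ["html_strip"],
          "filter": ["lowercase", "asciifolding", "my_stop", "company_stop"]
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "House of Dickson<br> Ltd"
}
```

With these settings the _analyze call should return only the tokens house and dickson. Note that the whitespace tokenizer keeps punctuation attached to words, so a token such as "ltd." would not match the entry "ltd"; handling that case would need a different tokenizer or an extra entry in the list.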

Answer 2

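The "_english_" stop word set is the same list that can be configured in the Standard Analyzer.

You can create a file that contains those words plus your additional stop words, and point to it with the stopwords_path option (instead of the stopwords setting):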

{
  "settings": {
    "analysis": {
      "filter": {
        "my_stop": {
          "type": "stop",
          "stopwords_path": "stopwords/custom_english.txt"
        }
      },
      ...
}

You can find more information about what the file should look like in the ES docs (UTF-8 encoded, one stop word per line, file present on all nodes).
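
As a rough sketch of the file-based variant for the index in the question: assume a file config/stopwords/custom_english.txt exists on every node (a relative stopwords_path is resolved against the Elasticsearch config directory) and lists the English stop words you want to keep, followed by inc, incorporated, ltd and limited, one per line. The file name is the one used above; its contents are an assumption of this example. The rest of the analyzer is copied from the question.

```
PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stop": {
          "type": "stop",
          "stopwords_path": "stopwords/custom_english.txt"
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "char_filter": ["html_strip"],
          "filter": ["lowercase", "asciifolding", "my_stop"]
        }
      }
    }
  }
}
```

In this variant the file replaces the built-in list rather than extending it, so any English stop word left out of the file will no longer be removed.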
