Question

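In Elasticsearch 2.x, how can I distinguish the acronym "CAN" from the common English word "can" while using the "lowercase" filter in my analyzer (so that searches are case-insensitive)?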

The custom analyzer I am using is:

"analyzer": {
    "tight": {
        "type": "custom",
        "tokenizer": "standard",
        "stopwords": "_english_",
        "filter": ["lowercase", "asciifolding"]
    }
}

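At index time, when the uppercase acronym "CAN" passes through my analyzer, it becomes the English word "can". Then, when I search for "CAN", I get every document that contains the English word "can". I only want the documents that contain the uppercase word "CAN". Other acronyms may follow the same pattern.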

What is the best way to solve this?

Answer 1

One way to achieve this is to create a second analyzer without the lowercase token filter and use it on a sub-field of the main field. It goes like this:

Create the index with two analyzers, tight and tight_acronym. The former is assigned to field, the latter to the field.acronyms sub-field:

PUT index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "tight": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        },
        "tight_acronym": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "field": {
          "type": "string",
          "analyzer": "tight",
          "fields": {
            "acronyms": {
              "type": "string",
              "analyzer": "tight_acronym"
            }
          }
        }
      }
    }
  }
}
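
To see why the sub-field keeps acronyms apart, it can help to imitate the two analyzers outside Elasticsearch. The following is a rough pure-Python sketch, not Elasticsearch code: the regex tokenizer and the NFKD-based folding are simplifying assumptions that only approximate the standard tokenizer and the asciifolding filter, and the function names are made up for illustration.

```python
import re
import unicodedata


def standard_tokenize(text):
    # Assumption: splitting on runs of word characters is close enough
    # to the standard tokenizer for simple English text.
    return re.findall(r"\w+", text)


def ascii_fold(token):
    # Approximation of asciifolding: decompose accented characters
    # and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tight(text):
    # Mirrors the "tight" analyzer: lowercase, then fold.
    return [ascii_fold(t.lower()) for t in standard_tokenize(text)]


def tight_acronym(text):
    # Mirrors "tight_acronym": fold only, so case is preserved.
    return [ascii_fold(t) for t in standard_tokenize(text)]


print(tight("It is worth CAN 300"))          # ['it', 'is', 'worth', 'can', '300']
print(tight_acronym("It is worth CAN 300"))  # ['It', 'is', 'worth', 'CAN', '300']
```

In the main field the acronym collapses into the token can, while the sub-field keeps CAN as a distinct term.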

Then we index two documents:

PUT index/test/1
{ "field": "It is worth CAN 300" }
PUT index/test/2
{ "field": "can you do it?" }

Then, if you search for CAN (on the sub-field), you get only the first document:

POST index/test/_search
{
  "query": {
    "match": {
      "field.acronyms": "CAN"
    }
  }
}

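If you search for can (on the main field), you get the second document. The first document matches there as well, because the main field lowercases "CAN" to "can" at index time: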

POST index/test/_search
{
  "query": {
    "match": {
      "field": "can"
    }
  }
}
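
The behaviour of the two searches can be checked with a small simulation. This is a minimal sketch under stated assumptions: the regex tokenizer stands in for the standard tokenizer, asciifolding is left out because the sample documents are plain ASCII, and the helper names analyze and match are hypothetical.

```python
import re

# The two sample documents, keyed by their ids.
DOCS = {1: "It is worth CAN 300", 2: "can you do it?"}


def analyze(text, lowercase):
    # lowercase=True imitates "tight", lowercase=False imitates "tight_acronym".
    tokens = re.findall(r"\w+", text)
    return [t.lower() for t in tokens] if lowercase else tokens


def match(query, lowercase):
    # A match query analyzes the query text with the field's analyzer and,
    # by default, returns documents containing any of the resulting terms.
    terms = set(analyze(query, lowercase))
    return sorted(
        doc_id
        for doc_id, text in DOCS.items()
        if terms & set(analyze(text, lowercase))
    )


print(match("CAN", lowercase=False))  # sub-field: [1]
print(match("can", lowercase=True))   # main field: [1, 2]
```

Querying the case-preserving sub-field is what isolates the acronym; the main field stays case-insensitive and therefore keeps matching both spellings.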
