Question

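I am indexing all the names on a web page, including names with accented characters such as "José". I want to be able to find this name by searching for either "Jose" or "José".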

How should I set up the index mapping and analyzer for a simple index with a single field, "name"?

I set up an analyzer for the name field as follows:

"analyzer": {
  "folding": {
    "tokenizer": "standard",
    "filter": ["lowercase", "asciifolding"]
  }
}

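But it folds all accented characters into their ASCII equivalents, so the accent is dropped when "é" is indexed. I want the "é" character to remain in the index, and I want to be able to find "José" by searching for either "José" or "Jose".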

Thanks

Answer 1

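You need to preserve the original token with its accent. To do that, define your own asciifolding token filter with preserve_original enabled, as follows: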

PUT /my_index
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "folding" : {
                    "tokenizer" : "standard",
                    "filter" : ["lowercase", "my_ascii_folding"]
                }
            },
            "filter" : {
                "my_ascii_folding" : {
                    "type" : "asciifolding",
                    "preserve_original" : true
                }
            }
        }
    },
    "mappings": {
        "my_type": {
            "properties": {
                "name": {
                    "type": "text",
                    "analyzer": "folding"
                }
            }
        }
    }
}

After that, both tokens, jose and josé, will be indexed and searchable.
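
To make the effect of preserve_original concrete, here is a minimal Python sketch of the token stream. It is not Elasticsearch code and only approximates the real filters: whitespace splitting stands in for the standard tokenizer, and Unicode decomposition stands in for the asciifolding mapping table. The function names ascii_fold and analyze are made up for this illustration.

```python
import unicodedata


def ascii_fold(token):
    """Approximate ASCII folding: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text, preserve_original=True):
    """Toy stand-in for: standard tokenizer -> lowercase -> asciifolding."""
    tokens = []
    for word in text.split():  # simplification of the standard tokenizer
        lowered = word.lower()
        folded = ascii_fold(lowered)
        tokens.append(folded)
        # With preserve_original, the accented form is kept next to the folded one.
        if preserve_original and folded != lowered:
            tokens.append(lowered)
    return tokens


print(analyze("Jos\u00e9"))                           # folded and original token
print(analyze("Jos\u00e9", preserve_original=False))  # folded token only
```

Because the same analyzer runs at search time, a query for "Jose" produces the folded token and a query for "José" produces both, so either one finds the document.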

Answer 2

Here is what I came up with to solve the diacritic folding problem:

Analyzer used:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter":  [ "lowercase", "asciifolding" ]
        }
      }
    }
  }
}

Mappings used:
{
  "properties": {
    "title": {
      "type":     "string",
      "analyzer": "standard",
      "fields": {
        "folded": {
          "type":     "string",
          "analyzer": "folding"
        }
      }
    }
  }
}

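The title field uses the standard analyzer and contains the original words with their diacritics intact.
The title.folded field uses the folding analyzer, which strips the diacritics.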

Here is the search query I would use:

{
  "query": {
    "multi_match": {
      "type":     "most_fields",
      "query":    "esta loca",
      "fields": [ "title", "title.folded" ]
    }
  }
}
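
As a rough illustration of why querying both fields helps, the Python sketch below counts matching terms per field and adds them up. It is a toy model, not Elasticsearch's relevance scoring: the real most_fields query combines per-field relevance scores, and the names fold, standard_tokens, folded_tokens and toy_score are hypothetical helpers invented for this example.

```python
import unicodedata


def fold(token):
    """Approximate the folding analyzer by dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def standard_tokens(text):
    """Toy stand-in for the title field: lowercase, diacritics kept."""
    return [word.lower() for word in text.split()]


def folded_tokens(text):
    """Toy stand-in for the title.folded field: lowercase, diacritics removed."""
    return [fold(token) for token in standard_tokens(text)]


def toy_score(doc_title, query):
    """Add up matching terms across both fields (a simplification of most_fields)."""
    title_hits = len(set(standard_tokens(doc_title)) & set(standard_tokens(query)))
    folded_hits = len(set(folded_tokens(doc_title)) & set(folded_tokens(query)))
    return title_hits + folded_hits


print(toy_score("Esta loca", "esta loca"))       # exact spelling matches in both fields
print(toy_score("Est\u00e1 loca", "esta loca"))  # accented title matches fully only when folded
```

In this toy model both titles are found through title.folded, while the title that matches the query's exact spelling also matches fully on title and therefore ranks first.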
