Question

My Elasticsearch data is stored in the following format:

{
    "person_name": "Abraham Benjamin deVilliers",
    "name": "Abraham",
    "office": {
        "name": "my_office"
    }
},
{
    "person_name": "Johnny O'Ryan",
    "name": "O'Ryan",
    "office": {
        "name": "Johnny O'Ryan"
    }
},
......

I run a multi_match query against person_name, name, and office.name, as shown below:

{
  "query": {
    "multi_match" : {
      "query":      "O'Ryan",
      "type":       "best_fields",
      "fields":     [ "person_name", "name", "office.name" ],
      "operator":"and"
    }
  }
}

This works fine: I get the documents whose name, person_name, or office.name matches the query text, as shown below.

{
    "person_name": "Johnny O'Ryan",
    "name": "O'Ryan",
    "office": {
        "name": "Johnny O'Ryan"
    }
}

Now I want the search to return the same response when the user passes ORyan instead of O'Ryan, ignoring the single quote (') in the stored data.

Is there a way to do this at query time in Elasticsearch? Or do I need to strip special characters when storing the data in Elasticsearch?

Any help would be appreciated.

Answer 1

What you are looking for is a tokenizer: Tokenizers

In your case, you could try something like:

GET /_analyze
{
  "tokenizer": "letter", 
  "filter":[],
  "text" : "O'Ryan is good"
}

It will produce the following tokens:

{
  "tokens": [
    {
      "token": "O",
      "start_offset": 0,
      "end_offset": 1,
      "type": "word",
      "position": 0
    },
    {
      "token": "Ryan",
      "start_offset": 2,
      "end_offset": 6,
      "type": "word",
      "position": 1
    },
    {
      "token": "is",
      "start_offset": 7,
      "end_offset": 9,
      "type": "word",
      "position": 2
    },
    {
      "token": "good",
      "start_offset": 10,
      "end_offset": 14,
      "type": "word",
      "position": 3
    }
  ]
}
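
It can help to run the same check on the text the user would actually type. The request below is a sketch that only changes the input to ORyan. With no apostrophe to split on, the letter tokenizer should emit the single token ORyan, which differs from the O and Ryan tokens above, so this tokenizer on its own may not be enough to make ORyan match the stored O'Ryan.

```json
# Same _analyze call as above, but on the apostrophe-free query text.
# The input string "ORyan" is taken from the question; nothing else is changed.
GET /_analyze
{
  "tokenizer": "letter",
  "filter": [],
  "text": "ORyan"
}
```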

Update

You can also add a character filter to the analyzer used on the name fields (or on any field where the single quote is a problem):

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "mapping",
          "mappings": [
            "' => "
          ]
        }
      }
    }
  }
}

If you run:

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "O'Bryan is a good"
}

You will get:

{
  "tokens": [
    {
      "token": "OBryan",
      "start_offset": 0,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "is",
      "start_offset": 8,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "a",
      "start_offset": 11,
      "end_offset": 12,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "good",
      "start_offset": 13,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}
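
The analyzer only takes effect once it is assigned to the fields being searched. The following is a minimal sketch, not a definitive setup: it assumes Elasticsearch 7 or later (typeless mappings), that my_index was created with the my_analyzer settings shown earlier, and that the three fields from the question have not been mapped yet. Older versions would need a mapping type in the request.

```json
# Assumption: my_index already defines my_analyzer in its analysis settings.
# Assign the analyzer to the three fields used in the question.
PUT my_index/_mapping
{
  "properties": {
    "person_name": { "type": "text", "analyzer": "my_analyzer" },
    "name":        { "type": "text", "analyzer": "my_analyzer" },
    "office": {
      "properties": {
        "name": { "type": "text", "analyzer": "my_analyzer" }
      }
    }
  }
}

# The query from the question, with the apostrophe left out of the input.
GET my_index/_search
{
  "query": {
    "multi_match": {
      "query": "ORyan",
      "type": "best_fields",
      "fields": [ "person_name", "name", "office.name" ],
      "operator": "and"
    }
  }
}
```

Because a field's analyzer is also used at search time by default, both O'Ryan and ORyan should be reduced to the token ORyan, so either spelling should find the document. Documents indexed before the mapping change would have to be reindexed. As defined above, my_analyzer has no lowercase filter, so matching stays case-sensitive unless one is added.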

Ignoring special characters in stored data when running search queries in Elasticsearch
