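Ignoring special characters in stored data when running a search query in Elasticsearch

Time: 2018-04-24 07:14:24

Tags: javascript node.js elasticsearch

My Elasticsearch data is stored in the following format: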

{
    "person_name": "Abraham Benjamin deVilliers",
    "name": "Abraham",
    "office": {
        "name": "my_office"
    }
},
{
    "person_name": "Johnny O'Ryan",
    "name": "O'Ryan",
    "office": {
        "name": "Johnny O'Ryan"
    }
},
......

I run a multi_match query against person_name, name, and office.name, like this:

{
  "query": {
    "multi_match" : {
      "query":      "O'Ryan",
      "type":       "best_fields",
      "fields":     [ "person_name", "name", "office.name" ],
      "operator":"and"
    }
  }
}

This works fine: I get the documents in which person_name, name, or office.name matches the query, as shown below.

{
    "person_name": "Johnny O'Ryan",
    "name": "O'Ryan",
    "office": {
        "name": "Johnny O'Ryan"
    }
}

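Now I want the search to return the same response when the user passes ORyan instead of O'Ryan as the query, ignoring the single quote (') in the stored data.

Is there a way to do this at query time in Elasticsearch? Or do I need to strip the special characters when storing the data in Elasticsearch?

Any help would be appreciated.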

1 answer:

Answer 0 (score: 1)

What you are looking for is a tokenizer (see "Tokenizers" in the Elasticsearch documentation).

In your case, you could try something like:

GET /_analyze
{
  "tokenizer": "letter", 
  "filter":[],
  "text" : "O'Ryan is good"
}

It generates the following tokens:

{
  "tokens": [
    {
      "token": "O",
      "start_offset": 0,
      "end_offset": 1,
      "type": "word",
      "position": 0
    },
    {
      "token": "Ryan",
      "start_offset": 2,
      "end_offset": 6,
      "type": "word",
      "position": 1
    },
    {
      "token": "is",
      "start_offset": 7,
      "end_offset": 9,
      "type": "word",
      "position": 2
    },
    {
      "token": "good",
      "start_offset": 10,
      "end_offset": 14,
      "type": "word",
      "position": 3
    }
  ]
}
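
One caveat with this approach: the letter tokenizer splits O'Ryan into the two tokens O and Ryan, while the query ORyan stays a single token, so the two would not line up. The sketch below illustrates this with a rough JavaScript emulation of the tokenizer. The function letterTokenize is a hypothetical stand-in written for this example, not Elasticsearch's own implementation.

```javascript
// Rough emulation of the "letter" tokenizer: break the text at every
// character that is not a letter. Approximation for illustration only.
function letterTokenize(text) {
  return text.split(/[^\p{L}]+/u).filter((token) => token.length > 0);
}

// Tokens that would be indexed for the stored value.
const indexedTokens = letterTokenize("Johnny O'Ryan");

// Tokens produced from the user's query without the apostrophe.
const queryTokens = letterTokenize("ORyan");

// With "operator": "and", every query token has to be present in the field.
const matches = queryTokens.every((token) => indexedTokens.includes(token));

console.log(indexedTokens); // [ 'Johnny', 'O', 'Ryan' ]
console.log(queryTokens);   // [ 'ORyan' ]
console.log(matches);       // false
```

Because of this mismatch, removing the apostrophe before tokenizing is probably the better fit for the ORyan case.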

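Update

You can also add a mapping character filter to the analyzer used on the name field (or on any field where the single quote is a problem):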

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "mapping",
          "mappings": [
            "' => "
          ]
        }
      }
    }
  }
}
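
The settings above only define my_analyzer; a field uses it only when its mapping refers to it. The following is a minimal sketch of a complete index body, written as a JavaScript object so it can be passed to whichever client you use from Node.js. It assumes the typeless mapping format of Elasticsearch 7 and later (older versions need a mapping type name under mappings) and assumes all three fields are of type text. The variable name indexBody is made up for this example.

```javascript
// Hypothetical request body for creating the index: the analysis settings
// from the answer plus mappings that assign the analyzer to each field.
const indexBody = {
  settings: {
    analysis: {
      analyzer: {
        my_analyzer: {
          tokenizer: "standard",
          char_filter: ["my_char_filter"],
        },
      },
      char_filter: {
        // Replaces every single quote with nothing before tokenization.
        my_char_filter: { type: "mapping", mappings: ["' => "] },
      },
    },
  },
  mappings: {
    properties: {
      person_name: { type: "text", analyzer: "my_analyzer" },
      name: { type: "text", analyzer: "my_analyzer" },
      // office.name is a sub-field of the office object.
      office: {
        properties: {
          name: { type: "text", analyzer: "my_analyzer" },
        },
      },
    },
  },
};

// Print the JSON that would be sent as the body of PUT my_index.
console.log(JSON.stringify(indexBody, null, 2));
```

The analyzer of an existing field cannot be changed in place, so documents that are already indexed would have to be reindexed into an index created this way.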

If you run:

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "O'Bryan is a good"
}

You get:

{
  "tokens": [
    {
      "token": "OBryan",
      "start_offset": 0,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "is",
      "start_offset": 8,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "a",
      "start_offset": 11,
      "end_offset": 12,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "good",
      "start_offset": 13,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}
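
To see why this makes ORyan match the stored O'Ryan, here is a rough JavaScript emulation of the analysis chain. It is only a sketch: the function names are made up for this example, and the standard tokenizer is approximated by splitting on whitespace, which is enough for these sample names but is not how Elasticsearch tokenizes in general.

```javascript
// Emulates the mapping char filter "' => ": drop every single quote.
function applyCharFilter(text) {
  return text.replace(/'/g, "");
}

// Approximates my_analyzer: char filter first, then a whitespace split
// standing in for the standard tokenizer (assumption for illustration).
function analyze(text) {
  return applyCharFilter(text)
    .split(/\s+/)
    .filter((token) => token.length > 0);
}

// Emulates "operator": "and": every query token must occur in the field.
function matchesAllTerms(storedValue, userQuery) {
  const indexedTokens = analyze(storedValue);
  return analyze(userQuery).every((token) => indexedTokens.includes(token));
}

console.log(analyze("Johnny O'Ryan"));                    // [ 'Johnny', 'ORyan' ]
console.log(matchesAllTerms("Johnny O'Ryan", "ORyan"));   // true
console.log(matchesAllTerms("Johnny O'Ryan", "O'Ryan"));  // true
console.log(matchesAllTerms("Johnny O'Ryan", "oryan"));   // false
```

The last line shows that my_analyzer as defined is case-sensitive. If case-insensitive matching is wanted, adding "filter": ["lowercase"] to the analyzer definition is one option.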