My Elasticsearch data is stored in the following format:
{
  "person_name": "Abraham Benjamin deVilliers",
  "name": "Abraham",
  "office": {
    "name": "my_office"
  }
},
{
  "person_name": "Johnny O'Ryan",
  "name": "O'Ryan",
  "office": {
    "name": "Johnny O'Ryan"
  }
},
......
I run a multi_match query against person_name, name, and office.name, as shown below:
{
  "query": {
    "multi_match": {
      "query": "O'Ryan",
      "type": "best_fields",
      "fields": [ "person_name", "name", "office.name" ],
      "operator": "and"
    }
  }
}
It works fine, and I get the documents where the query matches name, person_name, or office.name, as shown below.
{
  "person_name": "Johnny O'Ryan",
  "name": "O'Ryan",
  "office": {
    "name": "Johnny O'Ryan"
  }
}
Now I want the search to return the same response when the user passes ORyan instead of O'Ryan as the query, ignoring the single quote (') in the stored data.
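Is there a way to do this at query time in Elasticsearch, or do I need to strip special characters when storing the data in Elasticsearch?
Any help would be appreciated.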
Answer 0 (score: 1)
What you are looking for is a tokenizer: Tokenizers
In your case, you could try something like:
GET /_analyze
{
  "tokenizer": "letter",
  "filter": [],
  "text": "O'Ryan is good"
}
It will produce the following tokens:
{
  "tokens": [
    {
      "token": "O",
      "start_offset": 0,
      "end_offset": 1,
      "type": "word",
      "position": 0
    },
    {
      "token": "Ryan",
      "start_offset": 2,
      "end_offset": 6,
      "type": "word",
      "position": 1
    },
    {
      "token": "is",
      "start_offset": 7,
      "end_offset": 9,
      "type": "word",
      "position": 2
    },
    {
      "token": "good",
      "start_offset": 10,
      "end_offset": 14,
      "type": "word",
      "position": 3
    }
  ]
}
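Whether this tokenizer by itself covers the ORyan case can be checked by analyzing the unquoted spelling the same way. The body below is a small sketch to send to GET /_analyze; it assumes the same request shape as above, with no token filters.

```json
{
  "tokenizer": "letter",
  "filter": [],
  "text": "ORyan"
}
```

Since ORyan contains only letters, the letter tokenizer keeps it as the single token ORyan, while the stored value was split into O and Ryan. A query for ORyan is therefore not expected to match O'Ryan with this tokenizer alone.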
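Update
You can also add a mapping character filter to the analyzer used on the name fields (or on any field where the single quote is a problem):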
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "mapping",
          "mappings": [
            "' => "
          ]
        }
      }
    }
  }
}
If you run:
POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "O'Bryan is a good"
}
You will get:
{
  "tokens": [
    {
      "token": "OBryan",
      "start_offset": 0,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "is",
      "start_offset": 8,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "a",
      "start_offset": 11,
      "end_offset": 12,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "good",
      "start_offset": 13,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}
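The analyzer defined above only affects fields that reference it in the index mapping, a step the settings alone do not cover. Below is a minimal sketch of a complete index body, to be sent with PUT my_index when creating the index, that applies my_analyzer to the three fields from the question. It assumes Elasticsearch 7 or later (mappings without a type name) and that all three fields are plain text fields; neither is confirmed by the question.

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": ["my_char_filter"]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "mapping",
          "mappings": ["' => "]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "person_name": { "type": "text", "analyzer": "my_analyzer" },
      "name": { "type": "text", "analyzer": "my_analyzer" },
      "office": {
        "properties": {
          "name": { "type": "text", "analyzer": "my_analyzer" }
        }
      }
    }
  }
}
```

By default the same analyzer is used at index time and at search time, so both O'Ryan and ORyan in the multi_match query should reduce to the token ORyan. Existing documents would need to be reindexed for the new analysis to apply. Note that my_analyzer has no lowercase filter, so matching is case-sensitive unless "filter": ["lowercase"] is added to it.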