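The mapping char_filter section of the Elasticsearch documentation is somewhat vague, and I'm having trouble understanding whether and how to use a char_filter in an analyzer: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-mapping-charfilter.html
Basically, the data we store in the index are IDs of type String that look like this: "008392342000". I want to be able to find such IDs even when the query term contains a hyphen or a trailing space, like this: "008392342-000 ".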
How would you advise I set up the analyzer? Currently this is the definition of the field:
"mappings": {
  "client": {
    "properties": {
      "ucn": {
        "type": "multi_field",
        "fields": {
          "ucn_autoc": {
            "type": "string",
            "index": "analyzed",
            "index_analyzer": "autocomplete_index",
            "search_analyzer": "autocomplete_search"
          },
          "ucn": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}
Here are the settings for the index, including the analyzers and filters.
"settings": {
  "analysis": {
    "filter": {
      "autocomplete_ngram": {
        "max_gram": 15,
        "min_gram": 1,
        "type": "edge_ngram"
      },
      "ngram_filter": {
        "type": "nGram",
        "min_gram": 2,
        "max_gram": 8
      }
    },
    "analyzer": {
      "lowercase_analyzer": {
        "filter": [
          "lowercase"
        ],
        "tokenizer": "keyword"
      },
      "autocomplete_index": {
        "filter": [
          "lowercase",
          "autocomplete_ngram"
        ],
        "tokenizer": "keyword"
      },
      "ngram_index": {
        "filter": [
          "ngram_filter",
          "lowercase"
        ],
        "tokenizer": "keyword"
      },
      "autocomplete_search": {
        "filter": [
          "lowercase"
        ],
        "tokenizer": "keyword"
      },
      "ngram_search": {
        "filter": [
          "lowercase"
        ],
        "tokenizer": "keyword"
      }
    },
    "index": {
      "number_of_shards": 6,
      "number_of_replicas": 1
    }
  }
}
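Answer 0 (score: 4)
You haven't provided your actual analyzers, the data that goes in, or what you expect to get back, but based on the information you did provide I would start with this: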
{
  "settings": {
    "analysis": {
      "char_filter": {
        "my_mapping": {
          "type": "mapping",
          "mappings": [
            "-=>"
          ]
        }
      },
      "analyzer": {
        "autocomplete_search": {
          "tokenizer": "keyword",
          "char_filter": [
            "my_mapping"
          ],
          "filter": [
            "trim"
          ]
        },
        "autocomplete_index": {
          "tokenizer": "keyword",
          "filter": [
            "trim"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "ucn": {
          "type": "multi_field",
          "fields": {
            "ucn_autoc": {
              "type": "string",
              "index": "analyzed",
              "index_analyzer": "autocomplete_index",
              "search_analyzer": "autocomplete_search"
            },
            "ucn": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}
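The char_filter replaces "-" with nothing; that is what the mapping rule "-=>" means. I would also use the trim filter to get rid of any leading or trailing whitespace. I have no idea what your autocomplete_index analyzer looks like, so I just used one based on the keyword tokenizer.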
Testing the analyzer with
GET /my_index/_analyze?analyzer=autocomplete_search&text= 0123-34742-000
results in:
"tokens": [
  {
    "token": "012334742000",
    "start_offset": 0,
    "end_offset": 17,
    "type": "word",
    "position": 1
  }
]
which means it does remove both the "-" characters and the surrounding whitespace.
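To make the order of operations concrete, here is a minimal Python sketch that emulates the autocomplete_search chain outside Elasticsearch: char_filter first, then the keyword tokenizer, then the trim token filter. It is only an approximation for illustration. The function names are made up, applying the rules one after another matches the real mapping char_filter only for simple rule sets like the single "-=>" rule here, and Python's strip() is assumed to be close enough to the trim filter for plain spaces.

```python
def mapping_char_filter(text, rules):
    """Apply "key=>value" rules to the raw text before tokenization.

    Simplification: rules are applied one after another, which is
    equivalent to the real filter for a single rule such as "-=>".
    """
    for rule in rules:
        key, _, value = rule.partition("=>")
        text = text.replace(key, value)
    return text


def keyword_tokenizer(text):
    """Emit the whole input as one token, whitespace included."""
    return [text]


def trim_filter(tokens):
    """Strip leading and trailing whitespace from every token."""
    return [token.strip() for token in tokens]


def autocomplete_search(text):
    """Emulate the search analyzer: char_filter, tokenizer, token filter."""
    filtered = mapping_char_filter(text, ["-=>"])
    return trim_filter(keyword_tokenizer(filtered))


if __name__ == "__main__":
    print(autocomplete_search(" 0123-34742-000 "))
```

Note that the hyphens are removed before tokenization while the spaces are only removed afterwards, which is why both a char_filter and a token filter are involved.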
A typical query would be:
{
  "query": {
    "match": {
      "ucn.ucn_autoc": " 0123-34742-000 "
    }
  }
}
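If the field should still behave as an autocomplete field, one option (an assumption on my part, not something the configuration above does) is to keep the asker's autocomplete_ngram edge n-gram filter on the index side next to trim and lowercase. The Python sketch below only emulates the resulting term matching, with hypothetical helper names. A match query on this field succeeds when the single search-time term is among the index-time terms.

```python
def search_term(query):
    """Search side: drop hyphens (char_filter), then trim and lowercase."""
    return query.replace("-", "").strip().lower()


def edge_ngrams(token, min_gram=1, max_gram=15):
    """Prefixes of the token, as an edge n-gram filter would emit them."""
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


def index_terms(stored_value):
    """Index side (assumed chain): trim, lowercase, then edge n-grams.

    No char_filter is applied here, so hyphens in stored values survive.
    """
    token = stored_value.strip().lower()
    return set(edge_ngrams(token))


def matches(stored_value, query):
    """True when the search term equals one of the indexed terms."""
    return search_term(query) in index_terms(stored_value)


if __name__ == "__main__":
    print(matches("008392342000", "008392342-000 "))  # full ID with noise
    print(matches("008392342000", "0083-92"))         # partial ID
    print(matches("008392342-000", "008392342-000"))  # hyphen stored in index
```

The last case is the one to watch: because only the search analyzer strips hyphens, an ID that is stored with a hyphen would no longer match. If such values can occur in the data, the same char_filter would need to be added to the index analyzer as well.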