I am trying to implement partial substring search in Elasticsearch 7.1 using the following analyzer:
PUT my_index-001
{
"settings": {
"analysis": {
"analyzer": {
"autocomplete": {
"tokenizer": "whitespace",
"filter": [
"lowercase",
"autocomplete"
]
},
"autocomplete_search": {
"tokenizer": "whitespace",
"filter": [
"lowercase"
]
}
},
"filter": {
"autocomplete": {
"type": "nGram",
"min_gram": 2,
"max_gram": 40
}
}
}
},
"mappings": {
"doc": {
"properties": {
"title": {
"type": "string",
"analyzer": "autocomplete",
"search_analyzer": "autocomplete_search"
}
}
}
}
}
After that, I tried adding some sample data to my_index-001 under the type doc:
PUT my_index-001/doc/1
{
"title": "ABBOT Series LTD 2014"
}
PUT my_index-001/doc/2
{
"title": "ABBOT PLO LTD 2014A"
}
PUT my_index-001/doc/3
{
"title": "ABBOT TXT"
}
PUT my_index-001/doc/4
{
"title": "ABBOT DMO LTD. 2016-II"
}
The query used to perform the partial search:
GET my_index-001/_search
{
"query": {
"match": {
"title": {
"query": "ABB",
"operator": "or"
}
}
}
}
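I expect the following output from the analyzer:
If I enter ABB, I should get doc IDs 1, 2, 3, 4
If I enter ABB 2014, I should get doc IDs 1, 2
If I enter ABBO PLO, I should get doc 2
If I enter TXT, I should get doc 3
With the analyzer settings above, I am not getting the expected results. Please let me know if I have missed anything in the Elasticsearch analyzer settings.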
Answer 0 (score: 0)
You're almost there, but there are a few issues.
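1. The difference between max_gram and min_gram is limited to 1 by default. To use a range as wide as yours, you need to explicitly increase the index-level setting max_ngram_diff:
PUT my_index-001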
{
"settings": {
"index": {
"max_ngram_diff": 40 <--
},
...
}
}
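To make the limit concrete, here is a minimal Python sketch of the arithmetic behind it. The helper names `ngram_diff` and `is_allowed` are hypothetical and not part of any Elasticsearch client; the default of 1 is the documented default of index.max_ngram_diff in 7.x.

```python
# Illustrative only: mirrors the max_gram - min_gram check that
# Elasticsearch applies to ngram filters. Helper names are made up.
DEFAULT_MAX_NGRAM_DIFF = 1  # default value of index.max_ngram_diff


def ngram_diff(min_gram, max_gram):
    """Difference that is compared against index.max_ngram_diff."""
    return max_gram - min_gram


def is_allowed(min_gram, max_gram, max_ngram_diff=DEFAULT_MAX_NGRAM_DIFF):
    """True when the gram range fits within the configured limit."""
    return ngram_diff(min_gram, max_gram) <= max_ngram_diff


print(ngram_diff(2, 40))      # 38
print(is_allowed(2, 40))      # False: exceeds the default of 1
print(is_allowed(2, 40, 40))  # True: fits once max_ngram_diff is 40
```

With min_gram 2 and max_gram 40 the difference is 38, so any max_ngram_diff of 38 or more would do; 40 simply leaves a little headroom.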
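2. Mapping types are deprecated in v7. So is the nGram token filter, in favor of ngram (lowercase g). The string field type is no longer available either; use text instead. Here is the corrected PUT request body:
PUT my_index-001 <--- no whitespace after the URI!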
{
"settings": {
"index": {
"max_ngram_diff": 40 <--- explicit setting
},
"analysis": {
"analyzer": {
"autocomplete": {
"tokenizer": "whitespace",
"filter": [
"lowercase",
"autocomplete"
]
},
"autocomplete_search": {
"tokenizer": "whitespace",
"filter": [
"lowercase"
]
}
},
"filter": {
"autocomplete": {
"type": "ngram", <--- ngram, not nGram
"min_gram": 2,
"max_gram": 40
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "text", <--- text, not string
"analyzer": "autocomplete",
"search_analyzer": "autocomplete_search"
}
}
}
}
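To see what this mapping puts into the index, the following Python sketch approximates the two analyzers offline. It is a simplified model under stated assumptions (whitespace splitting, lowercasing, and all substrings of 2 to 40 characters per token), not the actual Lucene implementation, and the function names are chosen here only to mirror the analyzer names.

```python
# Simplified offline model of the "autocomplete" and "autocomplete_search"
# analyzers defined above. Not the real Lucene code.

def ngram_filter(token, min_gram=2, max_gram=40):
    """Return every substring of token with length min_gram..max_gram."""
    return {
        token[start:start + size]
        for size in range(min_gram, max_gram + 1)
        for start in range(len(token) - size + 1)
    }


def autocomplete(text):
    """Index-time analysis: whitespace split, lowercase, then ngrams."""
    terms = set()
    for token in text.lower().split():
        terms |= ngram_filter(token)
    return terms


def autocomplete_search(text):
    """Search-time analysis: whitespace split and lowercase only."""
    return text.lower().split()


print(sorted(ngram_filter("txt")))          # ['tx', 'txt', 'xt']
print("abb" in autocomplete("ABBOT TXT"))   # True
print(autocomplete_search("ABB 2014"))      # ['abb', '2014']
```

The asymmetry is the point of the design: ngrams are generated only at index time, so a search term such as abb is looked up as a single whole term among the indexed fragments.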
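3. With the custom mapping type gone, documents are indexed under the default _doc type, so you need to adjust how you insert documents. Fortunately, the only difference is changing doc to _doc in the URI:
PUT my_index-001/_doc/1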
{ "title": "ABBOT Series LTD 2014" }
PUT my_index-001/_doc/2
{ "title": "ABBOT PLO LTD 2014A" }
PUT my_index-001/_doc/3
{ "title": "ABBOT TXT" }
PUT my_index-001/_doc/4
{ "title": "ABBOT DMO LTD. 2016-II" }
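4. Finally, for a multi-term query such as ABB 2014 to return only the documents that contain every term, change the operator to and, i.e.:
GET my_index-001/_search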
{
"query": {
"match": {
"title": {
"query": "ABB 2014",
"operator": "and"
}
}
}
}
Other than that, all four test scenarios should return the results you expect.
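The four scenarios can be sanity-checked without a cluster using a rough Python model of the match query. This is a sketch under assumptions: `index_terms`, `search_terms`, and `match` are hypothetical helpers that imitate the analyzers and the operator semantics, and scoring is ignored entirely.

```python
# Rough offline model of the match query against the four sample titles.
# It imitates the analyzers above; it does not call Elasticsearch.

DOCS = {
    1: "ABBOT Series LTD 2014",
    2: "ABBOT PLO LTD 2014A",
    3: "ABBOT TXT",
    4: "ABBOT DMO LTD. 2016-II",
}


def index_terms(text, min_gram=2, max_gram=40):
    """Index side: lowercase, split on whitespace, emit all ngrams."""
    terms = set()
    for token in text.lower().split():
        for size in range(min_gram, max_gram + 1):
            for start in range(len(token) - size + 1):
                terms.add(token[start:start + size])
    return terms


def search_terms(text):
    """Search side: lowercase and split on whitespace, no ngrams."""
    return text.lower().split()


def match(query, docs, operator="or"):
    """Return ids of docs where any ("or") or all ("and") terms match."""
    combine = all if operator == "and" else any
    wanted = search_terms(query)
    return [
        doc_id
        for doc_id, title in docs.items()
        if combine(term in index_terms(title) for term in wanted)
    ]


print(match("ABB", DOCS, "and"))       # [1, 2, 3, 4]
print(match("ABB 2014", DOCS, "and"))  # [1, 2]
print(match("ABBO PLO", DOCS, "and"))  # [2]
print(match("TXT", DOCS, "and"))       # [3]
print(match("ABB 2014", DOCS, "or"))   # [1, 2, 3, 4]
```

The last line shows why the original query with operator or returned too much: every title contains abb, so any multi-term query that includes it matches all four documents.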