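I want to search product codes, which are a mix of letters and digits (for example: A210/444, Alexx 1982 X, ...). (By the way: does anyone have experience searching this kind of data?)
My index defines an index_analyzer and a search_analyzer: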
{
  "settings": {
    "analysis": {
      "analyzer": {
        "index_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "standard",
            "lowercase",
            "asciifolding",
            "custom_word_delimiter",
            "custom_edgengram"
          ]
        },
        "search_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "standard",
            "asciifolding",
            "custom_word_delimiter",
            "lowercase"
          ]
        }
      },
      "filter": {
        "custom_word_delimiter": {
          "type": "word_delimiter",
          "preserve_original": "true"
        },
        "custom_edgengram": {
          "type": "edgeNGram",
          "min_gram": "2",
          "max_gram": "30"
        }
      }
    }
  }
}
The problem is with autocomplete.
The index_analyzer looks fine: every token has the type word.
curl -XGET 'http://localhost:9200/myindex/_analyze?analyzer=index_analyzer&pretty' -d 'Alexx 1982 X' | elasticat.rb
+---+------------+------+------+
| 1 | al | 0–5 | word |
| 1 | ale | 0–5 | word |
| 1 | alex | 0–5 | word |
| 1 | alexx | 0–5 | word |
| 2 | 19 | 6–10 | word |
| 2 | 198 | 6–10 | word |
| 2 | 1982 | 6–10 | word |
+---+------------+------+------+
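To make the output above easier to read, here is a minimal Python sketch of what an edge n-gram filter with min_gram 2 and max_gram 30 does to each token. It is a simplified emulation, not Elasticsearch's own implementation: the function names are hypothetical, and it assumes the tokenizer has already split the input into Alexx, 1982 and X and that the word delimiter filter leaves those tokens unchanged.

```python
def edge_ngrams(token, min_gram=2, max_gram=30):
    """Return the leading substrings of token with lengths min_gram..max_gram."""
    longest = min(len(token), max_gram)
    # A token shorter than min_gram yields an empty range, so it emits nothing.
    return [token[:n] for n in range(min_gram, longest + 1)]


def index_terms(tokens, min_gram=2, max_gram=30):
    """Lowercase each token, then expand it into its edge n-grams."""
    terms = []
    for tok in tokens:
        terms.extend(edge_ngrams(tok.lower(), min_gram, max_gram))
    return terms


if __name__ == "__main__":
    # Assumed tokenizer output for the sample product code 'Alexx 1982 X'.
    print(index_terms(["Alexx", "1982", "X"]))
```

Under these assumptions the sketch produces the same seven terms as the table, and the one-character token x produces no term at all because it is shorter than min_gram.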
However, the search_analyzer (which has no edgeNGram filter)...
curl -XGET 'http://localhost:9200/myindex/_analyze?analyzer=search_analyzer&pretty' -d 'Alexx 1982 X' | elasticat.rb
+---+------------+-------+------------+
| 1 | alexx | 0–5 | <ALPHANUM> |
| 2 | 1982 | 6–10 | <NUM> |
| 3 | x | 11–12 | <ALPHANUM> |
+---+------------+-------+------------+
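...identifies 1982 as a number (type <NUM>), which causes problems when searching (using the _all placeholder). When I search for just 1982, the search matches no results.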
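As a quick way to compare the two _analyze outputs, the following Python sketch checks which search-time tokens have no counterpart among the index-time terms. It uses only the values shown in the two tables, and it assumes that matching is done on the term text alone and that the token type is not consulted; that assumption is part of the sketch, not something established in this post. The names INDEXED_TERMS, SEARCH_TOKENS and missing_terms are hypothetical.

```python
# Terms produced by index_analyzer for 'Alexx 1982 X' (copied from the first table).
INDEXED_TERMS = {"al", "ale", "alex", "alexx", "19", "198", "1982"}

# Tokens produced by search_analyzer, with their reported types (second table).
SEARCH_TOKENS = [
    ("alexx", "<ALPHANUM>"),
    ("1982", "<NUM>"),
    ("x", "<ALPHANUM>"),
]


def missing_terms(search_tokens, indexed_terms):
    """Return the search token texts that do not appear among the indexed terms.

    Assumption: only the token text is compared; the token type is ignored.
    """
    return [text for text, _token_type in search_tokens if text not in indexed_terms]


if __name__ == "__main__":
    print(missing_terms(SEARCH_TOKENS, INDEXED_TERMS))
```

If that assumption holds, the only search token without an indexed counterpart is x, while 1982 is present in both lists despite its <NUM> type.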
Is there a way to force all tokens to a single (string) type?
Thanks for any ideas!
Martin