I can't get Elasticsearch to generate the correct tokens for a phrase such as 15 pound chocolate cake. When I run a fielddata_field query against the field, it produces the following:
pou
poun
pound
cho
choc
choco
chocol
chocola
chocolat
chocolate
cak
cake
The number doesn't appear in there at all. I have tried several different combinations of analyzer options to no avail. Here is my mapping:
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "nGram_filter": {
            "type": "edge_ngram",
            "min_gram": 3,
            "max_gram": 20
          },
          "my_word": {
            "type": "word_delimiter",
            "preserve_original": "true"
          }
        },
        "analyzer": {
          "nGram_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [
              "standard",
              "lowercase",
              "asciifolding",
              "my_word",
              "nGram_filter"
            ]
          },
          "whitespace_analyzer": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": [
              "lowercase",
              "asciifolding"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "categories": {
      "properties": {
        "id": {"type": "text"},
        "sort": {"type": "long"},
        "search_term": {"type": "text", "analyzer": "nGram_analyzer", "search_analyzer": "whitespace_analyzer", "fielddata": true}
      }
    }
  }
}
I also tried defining the nGram filter like this:
"nGram_filter": {
  "type": "edge_ngram",
  "min_gram": 3,
  "max_gram": 20,
  "token_chars": [
    "letter",
    "digit",
    "punctuation",
    "symbol"
  ]
}
Setting "generate_word_parts": true and "generate_number_parts": "true" on the word_delimiter filter didn't help either.
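Edit
I got it working by changing the min_gram size to 2, but I would like to keep it at 3. Is there a way to keep the gram size at 3 while still keeping the numbers intact?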
Answer 0 (score: 0)
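This is the expected behavior. The issue is not with numeric tokens but with term length: any token of only 1 or 2 characters is filtered out, whether or not it is a number.
min_gram: minimum length of characters in a gram. Defaults to 1.
Any token with fewer characters than min_gram is filtered out.
So in this case, 15 is filtered out.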
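To make the length rule concrete, here is a minimal Python sketch that imitates an edge n-gram filter. It is not Elasticsearch code: the function names edge_ngrams and analyze are hypothetical, and splitting on whitespace is only a rough stand-in for the standard tokenizer plus the lowercase filter (the word_delimiter and asciifolding filters are left out, on the assumption that they do not change this particular phrase).

```python
def edge_ngrams(token, min_gram, max_gram):
    """Return the leading-edge n-grams of one token, shortest first.

    A token shorter than min_gram yields no grams at all, which is the
    effect described in the answer above.
    """
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


def analyze(text, min_gram=3, max_gram=20):
    """Hypothetical helper: lowercase, split on whitespace, then n-gram."""
    tokens = text.lower().split()  # crude stand-in for the real tokenizer
    grams = []
    for token in tokens:
        grams.extend(edge_ngrams(token, min_gram, max_gram))
    return grams


if __name__ == "__main__":
    phrase = "15 pound chocolate cake"
    print(analyze(phrase, min_gram=3))  # "15" produces no grams here
    print(analyze(phrase, min_gram=2))  # "15" survives as a 2-character gram
```

With min_gram set to 3 the sketch yields the same token list as in the question, with no trace of 15. With min_gram set to 2 the token 15 is kept, but every word also gains a 2-character gram such as po, ch and ca.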