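I have a field, ManufacturerName:
"ManufacturerName": {
  "type": "keyword",
  "normalizer": "keyword_lowercase"
},
and a normalizer:
"normalizer": {
  "keyword_lowercase": {
    "type": "custom",
    "filter": ["lowercase"]
  }
}
Searching for "ripcurl" matches, but searching for "rip curl" does not.
How, and with what, can I join certain words together, i.e. 'rip curl' -> 'ripcurl'?
Apologies if this is a duplicate; I have already spent some time looking for a solution.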
Answer 0 (score: 1)
You want a text field for this, combined with the Ngram Tokenizer.
Below are a sample mapping, queries, and responses:
PUT mysomeindex
{
  "mappings": {
    "mydocs": {
      "properties": {
        "ManufacturerName": {
          "type": "text",
          "analyzer": "my_analyzer",
          "fields": {
            "keyword": {
              "type": "keyword",
              "normalizer": "my_normalizer"
            }
          }
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer",
          "filter": ["synonyms"]
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 5,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      },
      "filter": {
        "synonyms": {
          "type": "synonym",
          "synonyms": ["henry loyd, henry loid, henry lloyd => henri lloyd"]
        }
      }
    }
  }
}
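Note that ManufacturerName is a multi-field: it has the text type plus a sibling keyword sub-field. Use the keyword sub-field (ManufacturerName.keyword) for exact-match and aggregation queries, and the text field for this requirement.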
POST mysomeindex/mydocs/1
{
  "ManufacturerName": "ripcurl"
}
POST mysomeindex/mydocs/2
{
  "ManufacturerName": "henri lloyd"
}
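When you ingest the documents above, Elasticsearch creates tokens of length 3 to 5 and stores them in the inverted index, e.g. rip, ipc, pcu, etc.
You can run the following request to see which tokens are created: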
POST mysomeindex/_analyze
{
  "text": "ripcurl",
  "analyzer": "my_analyzer"
}
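To see offline why "rip curl" can match a document containing "ripcurl", here is a rough Python sketch of the 3-to-5-character tokens described above. It is only an approximation under stated assumptions, not Elasticsearch's implementation: the helper name ngram_tokens is hypothetical, str.isalnum() stands in for the letter and digit token_chars, and the synonym filter is ignored. A match query uses OR between terms by default, so any shared token is enough for a hit.

```python
def ngram_tokens(text, min_gram=3, max_gram=5):
    """Approximate an ngram tokenizer with token_chars = [letter, digit].

    Hypothetical helper for illustration only; not Elasticsearch code.
    """
    # Split the input into runs of letters/digits; anything else is a boundary.
    runs, current = [], []
    for ch in text:
        if ch.isalnum():
            current.append(ch)
        elif current:
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))

    # Emit every substring of each run with length in [min_gram, max_gram].
    tokens = []
    for run in runs:
        for start in range(len(run)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(run):
                    tokens.append(run[start:start + size])
    return tokens


indexed = ngram_tokens("ripcurl")    # tokens stored for the document
searched = ngram_tokens("rip curl")  # tokens produced from the query text
shared = sorted(set(indexed) & set(searched))

print(indexed)
print(searched)
print(shared)  # ['cur', 'curl', 'rip', 'url']
```

Because the space in "rip curl" is not a letter or digit, the query is split into "rip" and "curl" before n-grams are built, and every one of the resulting tokens also occurs among the tokens of "ripcurl".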
I'd also suggest looking into the Edge Ngram tokenizer to see whether it fits your requirement better.
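As a rough way to compare the two options, the sketch below approximates an edge n-gram tokenizer in Python. The helper name edge_ngram_tokens is hypothetical, and reusing min_gram 3, max_gram 5 and letter/digit token_chars is an assumption, since the answer does not give edge n-gram settings. It is an illustration, not Elasticsearch's implementation.

```python
def edge_ngram_tokens(text, min_gram=3, max_gram=5):
    """Approximate an edge_ngram tokenizer with token_chars = [letter, digit].

    Hypothetical helper for illustration only; not Elasticsearch code.
    """
    # Replace every non-alphanumeric character with a space, then split into runs.
    runs = "".join(ch if ch.isalnum() else " " for ch in text).split()

    # Keep only prefixes (grams anchored at the start of each run).
    tokens = []
    for run in runs:
        for size in range(min_gram, min(max_gram, len(run)) + 1):
            tokens.append(run[:size])
    return tokens


indexed = edge_ngram_tokens("ripcurl")    # ['rip', 'ripc', 'ripcu']
searched = edge_ngram_tokens("rip curl")  # ['rip', 'cur', 'curl']
shared = sorted(set(indexed) & set(searched))

print(shared)  # ['rip']
```

Under these assumptions only the prefix "rip" is shared, so "rip curl" would still overlap with "ripcurl", but the "curl" part no longer contributes. Whether that trade-off (fewer tokens, prefix-only matching) suits you depends on how the field is searched.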
POST mysomeindex/_search
{
  "query": {
    "match": {
      "ManufacturerName": "rip curl"
    }
  }
}
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.25316024,
    "hits": [
      {
        "_index": "mysomeindex",
        "_type": "mydocs",
        "_id": "1",
        "_score": 0.25316024,
        "_source": {
          "ManufacturerName": "ripcurl"
        }
      }
    ]
  }
}
POST mysomeindex/_search
{
  "query": {
    "match": {
      "ManufacturerName": "henri lloyd"
    }
  }
}
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 2.2784421,
    "hits": [
      {
        "_index": "mysomeindex",
        "_type": "mydocs",
        "_id": "2",
        "_score": 2.2784421,
        "_source": {
          "ManufacturerName": "henri lloyd"
        }
      }
    ]
  }
}
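Note: if you plan to use synonyms, the best approach is to keep them in a text file and reference that file by a path relative to the config folder location (the synonym filter's synonyms_path setting).
Hope this helps!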