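I'm trying to get Elasticsearch to ignore hyphens. I don't want it to split the text on either side of a hyphen into separate words. It seems simple, but I'm banging my head against the wall.
I want the string "Roland JD-Xi" to produce the following terms: [roland jd-xi, roland, jd-xi, jdxi, roland jdxi]
I haven't been able to achieve this easily. Most people will just type 'jdxi', so my first idea was simply to remove the hyphen. So I'm using the following definition: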
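To pin down the target, here is a plain-Python sketch (not an Elasticsearch analyzer) that enumerates exactly those five terms from "Roland JD-Xi". The helper name desired_terms is hypothetical. It only illustrates the expansion being asked for: the whole phrase, each word, and the hyphen-free variants of both.

```python
# Hypothetical helper: lists the terms the asker wants for "Roland JD-Xi".
# Plain Python for illustration only, not an Elasticsearch analyzer.
def desired_terms(text):
    lowered = text.lower()                    # "roland jd-xi"
    words = lowered.split()                   # ["roland", "jd-xi"]
    terms = {lowered}                         # whole phrase, hyphen kept
    terms.update(words)                       # each word, hyphen kept
    # hyphenated words with the hyphen removed: "jd-xi" -> "jdxi"
    terms.update(w.replace("-", "") for w in words if "-" in w)
    terms.add(lowered.replace("-", ""))       # whole phrase, hyphen removed
    return terms

print(sorted(desired_terms("Roland JD-Xi")))
# ['jd-xi', 'jdxi', 'roland', 'roland jd-xi', 'roland jdxi']
```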
"name": {
"type": "string",
"analyzer": "language",
"include_in_all": true,
"boost": 5,
"fields": {
"my_standard": {
"type": "string",
"analyzer": "my_standard"
},
"my_prefix": {
"type": "string",
"analyzer": "my_text_prefix",
"search_analyzer": "my_standard"
},
"my_suffix": {
"type": "string",
"analyzer": "my_text_suffix",
"search_analyzer": "my_standard"
}
}
}
The relevant analyzers and filters are defined as:
{
"number_of_replicas": 0,
"number_of_shards": 1,
"analysis": {
"analyzer": {
"std": {
"tokenizer": "standard",
"char_filter": "html_strip",
"filter": ["standard", "elision", "asciifolding", "lowercase", "stop", "length", "strip_hyphens"]
...
"my_text_prefix": {
"tokenizer": "whitespace",
"char_filter": "my_filter",
"filter": ["standard", "elision", "asciifolding", "lowercase", "stop", "edge_ngram_front"]
},
"my_text_suffix": {
"tokenizer": "whitespace",
"char_filter": "my_filter",
"filter": ["standard", "elision", "asciifolding", "lowercase", "stop", "edge_ngram_back"]
},
"my_standard": {
"type": "custom",
"tokenizer": "whitespace",
"char_filter": "my_filter",
"filter": ["standard", "elision", "asciifolding", "lowercase"]
}
},
"char_filter": {
"my_filter": {
"type": "mapping",
"mappings": ["- => ", ". => "]
}
},
"filter": {
"edge_ngram_front": {
"type": "edgeNGram",
"min_gram": 1,
"max_gram": 20,
"side": "front"
},
"edge_ngram_back": {
"type": "edgeNGram",
"min_gram": 1,
"max_gram": 20,
"side": "back"
},
"strip_spaces": {
"type": "pattern_replace",
"pattern": "\\s",
"replacement": ""
},
"strip_dots": {
"type": "pattern_replace",
"pattern": "\\.",
"replacement": ""
},
"strip_hyphens": {
"type": "pattern_replace",
"pattern": "-",
"replacement": ""
},
"stop": {
"type": "stop",
"stopwords": "_none_"
},
"length": {
"type": "length",
"min": 1
}
}
}
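The my_prefix and my_suffix sub-fields depend on the two edge n-gram filters. A rough Python emulation of what those filters are meant to emit for the tokens roland and jdxi (min_gram 1, max_gram 20) might look like the following. It is an assumption-based sketch, not Elasticsearch code, and the "side": "back" option may not be accepted by newer Elasticsearch versions, so the suffix half only shows what that setting is intended to do.

```python
# Rough emulation of edge_ngram_front / edge_ngram_back (min_gram=1, max_gram=20).
# Illustration only; actual Elasticsearch behaviour depends on the version.
def edge_ngrams(token, min_gram=1, max_gram=20, side="front"):
    upper = min(max_gram, len(token))
    if side == "front":
        # grams anchored at the start of the token: j, jd, jdx, jdxi
        return [token[:n] for n in range(min_gram, upper + 1)]
    # side == "back": grams anchored at the end of the token: i, xi, dxi, jdxi
    return [token[-n:] for n in range(min_gram, upper + 1)]

for tok in ["roland", "jdxi"]:
    print(tok, edge_ngrams(tok), edge_ngrams(tok, side="back"))
```

Since both sub-fields use my_standard as their search_analyzer, a query for jdxi would be compared against these grams as a single token.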
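I've been able to test this (i.e. with _analyze), and the string "Roland JD-Xi" is tokenized as [roland, jdxi].
That's not exactly what I want, but it's close enough that it should match 'jdxi'.
But here's my problem. If I do a simple "index/_search?q=jdxi", it doesn't bring back the document. However, if I do "index/_search?q=roland+jdxi", it does bring back the document.
So at least I know the hyphen is being removed, but if the tokens "roland" and "jdxi" are being created, why doesn't "index/_search?q=jdxi" match the document?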
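As a sanity check of that _analyze result, here is a minimal Python approximation of the my_standard analyzer: the my_filter mapping char_filter deletes "-" and ".", the whitespace tokenizer splits on spaces, and the lowercase filter folds case. The asciifolding, elision and standard filters are left out because they don't change this input, and the function name is hypothetical.

```python
# Approximation of the "my_standard" analyzer from the settings above.
# asciifolding / elision / standard filters are omitted (no effect on this input).
def my_standard(text):
    filtered = text.replace("-", "").replace(".", "")  # my_filter: "- => ", ". => "
    tokens = filtered.split()                          # whitespace tokenizer
    return [t.lower() for t in tokens]                 # lowercase filter

print(my_standard("Roland JD-Xi"))  # ['roland', 'jdxi']
```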
Answer (score: 3)
I reproduced your case on ES 6, and searching index/_search?q=jdxi does return the document.
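The problem is probably that when you search index/_search?q=jdxi without specifying a field, it basically searches the _all field, which contains the content of the name field (basically the same as index/_search?q=name:jdxi). Since that field is not analyzed with the my_standard analyzer, you don't get any results.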
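A small simulation of this explanation, under two assumptions: the _all field uses the default standard analyzer (approximated here by lowercasing and splitting on non-word characters, which turns "JD-Xi" into jd and xi), and the query string's default operator is OR. Under those assumptions it also accounts for the puzzle in the question: q=roland+jdxi returns the document because roland matches on its own, not because jdxi does.

```python
import re

# Assumption: _all is analyzed with the default standard analyzer, approximated
# as "lowercase, then split on anything that is not a word character".
def standard_like(text):
    return re.findall(r"\w+", text.lower())

# Assumption: the query string uses OR by default, so one matching term is enough.
def matches(indexed_tokens, query):
    return any(term in indexed_tokens for term in standard_like(query))

all_field = standard_like("Roland JD-Xi")  # ['roland', 'jd', 'xi']
print(matches(all_field, "jdxi"))          # False: 'jdxi' was never indexed in _all
print(matches(all_field, "roland jdxi"))   # True: 'roland' matches by itself
```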
What you should do instead is search on the my_standard sub-field, i.e. index/_search?q=name.my_standard:jdxi, and I'm pretty sure you'll get the document.
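Concretely, the request could be built as shown below. This is a sketch: index is the index name used in the question, urlencode percent-encodes the colon (which Elasticsearch decodes), and the multi_match body that also searches name.my_prefix is an optional alternative assumed here for illustration, not something the answer prescribes.

```python
import json
from urllib.parse import urlencode

# Query-string form: target the sub-field explicitly ("index" is the question's index name).
url = "index/_search?" + urlencode({"q": "name.my_standard:jdxi"})
print(url)  # index/_search?q=name.my_standard%3Ajdxi

# Assumed alternative: a query DSL body that searches several sub-fields at once.
body = {
    "query": {
        "multi_match": {
            "query": "jdxi",
            "fields": ["name", "name.my_standard", "name.my_prefix"],
        }
    }
}
print(json.dumps(body))
```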