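I'm trying to set up search analysis in Elasticsearch. I've tried many combinations without success, and now I'm not sure whether what I want is even possible.
Suppose I have 3 users with the following full names:
John Doe
Jane Doe
Johnatan Lebus
Typing:
Jo should return John Doe and Johnatan Lebus
Ja should return Jane Doe
doe should return Jane Doe and John Doe
doe john should return only John Doe, not Jane Doe
Is the last case possible, and if so, what should the configuration be?
This is what I currently have: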
"analysis": {
  "analyzer": {
    "keyword_analyzer": {
      "char_filter": [],
      "filter": [
        "lowercase",
        "asciifolding",
        "trim"
      ],
      "type": "custom",
      "tokenizer": "keyword"
    },
    "edge_ngram_analyzer": {
      "filter": [
        "lowercase"
      ],
      "tokenizer": "edge_ngram_tokenizer"
    },
    "edge_ngram_search_analyzer": {
      "tokenizer": "lowercase"
    }
  },
  "tokenizer": {
    "edge_ngram_tokenizer": {
      "token_chars": [
        "letter"
      ],
      "min_gram": "2",
      "type": "edge_ngram",
      "max_gram": "5"
    }
  }
},
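Thanks
Answer 0 (score: 1)
I think your analyzers are fine for this use case; I suspect what you actually need help with is the query.
I set up an index with your analyzers and created a field that uses edge_ngram_analyzer: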
PUT test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "keyword_analyzer": {
          "char_filter": [],
          "filter": [
            "lowercase",
            "asciifolding",
            "trim"
          ],
          "type": "custom",
          "tokenizer": "keyword"
        },
        "edge_ngram_analyzer": {
          "filter": [
            "lowercase"
          ],
          "tokenizer": "edge_ngram_tokenizer"
        },
        "edge_ngram_search_analyzer": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "token_chars": [
            "letter"
          ],
          "min_gram": "2",
          "type": "edge_ngram",
          "max_gram": "5"
        }
      }
    }
  },
  "mappings": {
    "test_doc": {
      "properties": {
        "full_name": {
          "type": "text",
          "analyzer": "edge_ngram_analyzer"
        }
      }
    }
  }
}
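As a rough check of what ends up in the index, the analysis chain above can be approximated in plain Python. This is only a sketch of the behaviour, not Elasticsearch itself: the function name edge_ngram_analyze is made up for illustration, and it assumes the tokenizer splits on anything that is not a letter, emits prefixes of min_gram to max_gram characters per word, and that the lowercase filter runs afterwards.

```python
import re


def edge_ngram_analyze(text, min_gram=2, max_gram=5):
    """Approximate edge_ngram_analyzer (hypothetical helper, not an ES API).

    Splits on non-letter characters (token_chars: ["letter"]), emits the
    prefixes of each word from min_gram to max_gram characters, and
    lowercases them (the "lowercase" filter).
    """
    tokens = []
    # [^\W\d_]+ matches runs of letters: word characters minus digits and "_".
    for word in re.findall(r"[^\W\d_]+", text):
        longest = min(max_gram, len(word))
        for size in range(min_gram, longest + 1):
            tokens.append(word[:size].lower())
    return tokens


if __name__ == "__main__":
    for name in ("John Doe", "Jane Doe", "Johnatan Lebus"):
        print(name, "->", edge_ngram_analyze(name))
```

The real token stream can be inspected with the _analyze API against the test index; the sketch is only meant to make the prefixes visible without a running cluster.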
Then I indexed a few documents:
PUT test/test_doc/1
{
  "full_name": "John Doe"
}
PUT test/test_doc/2
{
  "full_name": "Jane Doe"
}
PUT test/test_doc/3
{
  "full_name": "Johnatan Lebus"
}
Then I ran the following query for your last case:
GET test/_search
{
  "query": {
    "match": {
      "full_name": {
        "operator": "and",
        "query": "doe john"
      }
    }
  }
}
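Replace the value of "query" with any of the inputs above and you get the results you want. This works because full_name has no search_analyzer, so the query text is run through edge_ngram_analyzer as well, and "operator": "and" requires every resulting prefix to be present in the document; "doe john" therefore cannot match Jane Doe, which has no "jo" token. The real "solution" to your problem is to be more creative at query time, even though it looks impossible from the token point of view.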
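Why the and operator is enough can be seen in a small simulation, again in plain Python rather than against a cluster. It assumes that the query text goes through the same edge n-gram analysis as the indexed field (no search_analyzer is set on full_name) and that a match with "operator": "and" needs every query token to be present in the document. DOCS, analyze and match_and are illustrative names only.

```python
import re

# The three documents indexed above, keyed by their ids.
DOCS = {1: "John Doe", 2: "Jane Doe", 3: "Johnatan Lebus"}


def analyze(text, min_gram=2, max_gram=5):
    """Approximate edge_ngram_analyzer: lowercase 2-5 character word prefixes."""
    words = re.findall(r"[^\W\d_]+", text)  # runs of letters only
    return {
        word[:size].lower()
        for word in words
        for size in range(min_gram, min(max_gram, len(word)) + 1)
    }


def match_and(query):
    """Return the names whose tokens contain every token of the query."""
    query_tokens = analyze(query)
    if not query_tokens:
        return []  # no tokens at all means no hits
    return [name for name in DOCS.values() if query_tokens <= analyze(name)]


if __name__ == "__main__":
    for query in ("Jo", "Ja", "doe", "doe john"):
        print(repr(query), "->", match_and(query))
```

A one-letter input such as J produces no tokens with min_gram set to 2, so it matches nothing in this setup.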
Hope this helps!