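I want to implement autocomplete with Elasticsearch and I can't get it to work. I want something like the question here. I tried the suggested answers, but in vain. I would like to get the following:
My indexed strings are, for example:
For the input "develop", I want as output:
For the input "developpeur", I want as output:
For the input "suis", I want as output:
I tried to use the completion suggester to achieve this:
This is the Elasticsearch version I am using: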
"number": "6.2.2",
"build_hash": "10b1edd",
"build_date": "2018-02-16T19:01:30.685723Z",
"build_snapshot": false,
"lucene_version": "7.2.1",
"minimum_wire_compatibility_version": "5.6.0",
"minimum_index_compatibility_version": "5.0.0"
Mapping:
{
  "settings": {
    "number_of_shards": "1",
    "analysis": {
      "filter": {
        "prefix_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        },
        "ngram_filter": {
          "type": "nGram",
          "min_gram": "3",
          "max_gram": "3"
        },
        "synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "hackwillbereplacedatindexcreation,hackwillbereplacedatindexcreation"
          ]
        },
        "french_stop": {
          "type": "stop",
          "stopwords": "french"
        }
      },
      "analyzer": {
        "word": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding",
            "french_stop"
          ],
          "char_filter": []
        },
        "prefix": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding",
            "synonym_filter",
            "prefix_filter"
          ],
          "char_filter": []
        },
        "ngram_with_synonyms": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding",
            "synonym_filter",
            "ngram_filter"
          ],
          "char_filter": []
        },
        "ngram": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding",
            "ngram_filter"
          ],
          "char_filter": []
        }
      }
    }
  },
  "mappings": {
    "training": {
      "properties": {
        "id": {
          "type": "text",
          "index": false
        },
        "label": {
          "type": "text",
          "index_options": "docs",
          "copy_to": "full_label",
          "analyzer": "word",
          "fields": {
            "prefix": {
              "type": "text",
              "index_options": "docs",
              "analyzer": "prefix",
              "search_analyzer": "word"
            },
            "ngram": {
              "type": "text",
              "index_options": "docs",
              "analyzer": "ngram_with_synonyms",
              "search_analyzer": "ngram"
            }
          }
        },
        "labelSuggest": {
          "type": "completion",
          "analyzer": "word"
        }
      }
    }
  }
}
Then, when I index my data, I do this (this is the body of the PUT call to the ES API; I am using Python):
body = {
    "label": r["title"],
    "labelSuggest": {
        "input": r["title"].ngrams()
    },
    "weight": 1.
}
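r["title"].ngrams() returns all the word n-grams of the title. For example, "development research biotech" gives: "development", "research", "biotech", "development research", "research biotech" and "development research biotech".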
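Python strings have no built-in ngrams() method, so r["title"].ngrams() is presumably a helper from the asker's own code. A minimal sketch of what such a helper might do, assuming it splits on whitespace and returns every run of consecutive words (word_ngrams is a hypothetical name, not from the question):

```python
def word_ngrams(title):
    """Return every contiguous run of words in `title`, shortest runs first.

    Hypothetical stand-in for the r["title"].ngrams() call in the question.
    """
    words = title.split()
    return [
        " ".join(words[start:start + size])
        for size in range(1, len(words) + 1)          # 1 word, 2 words, ...
        for start in range(len(words) - size + 1)     # every starting position
    ]


if __name__ == "__main__":
    for gram in word_ngrams("development research biotech"):
        print(gram)
```

For the example title this yields the same six inputs that appear in the labelSuggest array of the result further down.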
Then, to call the suggester, I do this:
POST http://localhost:9200/training/_search?pretty
{
  "suggest": {
    "labelSuggest": {
      "text": "developpeur",
      "completion": {
        "field": "labelSuggest",
        "skip_duplicates": true
      }
    }
  }
}
The result is:
{
  "text": "développement",
  "_index": "activity_20180518092449",
  "_type": "activity",
  "_id": "2031ce8b-6589-3270-afdf-7901aa21efa1",
  "_score": 1,
  "_source": {
    "id": "2031ce8b-6589-3270-afdf-7901aa21efa1",
    "name": "development research biotech",
    "labelSuggest": [
      "development",
      "research",
      "biotech",
      "development research",
      "research biotech",
      "development research biotech"
    ]
  }
}
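But I want something that returns "development", "development research" and "development research biotech" (assuming this document is the only one indexed).
What is wrong with my mapping or query? Is this the right approach? I hope my question is clear. I have searched a lot, in vain.
Thanks in advance.
Answer 0 (score: 0)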
First of all, the ngram filter will not do what you describe.
This:
"ngram_filter": {
  "type": "nGram",
  "min_gram": "3",
  "max_gram": "3"
},
will produce, from "developpeur Java": dev, eve, vel, elo, ... and so on.
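To make that concrete, here is a rough pure-Python imitation of a 3-to-3 character n-gram filter. It is only a sketch: it assumes whitespace splitting and lowercasing stand in for the standard tokenizer and lowercase filter, it skips asciifolding, and char_ngrams and analyze_ngram are names made up for this illustration.

```python
def char_ngrams(token, min_gram=3, max_gram=3):
    """Return every substring of `token` with a length in [min_gram, max_gram]."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(token):
                grams.append(token[start:start + size])
    return grams


def analyze_ngram(text):
    """Lowercase, split on whitespace, then emit character n-grams per word."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(char_ngrams(word))
    return tokens


if __name__ == "__main__":
    print(analyze_ngram("developpeur Java"))
```

Every token is a three-character slice from anywhere inside a word, which is why this filter suits fuzzy matching better than prefix completion.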
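See the documentation here: {{3}}
Second, for the result you want, I would just use a custom analyzer with the "icu_folding" and edge n-gram ("edge_ngram") filters and a whitespace tokenizer. For the gram sizes, I would start at 2 and go up to 20-25.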
From "developpeur Java", this will generate a token list like: de, dev, deve, devel, develo, develop, developp, developpe, developpeu, developpeur, ... and so on.
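The token list above can be reproduced with a small sketch. It assumes min_gram 2 and max_gram 20, splits on whitespace like the whitespace tokenizer, and uses plain lowercasing as a crude stand-in for icu_folding (which also folds accents). The function names are invented for this example.

```python
def edge_ngrams(token, min_gram=2, max_gram=20):
    """Return the prefixes of `token` with a length in [min_gram, max_gram]."""
    longest = min(len(token), max_gram)
    return [token[:size] for size in range(min_gram, longest + 1)]


def analyze_edge(text, min_gram=2, max_gram=20):
    """Split on whitespace, lowercase, then emit edge n-grams for each word."""
    tokens = []
    for word in text.split():
        tokens.extend(edge_ngrams(word.lower(), min_gram, max_gram))
    return tokens


if __name__ == "__main__":
    print(analyze_edge("developpeur Java"))
```

Because every indexed token is a prefix of a word, whatever the user has typed so far can be matched exactly against the index.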
Then you run a simple term search on that field. If this is for an autocomplete dropdown, you will get records back as you type. I hope I understood your question, and I hope this helps.
Update: try this:
"suggester": {
  "type": "custom",
  "tokenizer": "whitespace",
  "filter": ["my_ngram_filter", "icu_folding"],
  "char_filter": []
}
where "my_ngram_filter" is:
"my_ngram_filter": {
  "type": "edge_ngram",
  "min_gram": "2",
  "max_gram": "20"
}
The mapping for that field should then look like this:
"labelSuggest": {
  "type": "text",
  "analyzer": "suggester"
}
Then run a simple search:
{
  "query": {
    "term": {
      "labelSuggest": "dev"
    }
  }
}
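Put together, the pieces of this answer might be assembled from Python roughly as follows. This is a sketch, not a tested setup: it only builds the request bodies as dicts, the function names are invented, the "training" mapping type is taken from the question's mapping, and icu_folding needs the analysis-icu plugin installed. Term queries are not analyzed, so the sketch lowercases the typed text itself on the assumption that the indexed tokens were case-folded.

```python
import json


def build_index_body():
    """Settings and mapping for an edge n-gram backed suggestion field."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "my_ngram_filter": {
                        "type": "edge_ngram",
                        "min_gram": 2,
                        "max_gram": 20,
                    }
                },
                "analyzer": {
                    "suggester": {
                        "type": "custom",
                        "tokenizer": "whitespace",
                        # icu_folding comes from the analysis-icu plugin
                        "filter": ["my_ngram_filter", "icu_folding"],
                    }
                },
            }
        },
        "mappings": {
            "training": {
                "properties": {
                    "labelSuggest": {"type": "text", "analyzer": "suggester"}
                }
            }
        },
    }


def build_term_query(typed_text):
    """Term query for what the user has typed so far (lowercased client-side)."""
    return {"query": {"term": {"labelSuggest": typed_text.lower()}}}


if __name__ == "__main__":
    print(json.dumps(build_index_body(), indent=2))
    print(json.dumps(build_term_query("Dev"), indent=2))
```

The first body would be sent when creating the index and the second to the _search endpoint; with min_gram set to 2, a single typed character will not match anything.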