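I am trying to replicate the exact behavior of the search feature on the ElasticSearch website.
Does anyone know where I can find the source of the mappings/settings it uses, and how the query is executed?
Main requirements:
Scenario
Imagine I have the following dataset: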
ID, NAME
1, SoftwareRocks everytime
10, The is nothing like home
8, Opacc Software AG is good but software is even better
2, Opacc Software AG
3, Sage KHK Software AG
4, Software AG
5, bbv Software Services AG
6, Software AG2
7, Sof on the world
Test 1
Input: sof
Output:
Test 2
Input: soft
Output:
Test 3
Input: software
Output:
Test 4
Input: software ag
Output:
Attempt 1
PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  }
}
PUT /my_index/_mapping/my_type
{
  "my_type": {
    "properties": {
      "name": {
        "type": "string",
        "analyzer": "autocomplete",
        "search_analyzer": "standard"
      }
    }
  }
}
GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "name": "software"
    }
  }
}
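To make it easier to reason about which documents Attempt 1 can match, here is a rough Python sketch of what the `autocomplete` analyzer does at index time and what the `standard` analyzer does at search time. It is only an approximation under stated assumptions: the `\w+` regex stands in for the standard tokenizer, the helper names (`edge_ngrams`, `index_terms`, `search_terms`, `matches`) are made up for illustration, and scoring is ignored entirely.

```python
import re

def edge_ngrams(token, min_gram=1, max_gram=20):
    """Return the prefixes of token with length min_gram..max_gram."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

def index_terms(text):
    """Approximate the 'autocomplete' analyzer: words, lowercase, edge n-grams."""
    terms = set()
    for token in re.findall(r"\w+", text.lower()):
        terms.update(edge_ngrams(token))
    return terms

def search_terms(text):
    """Approximate the 'standard' search analyzer: words and lowercase only."""
    return re.findall(r"\w+", text.lower())

def matches(query, name):
    """A match query uses OR by default: one matching query term is enough."""
    indexed = index_terms(name)
    return any(term in indexed for term in search_terms(query))

# A few rows from the dataset in the question.
names = {
    1: "SoftwareRocks everytime",
    10: "The is nothing like home",
    4: "Software AG",
    7: "Sof on the world",
}

hits = sorted(doc_id for doc_id, name in names.items() if matches("sof", name))
print(hits)  # prints [1, 4, 7]
```

Because the query text is not n-grammed, the input `soft` no longer matches document 7 in this sketch: the longest term indexed for `Sof` is `sof`.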
Attempt 2
{"query" : {"match_phrase_prefix": { "name": "Software ag" }}}
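This returns the right documents, but the highlighting seems to be off. For example:
What I expect is that the search term is highlighted in each hit, and that the returned items are ordered by the length of the whole term.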
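To pin down the expectation, here is a small Python sketch of the behavior described, independent of Elasticsearch. It assumes that "length of the whole term" means the length of the full `NAME` value and that a hit is any name containing a word that starts with a query term; the `highlight` and `rank` helpers are hypothetical names used only for this illustration.

```python
import re

def highlight(name, query, pre="<em>", post="</em>"):
    """Wrap the part of each word that starts with a query term in tags."""
    # Try longer terms first so the longest matching prefix gets highlighted.
    terms = sorted(set(query.lower().split()), key=len, reverse=True)

    def mark(match):
        word = match.group(0)
        for term in terms:
            if word.lower().startswith(term):
                return pre + word[:len(term)] + post + word[len(term):]
        return word

    return re.sub(r"\w+", mark, name)

def rank(names, query):
    """Keep names where something was highlighted, shortest name first."""
    hits = [(name, highlight(name, query)) for name in names]
    hits = [(name, marked) for name, marked in hits if marked != name]
    return [marked for name, marked in sorted(hits, key=lambda h: len(h[0]))]

results = rank(
    ["Opacc Software AG", "Software AG", "The is nothing like home", "Sof on the world"],
    "sof",
)
for line in results:
    print(line)
```

With the input `sof` this lists `Software AG` first, then `Sof on the world`, then `Opacc Software AG`, each with the `Sof` prefix wrapped in `<em>` tags.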
Answer 0: (score: 0)
Try this query. It will highlight the matches in the search results.
{
  "query": {
    "filtered": {
      "query": {
        "match": {
          "_all": {
            "query": "soft",
            "type": "phrase"
          }
        }
      }
    }
  },
  "highlight": {
    "pre_tags": ["<em>"],
    "post_tags": ["</em>"],
    "fields": {"*": {}}
  }
}
Answer 1: (score: 0)
To keep it simple for you, I would use an ngram tokenizer for the strings in the mappings and do a simple filtered search.
Mappings
{
  "analysis": {
    "analyzer": {
      "autocomplete_analyzer": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["standard", "lowercase", "asciifolding", "filter_ngram"]
      }
    },
    "filter": {
      "filter_stop": {
        "type": "stop",
        "stopwords": "_english_",
        "ignore_case": true
      },
      "filter_shingle": {
        "type": "shingle",
        "max_shingle_size": 2,
        "min_shingle_size": 2,
        "output_unigrams": true
      },
      "filter_snowball": {
        "type": "snowball",
        "language": "english"
      },
      "filter_stemmer": {
        "type": "porter_stem",
        "language": "English"
      },
      "filter_ngram": {
        "type": "nGram",
        "min_gram": 3,
        "max_gram": 15
      },
      "filter_edgengram": {
        "type": "edgeNGram",
        "min_gram": 2,
        "max_gram": 15
      },
      "filter_worddelimiter": {
        "type": "word_delimiter"
      }
    },
    "tokenizer": {
      "haystack_ngram_tokenizer": {
        "type": "nGram",
        "min_gram": 3,
        "max_gram": 15
      },
      "haystack_edgengram_tokenizer": {
        "type": "edgeNGram",
        "min_gram": 2,
        "max_gram": 15,
        "side": "front"
      }
    }
  }
}
These mappings also include some more advanced filters that I use in my own autocomplete solution.
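The practical difference between `filter_ngram` (used by `autocomplete_analyzer`) and `filter_edgengram` is easier to see on a single token. The following Python sketch is only an illustration of the two ideas with the `min_gram`/`max_gram` values from the settings above; the function names are made up and it does not call Elasticsearch.

```python
def ngrams(token, min_gram=3, max_gram=15):
    """All substrings of token with length min_gram..max_gram (n-gram idea)."""
    return {
        token[start:start + size]
        for size in range(min_gram, min(len(token), max_gram) + 1)
        for start in range(len(token) - size + 1)
    }

def edge_ngrams(token, min_gram=2, max_gram=15):
    """Only the leading substrings of token (edge n-gram idea)."""
    return {token[:size] for size in range(min_gram, min(len(token), max_gram) + 1)}

grams = ngrams("software")
edges = edge_ngrams("software")

print("oft" in grams)  # True: n-grams also cover the middle of a word
print("oft" in edges)  # False: edge n-grams only cover prefixes
print("so" in grams)   # False: shorter than min_gram=3
```

So with the n-gram filter an input such as `oft` or `ware` can hit `Software`, while an input shorter than three characters finds nothing on that field.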
{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "FIELD": "VALUE"
        }
      }
    }
  }
}
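The search on the Elastic site does not run spell-check/fuzzy queries on the keywords you type.
If you want to add fuzziness, you can also look at the documentation, build a fuzzy query that fits your use case, and tune the fuzziness.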
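As a starting point for the fuzzy variant, here is a minimal Python sketch that builds a `match` query body with the `fuzziness` parameter and prints it as JSON. The field name `name` comes from the question; the helper `fuzzy_match_query` and the misspelled input are just assumptions for illustration, and how well fuzziness combines with an n-grammed field is something you would have to test on your own data.

```python
import json

def fuzzy_match_query(field, text, fuzziness="AUTO"):
    """Build a match query body that tolerates small typos in the input."""
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "fuzziness": fuzziness,
                }
            }
        }
    }

# "sofware" is a deliberately misspelled input.
body = fuzzy_match_query("name", "sofware")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request; pass a number such as `1` or `2` instead of `"AUTO"` to set the allowed edit distance explicitly.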