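I want to perform substring/partial-word matching with Elasticsearch, and I want the results returned in a specific order. To explain my problem, I will show how I create the index, the mapping, and the records I use.
Creating the index and mapping: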
PUT /my_index1
{
  "settings": {
    "analysis": {
      "filter": {
        "trigrams_filter": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3
        }
      },
      "analyzer": {
        "trigrams": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "trigrams_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "my_type1": {
      "properties": {
        "text": {
          "type": "string",
          "analyzer": "trigrams"
        }
      }
    }
  }
}
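To make clear what I expect this analyzer to index, here is a rough Python sketch of it. It is only an approximation and not Elasticsearch itself: the function name `trigrams_analyzer` is made up for illustration, and I am assuming that splitting on whitespace behaves like the standard tokenizer for these product names (keeping "men's" as a single token).

```python
def trigrams_analyzer(text, n=3):
    """Rough stand-in for the custom "trigrams" analyzer (hypothetical helper)."""
    # Assumption: whitespace splitting approximates the standard tokenizer here.
    tokens = text.lower().split()  # the lowercase filter
    grams = []
    for token in tokens:
        # min_gram = max_gram = 3: every contiguous 3-character slice of the token.
        grams.extend(token[i:i + n] for i in range(len(token) - n + 1))
    return grams


print(trigrams_analyzer("men's shaver"))
```

If that approximation holds, the partial word "en's" is covered by the grams "en'" and "n's", which is why I expect substring queries to match.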
Bulk-inserting the records:
POST /my_index1/my_type1/_bulk
{ "index": { "_id": 1 }}
{ "text": "men's shaver" }
{ "index": { "_id": 2 }}
{ "text": "men's foil shaver" }
{ "index": { "_id": 3 }}
{ "text": "men's foil advanced shaver" }
{ "index": { "_id": 4 }}
{ "text": "norelco men's foil advanced shaver" }
{ "index": { "_id": 5 }}
{ "text": "men's shavers" }
{ "index": { "_id": 6 }}
{ "text": "women's shaver" }
{ "index": { "_id": 7 }}
{ "text": "women's foil shaver" }
{ "index": { "_id": 8 }}
{ "text": "women's foil advanced shaver" }
{ "index": { "_id": 9 }}
{ "text": "norelco women's foil advanced shaver" }
{ "index": { "_id": 10 }}
{ "text": "women's shavers" }
Now I want to search for "en's shaver". I am using the following query:
POST /my_index1/my_type1/_search
{
  "query": {
    "match": {
      "text": {
        "query": "en's shaver",
        "minimum_should_match": "100%"
      }
    }
  }
}
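As far as I understand it, with "minimum_should_match": "100%" every trigram of the query must be present in a document for it to match. The Python sketch below mimics only that matching step (no scoring) against the ten records above. The helper names are hypothetical, and whitespace splitting is again assumed to stand in for the standard tokenizer.

```python
def trigrams(text, n=3):
    """Set of 3-character grams per token (approximation of the trigrams analyzer)."""
    grams = set()
    for token in text.lower().split():
        grams.update(token[i:i + n] for i in range(len(token) - n + 1))
    return grams


# The ten records from the bulk request above.
DOCS = {
    1: "men's shaver",
    2: "men's foil shaver",
    3: "men's foil advanced shaver",
    4: "norelco men's foil advanced shaver",
    5: "men's shavers",
    6: "women's shaver",
    7: "women's foil shaver",
    8: "women's foil advanced shaver",
    9: "norelco women's foil advanced shaver",
    10: "women's shavers",
}


def matching_ids(query):
    """IDs of documents that contain every trigram of the query (100% match)."""
    wanted = trigrams(query)
    return [doc_id for doc_id, text in DOCS.items() if wanted <= trigrams(text)]


print(matching_ids("en's shaver"))
```

Under these assumptions all ten records qualify, so the query itself does not narrow anything down and the outcome depends entirely on how the hits are scored and ordered.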
I expect the results in the following order:
I am also executing the following query. It does not return the results in the order I want:
POST /my_index1/my_type1/_search
{
  "query": {
    "query_string": {
      "default_field": "text",
      "query": "men's shaver",
      "minimum_should_match": "90%"
    }
  }
}
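Please suggest how I can achieve the ordering described above. Any suggestion would help.
*************************** UPDATE (June 6, 2014) ********************************
I have made a few changes:
1. Switched to a multi-field
2. Using only one shard
3. Using analyzers, filters, and a stemmer
Please see the settings below:
Index: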
curl -XPUT "http://localhost:9200/my_improved_index" -d'
{
  "settings": {
    "analysis": {
      "filter": {
        "trigrams_filter": {
          "type": "ngram",
          "min_gram": 1,
          "max_gram": 50
        },
        "my_stemmer": {
          "type": "stemmer",
          "name": "minimal_english"
        }
      },
      "analyzer": {
        "trigrams": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "standard",
            "lowercase",
            "trigrams_filter"
          ]
        },
        "my_stemmer_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "standard",
            "lowercase",
            "my_stemmer"
          ]
        }
      }
    }
  }
}'
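Since I changed the gram range from 3..3 to 1..50, here is a rough Python sketch of how many grams a single token produces under each setting. The `ngrams` helper is hypothetical, and the order in which it emits grams may differ from what the real ngram filter does; only the counts are meant to be compared.

```python
def ngrams(token, min_gram=1, max_gram=50):
    """All contiguous substrings of token with length in [min_gram, max_gram]."""
    grams = []
    # A gram cannot be longer than the token itself.
    for size in range(min_gram, min(max_gram, len(token)) + 1):
        grams.extend(token[i:i + size] for i in range(len(token) - size + 1))
    return grams


print(len(ngrams("shaver", 3, 3)))   # original trigram setting
print(len(ngrams("shaver", 1, 50)))  # updated setting
print(ngrams("men", 1, 50))          # includes single characters
```

With the wider range, single characters such as "s" or "e" are indexed as terms too, so almost any two product names share some grams. I am not sure whether that contributes to my ordering problem.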
Mapping:
curl -XPUT "http://localhost:9200/my_improved_index/my_improved_index_type/_mapping" -d'
{
  "my_improved_index_type": {
    "properties": {
      "name": {
        "type": "multi_field",
        "fields": {
          "name_gram": {
            "type": "string",
            "analyzer": "trigrams"
          },
          "untouched": {
            "type": "string",
            "index": "not_analyzed"
          },
          "name_stemmer": {
            "type": "string",
            "analyzer": "my_stemmer_analyzer"
          }
        }
      }
    }
  }
}'
Available documents:
Query:
curl -XPOST "http://localhost:9200/my_improved_index/my_improved_index_type/_search" -d'
{
  "size": 30,
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name.untouched": {
              "query": "men'\''s shaver",
              "operator": "and",
              "type": "phrase",
              "boost": "10"
            }
          }
        },
        {
          "match_phrase": {
            "name.name_stemmer": {
              "query": "men'\''s shaver",
              "slop": 5
            }
          }
        }
      ]
    }
  }
}'
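To spell out what I mean by "distance" in the match_phrase clause, here is a simplified Python sketch that counts how many extra words sit between the two query terms. It is an assumption-laden illustration: `phrase_gap` is a made-up helper, it ignores stemming and out-of-order matches, and the sample names are reused from the records of my first index.

```python
def phrase_gap(query, text):
    """Number of extra words between two in-order query terms, or None if absent."""
    first, second = query.lower().split()
    # Assumption: whitespace splitting stands in for the real analyzer.
    tokens = text.lower().split()
    if first not in tokens or second not in tokens:
        return None
    gap = tokens.index(second) - tokens.index(first) - 1
    # Simplification: terms that appear in reverse order count as no match.
    return gap if gap >= 0 else None


names = [
    "men's shaver",
    "men's foil shaver",
    "men's foil advanced shaver",
    "norelco men's foil advanced shaver",
]
for name in names:
    print(phrase_gap("men's shaver", name), name)
```

All of these gaps are within "slop": 5, so every name qualifies for the phrase clause. I would expect names with a smaller gap to rank higher.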
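Results returned:
Expected results:
Why do documents with a greater distance score higher? How can I achieve the expected result? Is something wrong with my stemmer or nGram settings?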