I have the following mapping:
POST music
{
  "settings": {
    "analysis": {
      "filter": {
        "nGram_filter": {
          "type": "nGram",
          "min_gram": 2,
          "max_gram": 20,
          "token_chars": [
            "letter",
            "digit",
            "punctuation",
            "symbol"
          ]
        }
      },
      "analyzer": {
        "nGram_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "asciifolding",
            "nGram_filter"
          ]
        },
        "whitespace_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "song": {
      "properties": {
        "song_field": {
          "type": "string",
          "analyzer": "nGram_analyzer",
          "search_analyzer": "whitespace_analyzer"
        }
      }
    }
  }
}
I inserted two documents:
POST music/song
{
  "song_field": "Premeditiated murder"
}
POST music/song
{
  "song_field": "Premeditiated"
}
Here is the query:
POST music/song/_search
{
  "size": 10,
  "query": {
    "match": {
      "song_field": {
        "query": "Premeditiated murd",
        "fuzziness": 2
      }
    }
  }
}
Response:
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.78730416,
    "hits": [
      {
        "_index": "music",
        "_type": "song",
        "_id": "AVUf6XK1ancUpEdFLdz8",
        "_score": 0.78730416,
        "_source": {
          "song_field": "Premeditiated"
        }
      },
      {
        "_index": "music",
        "_type": "song",
        "_id": "AVUfUbocancUpEdFLdUf",
        "_score": 0.668494,
        "_source": {
          "song_field": "Premeditiated murder"
        }
      }
    ]
  }
}
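I have two questions:
1. Why does "Premeditiated" get the higher score? How can I get reasonable correction + autocomplete?
2. Does searching for the same document over and over affect the default ES score?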
Answer 0 (score: 0)
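You get the wrong ordering because sorting by relevance breaks down on very small datasets when the index has more than one shard (your response shows 5). Relevance is calculated per shard, using only the term statistics of the documents stored on that shard, and the results from all shards are then merged and returned. With only two documents spread across five shards, those local statistics are not representative, so "Premeditiated" ends up with a higher relevance on its shard. This is a common problem and is described in detail here: https://www.elastic.co/guide/en/elasticsearch/guide/current/relevance-is-broken.html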
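There are two ways to fix this:
1. Set number_of_shards to 1 in the index settings when you create the index.
2. Add search_type=dfs_query_then_fetch to your search request.
After applying either option you will get the expected result.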
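As a rough sketch of the two options, assuming the same Elasticsearch 2.x-era API the question uses (mapping types and the string field type): for option 1 the index has to be created again, because number_of_shards cannot be changed on an existing index. Only the new setting is shown here; the analysis and mappings sections from the question would be kept alongside it unchanged.

```json
POST music
{
  "settings": {
    "number_of_shards": 1
  }
}
```

For option 2 the index stays as it is and only the search request changes. The search_type parameter goes in the URL rather than in the request body:

```json
POST music/song/_search?search_type=dfs_query_then_fetch
{
  "size": 10,
  "query": {
    "match": {
      "song_field": {
        "query": "Premeditiated murd",
        "fuzziness": 2
      }
    }
  }
}
```

dfs_query_then_fetch adds an extra round trip to gather term statistics from all shards before scoring, so it is usually better suited to small or test datasets than to production traffic.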
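Regarding your second question: the score is calculated every time you search. Even if you search for the same document repeatedly, the score is recomputed each time and the resulting _score is always the same. If you want to learn more about how scoring works, read the "Controlling Relevance" chapter: https://www.elastic.co/guide/en/elasticsearch/guide/current/controlling-relevance.html. You can also add the explain parameter to your query at any time to see how the score is calculated.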
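A minimal sketch of the explain option, reusing the query from the question and again assuming the same Elasticsearch version:

```json
POST music/song/_search
{
  "explain": true,
  "query": {
    "match": {
      "song_field": {
        "query": "Premeditiated murd",
        "fuzziness": 2
      }
    }
  }
}
```

With explain enabled, each hit should carry an _explanation section that breaks the _score down into its components, along with the _shard and _node that produced it. Comparing the two hits makes it easier to see whether they were scored on different shards with different term statistics.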
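P.S.
It's good that you provided your JSON, but the query has a wrong field name: it should be song_field, not song_field_1. Also, your response doesn't match the data stored in the type (see the _source field in the response), but that doesn't really matter :P.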