elasticsearch - What does the number of searches affect?

Time: 2016-06-05 10:30:04

Tags: elasticsearch

I have the following mapping:

POST music
{
  "settings": {
    "analysis": {
      "filter": {
        "nGram_filter": {
          "type": "nGram",
          "min_gram": 2,
          "max_gram": 20,
          "token_chars": [
            "letter",
            "digit",
            "punctuation",
            "symbol"
          ]
        }
      },
      "analyzer": {
        "nGram_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "asciifolding",
            "nGram_filter"
          ]
        },
        "whitespace_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "song": {
      "properties": {
        "song_field": {
          "type": "string",
          "analyzer": "nGram_analyzer",
          "search_analyzer": "whitespace_analyzer"
        }
      }
    }
  }
}

I inserted two documents:

POST music/song
{
  "song_field" : "Premeditiated murder"
}

POST music/song
{
  "song_field" : "Premeditiated"
}

Here is the query:

POST music/song/_search
{
  "size": 10,
  "query": {
    "match": {
      "song_field": {
        "query": "Premeditiated murd",
        "fuzziness": 2
      }
    }
  }
}

Response:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.78730416,
    "hits": [
      {
        "_index": "music",
        "_type": "song",
        "_id": "AVUf6XK1ancUpEdFLdz8",
        "_score": 0.78730416,
        "_source": {
          "song_field": "Premeditiated"
        }
      },
      {
        "_index": "music",
        "_type": "song",
        "_id": "AVUfUbocancUpEdFLdUf",
        "_score": 0.668494,
        "_source": {
          "song_field": "Premeditiated murder"
        }
      }
    ]
  }
}

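I have two questions:

  1. Why does "Premeditiated" get the higher score? How can I get reasonable correction + autocomplete?

  2. Does searching for the same document over and over affect the default ES score?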

1 Answer:

Answer 0: (score: 0)

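You are getting an unexpected ranking because, when an index has multiple shards, sorting by relevance breaks down on very small data sets. Relevance is computed on each shard from that shard's own local term statistics, and the per-shard results are then merged and returned, so your "Premeditiated" document ends up with a higher relevance on its own shard. This is a common problem and is described in detail here: https://www.elastic.co/guide/en/elasticsearch/guide/current/relevance-is-broken.html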

There are two ways to fix this:
1. Set number_of_shards to 1 in the index settings when you create the index.
2. Add search_type=dfs_query_then_fetch to your search request.

With either of these options you should get the ranking you expect.
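As a rough sketch of the two options, reusing the index, type, and field names from the question: the first request assumes the index is deleted and recreated (number_of_shards cannot be changed on an existing index), and it omits the analysis settings and mappings from the question for brevity, so those would need to be included in a real request. The second request is the original query with only the search_type parameter added.

```
# Option 1: recreate the index with a single shard
# (add the "analysis" and "mappings" sections from the question here)
PUT music
{
  "settings": {
    "number_of_shards": 1
  }
}

# Option 2: keep the shards, but collect global term statistics before scoring
POST music/song/_search?search_type=dfs_query_then_fetch
{
  "size": 10,
  "query": {
    "match": {
      "song_field": {
        "query": "Premeditiated murd",
        "fuzziness": 2
      }
    }
  }
}
```

Option 2 adds an extra round trip to every search, so a single shard is usually the simpler choice for a small test index like this one.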

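About your second question: the score is computed each time you search. Even if you search for the same document over and over, the score is recomputed and the resulting _score is always the same, as long as the indexed data has not changed. If you want to learn more about how scoring works, read the "Controlling Relevance" chapter: https://www.elastic.co/guide/en/elasticsearch/guide/current/controlling-relevance.html. You can always add the explain parameter to your query to see how the score was calculated.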
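For illustration, this is roughly what the question's query looks like with explain enabled; nothing else is changed, and the names are the ones from the question.

```
POST music/song/_search
{
  "explain": true,
  "size": 10,
  "query": {
    "match": {
      "song_field": {
        "query": "Premeditiated murd",
        "fuzziness": 2
      }
    }
  }
}
```

Each hit in the response then carries an _explanation section that breaks the _score down into its individual factors, which makes it easier to compare the two documents.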

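P.S.
It's good that you posted your JSON, but the query contains a wrong field name: it should be song_field, not song_field_1. Also, your response does not match the data stored in the type (see the _source field in the response), but that doesn't matter much :P.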