How can I control the scoring or ordering of results when using ngrams in Elasticsearch?

Asked: 2018-08-14 11:01:37

Tags: elasticsearch

I am using Elasticsearch 6.X.

I created an index named test_index with mapping type doc, as follows:

PUT /test_index
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "my_ngram_tokenizer"
        }
      },
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "nGram",
          "min_gram": "1",
          "max_gram": "7",
          "token_chars": [
            "letter",
            "digit",
            "punctuation"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "my_text": {
          "type": "text",
          "fielddata": true,
          "fields": {
            "ngram": {
              "type": "text",
              "fielddata": true,
              "analyzer": "my_analyzer"
            }
          }
        }
      }
    }
  }
}

I indexed the following data:

PUT /test_index/doc/1
{
    "my_text": "ohio"
}
PUT /test_index/doc/2
{
    "my_text": "ohlin"
}
PUT /test_index/doc/3
{
    "my_text": "john"
}

Then I ran this search query:

{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "oh",
            "fields": [
              "my_text^5",
              "my_text.ngram"
            ]
          }
        }
      ]
    }
  }
}

and got this response:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 1.0042334,
    "hits": [
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "1",
        "_score": 1.0042334,
        "_source": {
          "my_text": "ohio"
        }
      },
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "3",
        "_score": 0.97201055,
        "_source": {
          "my_text": "john"
        }
      },
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "2",
        "_score": 0.80404717,
        "_source": {
          "my_text": "ohlin"
        }
      }
    ]
  }
}

Here we can see that when I search for oh, the results come back in this order:

-> ohio
-> john
-> ohlin

However, I want the results scored and sorted so that prefix matches get higher priority:

-> ohio
-> ohlin
-> john

How can I get this result? What approach can I take here? Thanks in advance.

1 answer:

Answer 0: (score: 0)

You should add a new subfield with a new analyzer that uses the edge_ngram tokenizer, and then add that subfield to your multi_match query.
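As a rough sketch of that first step, the body below could be sent with PUT /test_index when recreating the index (analysis settings cannot be changed on an open index). The names my_edge_analyzer, my_edge_ngram_tokenizer, and the edge_ngram subfield are illustrative assumptions, not names from the question, and the gram sizes simply mirror the existing ngram tokenizer:

```json
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "my_ngram_tokenizer"
        },
        "my_edge_analyzer": {
          "type": "custom",
          "tokenizer": "my_edge_ngram_tokenizer"
        }
      },
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "nGram",
          "min_gram": "1",
          "max_gram": "7",
          "token_chars": ["letter", "digit", "punctuation"]
        },
        "my_edge_ngram_tokenizer": {
          "type": "edge_ngram",
          "min_gram": "1",
          "max_gram": "7",
          "token_chars": ["letter", "digit", "punctuation"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "my_text": {
          "type": "text",
          "fields": {
            "ngram": {
              "type": "text",
              "analyzer": "my_analyzer"
            },
            "edge_ngram": {
              "type": "text",
              "analyzer": "my_edge_analyzer"
            }
          }
        }
      }
    }
  }
}
```

With this mapping, "ohio" is indexed in my_text.edge_ngram as o, oh, ohi, ohio, while "john" yields j, jo, joh, john, so only terms that start with the search text can match on that subfield. The documents need to be reindexed after the index is recreated.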

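You then need to use type most_fields for the multi_match query. Only documents that start with the search term will match on this subfield, so they get a score boost relative to the other matching documents.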
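A minimal sketch of such a query, as the body of a search against test_index, assuming the edge_ngram-analyzed subfield is called my_text.edge_ngram (a hypothetical name; use whatever the subfield is called in your mapping):

```json
{
  "query": {
    "multi_match": {
      "query": "oh",
      "type": "most_fields",
      "fields": [
        "my_text^5",
        "my_text.ngram",
        "my_text.edge_ngram"
      ]
    }
  }
}
```

Because most_fields combines the scores from every matching field, documents such as ohio and ohlin that also match on the prefix subfield should end up ahead of john, which matches only on my_text.ngram. The exact scores depend on the indexed data, and adding a boost to the prefix subfield (for example my_text.edge_ngram^2) is one way to widen the gap if needed.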