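Make a full word score higher than an Edge NGram subset

Time: 2015-09-15 08:36:27

Tags: elasticsearch lucene n-gram

I'm trying to get a higher score for a document that matches the full name than for documents that only match an Edge NGram subset of the same value.

The results look like this: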

Pos Name              _score       _id

1   Baritone horn     7.56878     1786
2   Baritone ukulele  7.56878     2313
3   Bari              7.56878     2360
4   Baritone voice    7.56878     1787

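I want the third one ("Bari") to get a higher score because it is the full name. However, the edge n-gram decomposition means every other document also has the term "bari" indexed. As the results table shows, all the scores are equal, and I don't even know how Elasticsearch orders them, because neither the _id nor the name is sorted.

How can I achieve this?

Thanks

Sample 'code'

Settings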
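To see why all four documents tie, here is a minimal Python sketch that approximates the analyzer chain from the settings (whitespace tokenizer, lowercase, edge n-grams with min_gram 3 and max_gram 20). It is only an emulation for illustration, not Elasticsearch's own code: the function name `edge_ngrams` is hypothetical and the asciifolding step is omitted.

```python
def edge_ngrams(text, min_gram=3, max_gram=20):
    """Approximate: whitespace tokenizer -> lowercase -> edge n-gram filter.

    Assumption: asciifolding is skipped, since the sample names are plain ASCII.
    """
    terms = set()
    for token in text.lower().split():
        # Emit every prefix of the token between min_gram and max_gram characters.
        for size in range(min_gram, min(max_gram, len(token)) + 1):
            terms.add(token[:size])
    return terms


names = ["Baritone horn", "Baritone ukulele", "Bari", "Baritone voice"]
index = {name: edge_ngrams(name) for name in names}

# The search analyzer only lowercases the query, so "Bari" becomes the term "bari".
matches = [name for name in names if "bari" in index[name]]
print(matches)
```

Every name produces the term "bari", so a query for that single term has nothing in the n-gram field to tell the exact name apart from the longer ones.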

{
  "analysis": {
    "filter": {
      "edgeNGram_filter": {
        "type": "edgeNGram",
        "min_gram": 3,
        "max_gram": 20,
        "token_chars": [
          "letter",
          "digit",
          "punctuation",
          "symbol"
        ]
      }
    },
    "analyzer": {
      "edgeNGram_analyzer": {
        "type": "custom",
        "tokenizer": "whitespace",
        "filter": [
          "lowercase",
          "asciifolding",
          "edgeNGram_filter"
        ]
      },
      "whitespace_analyzer": {
        "type": "custom",
        "tokenizer": "whitespace",
        "filter": [
          "lowercase",
          "asciifolding"
        ]
      }
    }
  }
}

source

Mapping:

{
  "name": {
    "type": "string",
    "index": "not_analyzed"
  },
  "suggest": {
    "type": "completion",
    "index_analyzer": "nGram_analyzer",
    "search_analyzer": "whitespace_analyzer",
    "payloads": true
  }
}

Query:

POST /attribute-tree/attribute/_search
{
  "query": {
    "match": {
      "suggest": "Bari"
    }
  }
}

Results:

(only the relevant data is kept)

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 7.56878,
    "hits": [
      {
        "_index": "attribute-tree",
        "_type": "attribute",
        "_id": "1786",
        "_score": 7.56878,
        "_source": {
          "name": "Baritone horn",
          "suggest": {
            "input": [
              "Baritone",
              "horn"
            ],
            "output": "Baritone horn"
          }
        }
      },
      {
        "_index": "attribute-tree",
        "_type": "attribute",
        "_id": "2313",
        "_score": 7.56878,
        "_source": {
          "name": "Baritone ukulele",
          "suggest": {
            "input": [
              "Baritone",
              "ukulele"
            ],
            "output": "Baritone ukulele"
          }
        }
      },
      {
        "_index": "attribute-tree",
        "_type": "attribute",
        "_id": "2360",
        "_score": 7.56878,
        "_source": {
          "name": "Bari",
          "suggest": {
            "input": [
              "Bari"
            ],
            "output": "Bari"
          }
        }
      },
      {
        "_index": "attribute-tree",
        "_type": "attribute",
        "_id": "1787",
        "_score": 7.56878,
        "_source": {
          "name": "Baritone voice",
          "suggest": {
            "input": [
              "Baritone",
              "voice"
            ],
            "output": "Baritone voice"
          }
        }
      }
    ]
  }
}

1 Answer:

Answer 0 (score: 3)

You can use the bool query and its should clause to add score to exact matches, like this:

POST /attribute-tree/attribute/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "suggest": "Bari"
          }
        }
      ],
      "should": [
        {
          "match": {
            "name": "Bari"
          }
        }
      ]
    }
  }
}

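In the Elasticsearch definitive guide, the queries in the should clause are called signal clauses, and they are how you distinguish exact matches from n-gram matches. Every document that matches the must clause is returned, but documents that also match the should queries score higher because of the bool query scoring formula: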

score = ("must" queries total score + matching "should" queries total score) / (total number of "must" queries and "should" queries)
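As a rough illustration of that formula, the Python sketch below plugs in made-up clause scores. The helper name `bool_score` and the values 1.0 and 2.0 are hypothetical and chosen only to show the effect. Real Elasticsearch scores also depend on term statistics and normalization, so these numbers will not reproduce the scores in the results.

```python
def bool_score(must_scores, matching_should_scores, total_clauses):
    """Formula as quoted above: sum the matching clause scores,
    then divide by the total number of must and should clauses."""
    return (sum(must_scores) + sum(matching_should_scores)) / total_clauses


# Hypothetical per-clause scores, for illustration only.
NGRAM_MATCH = 1.0   # every document matches the "must" clause on "suggest"
EXACT_MATCH = 2.0   # only "Bari" matches the "should" clause on "name"

exact = bool_score([NGRAM_MATCH], [EXACT_MATCH], total_clauses=2)
prefix_only = bool_score([NGRAM_MATCH], [], total_clauses=2)

print(exact, prefix_only)
```

The document matching both clauses adds the should score on top of the must score, while the prefix-only documents are still divided by both clauses, so the exact match moves clearly ahead.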

The result is what you would expect: Bari comes first (and is far ahead in score :)):

"hits": {
      "total": 3,
      "max_score": 0.4339554,
      "hits": [
         {
            "_index": "attribute-tree",
            "_type": "attribute",
            "_id": "2360",
            "_score": 0.4339554,
            "_source": {
               "name": "Bari",
               "suggest": {
                  "input": [
                     "Bari"
                  ],
                  "output": "Bari"
               }
            }
         },
         {
            "_index": "attribute-tree",
            "_type": "attribute",
            "_id": "1786",
            "_score": 0.04500804,
            "_source": {
               "name": "Baritone horn",
               "suggest": {
                  "input": [
                     "Baritone",
                     "horn"
                  ],
                  "output": "Baritone horn"
               }
            }
         },
         {
            "_index": "attribute-tree",
            "_type": "attribute",
            "_id": "2313",
            "_score": 0.04500804,
            "_source": {
               "name": "Baritone ukulele",
               "suggest": {
                  "input": [
                     "Baritone",
                     "ukulele"
                  ],
                  "output": "Baritone ukulele"
               }
            }
         }
      ]