Question

Background: I've implemented partial search on a name field by indexing the tokenized name (name field) as well as a trigram-analyzed name (ngram field).

I've boosted the name field so that exact token matches bubble up to the top of the results.

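Problem: I am trying to implement a query that limits the nGram matches to only those that match some threshold (say 80%) of the query string. I understand that minimum_should_match seems to be what I'm looking for, but my problem is forming the query so that it actually produces those results.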

My exact token matches are boosted to the top, but I still get every document that has a single matching trigram in the ngram field.

Index settings

{
  "my_index": {
    "settings": {
      "index": {
        "number_of_shards": "5",
        "max_result_window": "30000",
        "creation_date": "1475853851937",
        "analysis": {
          "filter": {
            "ngram_filter": {
              "type": "ngram",
              "min_gram": "3",
              "max_gram": "3"
            }
          },
          "analyzer": {
            "ngram_analyzer": {
              "filter": [
                "lowercase",
                "ngram_filter"
              ],
              "type": "custom",
              "tokenizer": "standard"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "AuCjcP5sSb-m59bYrprFcw",
        "version": {
          "created": "2030599"
        }
      }
    }
  }
}

Index mappings

{
  "my_index": {
    "mappings": {
      "my_type": {
        "properties": {
          "acw": {
            "type": "integer"
          },
          "pcg": {
            "type": "integer"
          },
          "date": {
            "type": "date",
            "format": "strict_date_optional_time||epoch_millis"
          },
          "dob": {
            "type": "date",
            "format": "strict_date_optional_time||epoch_millis"
          },
          "id": {
            "type": "string"
          },
          "name": {
            "type": "string",
            "boost": 10
          },
          "ngram": {
            "type": "string",
            "analyzer": "ngram_analyzer"
          },
          "bdk": {
            "type": "integer"
          },
          "mmw": {
            "type": "integer"
          },
          "mpi": {
            "type": "integer"
          },
          "sex": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}

Solution attempts

I tried a multi_match query, which gives me correct search results, but I've had no luck omitting results for names that match only a single trigram (e.g. the "odo" trigram inside "odo philus").

[GIST: query attempts] (unlinked due to the 2-link restriction :( )

(https://gist.github.com/jordancardwell/2e690013666e7e1da6ef1acee314b4e6)

//this matches 'frodo' and sends results to the top, since `name` field is boosted
//  but also matches 'theodore' and 'rodolpho'

{
  "size":100,
  "from":0,
  "query":{
    "multi_match":{
      "query":"frodo",
      "fields":[
        "name",
        "ngram"
      ],
      "type":"best_fields"
    }
  }
}

//I then tried to throw in the `minimum_should_match` option
//  hoping it would filter out large strings that only had one matching trigram, for instance

{
  "size":100,
  "from":0,
  "query":{
    "multi_match":{
      "query":"frodo",
      "fields":[
        "name",
        "ngram"
      ],
      "type":"best_fields",
      "minimum_should_match": "90%"
    }
  }
}

I've tried playing around, manually generating match queries so that minimum_should_match applies only to the ngram field, but I can't seem to get the syntax right.

// I then tried to construct a custom query to return just the `minimum_should_match`ed results on the ngram field
// I started with a query produced by using bodybuilder to `and` and `or` my other search criteria together

{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            //each separate field's criteria `must`/`and`ed together
            {
              "query": {
                "bool": {
                  "filter": {
                    "bool": {
                      "should": [
                        //each criterion for a specific field `should`/`or`ed together
                        {
                          //my attempt at getting `ngram` field results..
                          //  should theoretically only return when field
                          //  contains nothing but matching ngrams
                          //  (i.e. exact matches and other fluke matches)
                          "query": {
                            "match": {
                              "ngram": {
                                "query": "frodo",
                                "minimum_should_match": "100%"
                              }
                            }
                          }
                        }
                        //... other criteria to be `should`/`or`ed together
                      ]
                    }
                  }
                }
              }
            }
            //... other criteria to be `must`/`and`ed together
          ]
        }
      }
    }
  }
}

Can anyone see what I'm doing wrong?

It seems like this should be fairly simple, but I must be missing something obvious.

Update

I ran a query with _explain=true (using the Sense UI) to try to understand my results.

I queried with a match on the ngram field for "frod" with minimum_should_match = 100%, but I still get every record that matches at least one ngram (e.g. rodolpho, even though it doesn't contain fro).

GIST: test query and results

Note: cross-posted from [discuss.elastic.co]. I'll post a link later, since I can't post more than 2 yet :/

(https://discuss.elastic.co/t/ngram-partial-match-limiting-ngram-results-in-multiple-field-query/62526)

Answer 1

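I used your settings and mappings to create an index. Your queries seem to work fine for me. I would suggest running an explain on one of the "unexpected" documents being returned, to see why it matched and was returned with the other results.

Here is what I did:

Run the analyze API against your analyzer to see how the query will be split into tokens.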

curl -XGET 'localhost:9200/my_index/_analyze' -d '
{
  "analyzer" : "ngram_analyzer",
  "text" : "frodo"
}'

frodo is split into 3 tokens by your analyzer.

{
  "tokens": [
    {
      "token": "fro",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    },
    {
      "token": "rod",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    },
    {
      "token": "odo",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    }
  ]
}

I indexed 3 documents for testing (using only the ngram field). Here are the documents:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_score": 1,
        "_source": {
          "ngram": "theodore"
        }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 1,
        "_source": {
          "ngram": "frodo"
        }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "3",
        "_score": 1,
        "_source": {
          "ngram": "rudolpho"
        }
      }
    ]
  }
}

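Your first query, as you mentioned, matches frodo and theodore, but it does not match rudolpho as you described. That makes sense, because rudolpho doesn't produce any trigrams that match frodo's trigrams: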

frodo -> fro, rod, odo 

rudolpho -> rud, udo, dol, olp, lph, pho
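
The overlap can be checked by hand with a small Python sketch. It only approximates the ngram_analyzer above (lowercasing plus 3-character grams on a single word, ignoring the standard tokenizer), and the helper name `trigrams` is made up for illustration.

```python
def trigrams(word):
    """Return the 3-character grams of a lowercased word, in order."""
    w = word.lower()
    return [w[i:i + 3] for i in range(len(w) - 2)]


query_grams = set(trigrams("frodo"))

# Trigrams shared between the query and each indexed test name.
shared = {
    name: sorted(query_grams & set(trigrams(name)))
    for name in ["frodo", "theodore", "rudolpho"]
}

for name, grams in shared.items():
    print(name, grams)
```

theodore shares only odo with the query, which is enough for a plain match to return it, while rudolpho shares nothing.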

With your second query, I get back only frodo (neither of the other two).

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.53148466,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.53148466,
        "_source": {
          "ngram": "frodo"
        }
      }
    ]
  }
}
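
If the aim is to apply the threshold to the ngram field only, as the question asks, one possible shape is a bool query with a separate match clause per field, so that minimum_should_match sits on the ngram clause alone. The sketch below just builds the request body as a Python dict and prints it as JSON. The 80% threshold comes from the question, the helper name `build_query` is hypothetical, and I have not run this query against the test index.

```python
import json


def build_query(text, threshold="80%"):
    """Build a search body with minimum_should_match on the ngram clause only."""
    return {
        "size": 100,
        "from": 0,
        "query": {
            "bool": {
                "should": [
                    # Exact token matches on the boosted `name` field.
                    {"match": {"name": text}},
                    # Trigram matches, limited to the given share of trigrams.
                    {
                        "match": {
                            "ngram": {
                                "query": text,
                                "minimum_should_match": threshold,
                            }
                        }
                    },
                ]
            }
        },
    }


body = build_query("frodo")
print(json.dumps(body, indent=2))
```

Running an explain on a returned document would show whether the ngram clause picked up the threshold, in the same way as the `~2` in the explanation for the second query.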

I then ran an explain on the other two documents (theodore and rudolpho) (localhost:9200/my_index/my_type/2/_explain) and saw this (I have truncated the response):

{
  "_index": "my_index",
  "_type": "my_type",
  "_id": "2",
  "matched": false,
  "explanation": {
    "value": 0,
    "description": "Failure to meet condition(s) of required/prohibited clause(s)",
    "details": [
      {
        "value": 0,
        "description": "no match on required clause ((ngram:fro ngram:rod ngram:odo)~2)",
        "details": [

The above is expected, since at least two of the three tokens from frodo must match (90% of 3 clauses is rounded down to 2).
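
To see how that threshold plays out on the three test documents, here is a rough Python sketch of the arithmetic. It assumes a positive percentage that is rounded down and a single-word query. The names `required_clauses` and `matches` are illustrative, not Elasticsearch API, and the trigram helper only approximates the analyzer.

```python
def trigrams(word):
    """Return the 3-character grams of a lowercased word, in order."""
    w = word.lower()
    return [w[i:i + 3] for i in range(len(w) - 2)]


def required_clauses(num_clauses, percent):
    """Positive percentage of the optional clauses, rounded down."""
    return (num_clauses * percent) // 100


def matches(query, indexed, percent):
    """True if enough of the query's trigram clauses occur in the indexed value."""
    clauses = trigrams(query)
    indexed_grams = set(trigrams(indexed))
    hits = sum(1 for gram in clauses if gram in indexed_grams)
    # A query made only of optional clauses still needs at least one to match.
    needed = max(1, required_clauses(len(clauses), percent))
    return hits >= needed


print(required_clauses(3, 90))
for name in ["frodo", "theodore", "rudolpho"]:
    print(name, matches("frodo", name, 90))
```

Under the same assumptions, `matches("frod", "rodolpho", 100)` is false, because rodolpho contains rod but not fro.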

nGram partial matching & limiting nGram results in multiple field query
