How do I make minimum_should_match work with a nested mapping?

Time: 2018-03-12 20:58:10

Tags: elasticsearch

I have a question about Elasticsearch and its more_like_this query.

Given this mapping:

{
  "directory.v1": {
    "mappings": {
      "profile.event": {
        "properties": {
          "event": {
            "properties": {
              "naics": {
                "type": "nested",
                "properties": {
                  "type": {
                    "type": "keyword"
                  },
                  "value": {
                    "type": "keyword"
                  }
                }
              }
            }
          },
          "user_id": {
            "type": "long"
          }
        }
      }
    }
  }
}

I have a document (A) used as the source, a document (B), and a more_like_this query (for A).

Profile A (used as the source):

{
  "_index": "directory.v1",
  "_type": "profile.event",
  "_id": "83731111.559",
  "_score": 1,
  "_source": {
    "user_id": 8373,
    "event": {
      "naics": [
        {
          "value": 331,
          "type": "naics"
        },
        {
          "value": 74,
          "type": "naics"
        },
        {
          "value": 938,
          "type": "naics"
        },
        {
          "value": 2048,
          "type": "naics"
        },
        {
          "value": 939,
          "type": "naics"
        },
        {
          "value": 2049,
          "type": "naics"
        },
        {
          "value": 940,
          "type": "naics"
        },
        {
          "value": 2050,
          "type": "naics"
        },
        {
          "value": 941,
          "type": "naics"
        },
        {
          "value": 2051,
          "type": "naics"
        },
        {
          "value": 942,
          "type": "naics"
        },
        {
          "value": 2052,
          "type": "naics"
        },
        {
          "value": 943,
          "type": "naics"
        },
        {
          "value": 2053,
          "type": "naics"
        },
        {
          "value": 944,
          "type": "naics"
        },
        {
          "value": 2054,
          "type": "naics"
        },
        {
          "value": 945,
          "type": "naics"
        },
        {
          "value": 2055,
          "type": "naics"
        },
        {
          "value": 473,
          "type": "naics"
        },
        {
          "value": 128,
          "type": "naics"
        },
        {
          "value": 10,
          "type": "naics"
        },
        {
          "value": 1242,
          "type": "naics"
        },
        {
          "value": 472,
          "type": "naics"
        },
        {
          "value": 1241,
          "type": "naics"
        }
      ]
    }
  }
}

Profile B:

{
  "_index": "directory.v1",
  "_type": "profile.event",
  "_id": "46124111.559",
  "_score": 1,
  "_source": {
    "user_id": 46124,
    "event": {
      "naics": [
        {
          "value": 331,
          "type": "naics"
        },
        {
          "value": 74,
          "type": "naics"
        },
        {
          "value": 938,
          "type": "naics"
        },
        {
          "value": 2048,
          "type": "naics"
        },
        {
          "value": 939,
          "type": "naics"
        },
        {
          "value": 2049,
          "type": "naics"
        },
        {
          "value": 940,
          "type": "naics"
        },
        {
          "value": 2050,
          "type": "naics"
        },
        {
          "value": 941,
          "type": "naics"
        },
        {
          "value": 2051,
          "type": "naics"
        },
        {
          "value": 942,
          "type": "naics"
        },
        {
          "value": 2052,
          "type": "naics"
        },
        {
          "value": 943,
          "type": "naics"
        },
        {
          "value": 2053,
          "type": "naics"
        },
        {
          "value": 944,
          "type": "naics"
        },
        {
          "value": 2054,
          "type": "naics"
        },
        {
          "value": 945,
          "type": "naics"
        },
        {
          "value": 2055,
          "type": "naics"
        }
      ]
    }
  }
}

Every naics element in document B is also present in document A.

So I really don't understand why, with this query:

{
  "query": {
    "nested": {
      "path": "event.naics",
      "query": {
        "more_like_this": {
          "like": [
            {
              "_id": "83731111.559",
              "_type": "profile.event"
            }
          ],
          "fields": [
            "event.naics.value"
          ],
          "min_term_freq": 1,
          "min_doc_freq": 1,
          "minimum_should_match": "8%"
        }
      }
    }
  }
}

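I get results!

But when I raise minimum_should_match to 9% or higher, nothing matches at all and I get no results.

I also tried something like the following, which gives me results up to 11%: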

{
  "query": {
    "nested": {
      "path": "event.naics",
      "query": {
        "more_like_this": {
          "like": [
            {
              "_id": "83731111.559",
              "_type": "profile.event"
            }
          ],
          "fields": [
            "event.naics.*"
          ],
          "min_term_freq": 1,
          "min_doc_freq": 1,
          "minimum_should_match": "11%"
        }
      }
    }
  }
}

The term vectors for the source document are:

{
    "_index": "directory.v1",
    "_type": "profile.event",
    "_id": "83731111.559",
    "_version": 5,
    "found": true,
    "took": 0,
    "term_vectors": {}
}

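1 answer:

Answer 0: (score: 1)

If you fetch the term vectors for document "A" for the field event.naics.value, you will see that there are 24 terms in total, each with a term frequency of 1. So when you ask for an 8% match, 8% of 24 is 1.92, which rounds down to 1 of the 24 generated should clauses, and you get a match. But 9% of 24 is 2.16, which rounds down to 2 clauses that must match, and since each nested document holds only one value, no nested document can satisfy that, so nothing matches.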
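The arithmetic above can be sketched in Python. This is only an illustration, not the Elasticsearch implementation: it assumes the percentage is truncated toward zero as in the linked Queries.java, it ignores conditional specs such as "3<90%", and the names calculate_min_should_match, nested_hits, DOC_A_VALUES and DOC_B_VALUES are hypothetical.

```python
# naics values taken from documents A and B in the question.
DOC_A_VALUES = [331, 74, 938, 2048, 939, 2049, 940, 2050, 941, 2051, 942, 2052,
                943, 2053, 944, 2054, 945, 2055, 473, 128, 10, 1242, 472, 1241]
DOC_B_VALUES = DOC_A_VALUES[:18]  # B holds the first 18 values of A


def calculate_min_should_match(optional_clause_count: int, spec: str) -> int:
    """Simplified sketch: handles only plain integer and percentage specs."""
    spec = spec.strip()
    result = optional_clause_count
    if spec.endswith("%"):
        calc = result * int(spec[:-1]) / 100.0
        # int() truncates toward zero, like a Java (int) cast.
        result = result + int(calc) if calc < 0 else int(calc)
    else:
        calc = int(spec)
        result = result + calc if calc < 0 else calc
    return max(result, 0)


def nested_hits(nested_values, query_terms, required):
    """Return the nested objects that satisfy at least `required` term clauses.

    Each nested object is matched on its own and holds a single value,
    so it can satisfy at most one of the generated clauses.
    """
    return [v for v in nested_values
            if sum(1 for term in query_terms if term == v) >= required]


for spec in ("8%", "9%"):
    required = calculate_min_should_match(len(DOC_A_VALUES), spec)
    hits = nested_hits(DOC_B_VALUES, DOC_A_VALUES, required)
    print(spec, "-> clauses required:", required, "| nested hits in B:", len(hits))
```

With 24 clauses, "8%" requires 1 clause and every nested object in B qualifies, while "9%" requires 2 clauses and no single-value nested object can qualify.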

For the details of the calculation, see the bottom of this page: https://github.com/elastic/elasticsearch/blob/99f88f15c5febbca2d13b5b5fda27b844153bf1a/server/src/main/java/org/elasticsearch/common/lucene/search/Queries.java

More of the relevant source is here: https://github.com/elastic/elasticsearch/blob/46a79127edfb0cc93b7580624010ff81ca0cb2f4/server/src/main/java/org/elasticsearch/common/lucene/search/MoreLikeThisQuery.java

Term vectors request:

POST /directory.v1/profile.event/83731111.559/_termvectors
{
  "fields":["event.naics.value"],
  "offsets" : false,
  "payloads" : false,
  "positions" : false,
  "term_statistics" : true,
  "field_statistics" : true
}