Elasticsearch OR query with nested objects returns inner_hits that do not match the criteria

Date: 2018-08-07 12:33:56

Tags: elasticsearch

I am getting strange results when querying nested objects. Imagine the following structure:

{ owner.name = "fred",
  ...,
  pets [
    { name = "daisy", ... },
    { name = "flopsy", ... }
  ]
}

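If the document shown above is the only one I have, and I search for pets that satisfy this condition:

pets.name = "daisy" OR
(owner.name = "julie" AND pets.name = "flopsy")

I would expect to get only one result ("daisy"), but I get the names of both pets.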
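To make the expectation concrete, here is a minimal sketch that evaluates the condition per pet in plain Python. The `doc` dictionary and the `pet_matches` helper are illustrative names that are not part of the original post, and exact string equality stands in for Elasticsearch's term matching.

```python
# Hypothetical in-memory copy of the single document from the question.
doc = {"owner": "fred", "pets": [{"name": "daisy"}, {"name": "flopsy"}]}


def pet_matches(owner, pet):
    """pets.name = "daisy" OR (owner.name = "julie" AND pets.name = "flopsy")."""
    return pet["name"] == "daisy" or (owner == "julie" and pet["name"] == "flopsy")


# Keep only the pets for which the whole condition holds.
expected = [p["name"] for p in doc["pets"] if pet_matches(doc["owner"], p)]
print(expected)  # ['daisy']
```

Evaluated this way, "flopsy" is excluded because the owner is "fred", not "julie".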

Here is one way to reproduce this:

# Create nested mapping
PUT pet-owners
{
  "mappings": {
    "animals": {
      "properties": {
        "owner": {"type": "text"},
        "pets": {
          "type": "nested",
          "properties": {
            "name": {"type": "text", "fielddata": true}
          }
        }
      }
    }
  }
}

# Insert nested object
PUT pet-owners/animals/1?op_type=create
{
    "owner" : "fred",
    "pets"  : [
        { "name" : "daisy"},
        { "name" : "flopsy"}
    ]
}

# Query
GET pet-owners/_search
{ "from": 0, "size": 50,
  "query": {
    "constant_score": {
      "filter": { "bool": {"must": [
        {"bool": {"should": [
            {"nested": {"query":
              {"term": {"pets.name": "daisy"}},
              "path":"pets",
              "inner_hits": {
                "name": "pets_hits_1",
                "size": 99,
                "_source": false,
                "docvalue_fields": ["pets.name"]
              }
            }},
            {"bool": {"must": [
              {"term": {"owner": "julie"}},
              {"nested": {"query":
                {"term": {"pets.name": "flopsy"}},
                "path":"pets",
                "inner_hits": {
                  "name": "pets_hits_2",
                  "size": 99,
                  "_source": false,
                  "docvalue_fields": ["pets.name"]
                }
              }}
            ]}}
          ]}}
  ]}}}},
  "_source": false
}

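The query returns the names of both pets (contrary to what I expected).

Is this behavior normal? Am I doing something wrong, or is my reasoning about nested structures or query behavior flawed?

Any help or guidance would be greatly appreciated.

I am running this query on Elasticsearch 6.3.x.

Edit: I am adding the response I received to better illustrate the situation: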

{
  "took": 16,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "pet-owners",
        "_type": "animals",
        "_id": "1",
        "_score": 1,
        "inner_hits": {
          "pets_hits_1": {
              "hits": {
                "total": 1,
                "max_score": 0.6931472,
                "hits": [
                  {
                    "_index": "pet-owners",
                    "_type": "animals",
                    "_id": "1",
                    "_nested": {
                      "field": "pets",
                      "offset": 0
                    },
                    "_score": 0.6931472,
                    "fields": {
                      "pets.name": [
                        "daisy"
                      ]
                    }
                  }
                ]
            }
          },
          "pets_hits_2": {
            "hits": {
              "total": 1,
              "max_score": 0.6931472,
              "hits": [
                {
                  "_index": "pet-owners",
                  "_type": "animals",
                  "_id": "1",
                  "_nested": {
                    "field": "pets",
                    "offset": 1
                  },
                  "_score": 0.6931472,
                  "fields": {
                    "pets.name": [
                      "flopsy"
                    ]
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}

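So what we see is not the query matching and returning the whole existing document. Instead, the query returns each pet independently, each inside its own inner_hits section. This result surprises me.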

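1 Answer:

Answer 0 (score: 0)

(Edited) In short, this issue is related to "inner_hits":

The inner_hits 'pets_hits_2' appears to return a match because it belongs to a nested query that searches only the pets field for 'flopsy'.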

As a standalone query against our single document, this is a valid match.

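However, that nested query sits inside a bool/must list whose other query (owner = "julie") does not match our document, so you would probably expect inner_hits to take that into account and return no match for it.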
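The behavior described here can be modelled with a small simulation. This is only a sketch consistent with the response shown in the question, not Elasticsearch's actual implementation: the `nested_clauses` list, `branch_matches`, and `search` are hypothetical names, and string equality stands in for term matching.

```python
# Hypothetical in-memory copy of the single document from the question.
doc = {"owner": "fred", "pets": [{"name": "daisy"}, {"name": "flopsy"}]}

# One entry per nested clause in the query: its inner_hits name, the pet name
# it searches for, and the owner its sibling term clause requires (or None).
nested_clauses = [
    {"inner_hits": "pets_hits_1", "pet_name": "daisy", "owner": None},
    {"inner_hits": "pets_hits_2", "pet_name": "flopsy", "owner": "julie"},
]


def branch_matches(doc, clause):
    """True if the whole should-branch (nested clause plus sibling) matches."""
    owner_ok = clause["owner"] is None or doc["owner"] == clause["owner"]
    pet_ok = any(p["name"] == clause["pet_name"] for p in doc["pets"])
    return owner_ok and pet_ok


def search(doc, clauses):
    # The root document is a hit if at least one should-branch matches.
    if not any(branch_matches(doc, c) for c in clauses):
        return None
    # Modelled behavior: every inner_hits section is filled from its own
    # nested query alone, ignoring whether its sibling clauses matched.
    return {
        c["inner_hits"]: [p["name"] for p in doc["pets"] if p["name"] == c["pet_name"]]
        for c in clauses
    }


result = search(doc, nested_clauses)
print(result)  # {'pets_hits_1': ['daisy'], 'pets_hits_2': ['flopsy']}
```

In this model the document is returned because the first branch matches, and 'pets_hits_2' is still populated even though its sibling owner condition fails, which mirrors the response in the question.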

I could not find any documentation clarifying whether this is intentional behavior; it may be worth raising.
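Until that is clarified, one possible client-side workaround is to discard inner_hits sections whose sibling conditions do not hold. The sketch below assumes the search is re-run with "_source": ["owner"] instead of "_source": false so that the owner is available in each hit. `REQUIRED_OWNER` and `prune_inner_hits` are hypothetical names, and the exact string comparison is a simplification of how the term query matches the analyzed owner field.

```python
# Hypothetical mapping from inner_hits name to the owner value required by the
# sibling term clause (None means the nested clause has no sibling condition).
REQUIRED_OWNER = {"pets_hits_1": None, "pets_hits_2": "julie"}


def prune_inner_hits(hit, required_owner=REQUIRED_OWNER):
    """Return a copy of the hit without inner_hits whose sibling clause fails."""
    owner = hit.get("_source", {}).get("owner")
    kept = {}
    for name, group in hit.get("inner_hits", {}).items():
        needed = required_owner.get(name)
        if needed is None or owner == needed:
            kept[name] = group
    return {**hit, "inner_hits": kept}


# Trimmed version of the hit from the question, with "_source" added under the
# assumption stated above.
hit = {
    "_id": "1",
    "_source": {"owner": "fred"},
    "inner_hits": {
        "pets_hits_1": {"hits": {"total": 1, "hits": [{"fields": {"pets.name": ["daisy"]}}]}},
        "pets_hits_2": {"hits": {"total": 1, "hits": [{"fields": {"pets.name": ["flopsy"]}}]}},
    },
}

pruned = prune_inner_hits(hit)
print(sorted(pruned["inner_hits"]))  # ['pets_hits_1']
```

This keeps the query unchanged and only post-processes the response, at the cost of duplicating the sibling conditions in client code.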