Elasticsearch: a nested agg used after reverse_nested shows a higher count than expected

Time: 2016-05-16 12:47:07

Tags: elasticsearch

Using Elasticsearch 2.2.0, I am doing the following:

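  1. Group by a nested field: nested_path.nested_field
  2. Use reverse_nested so that I can apply this filter: non_nested_field == "yay"
  3. Use a nested agg again so that I can count the nested field I am grouping by: nested_path.nested_field
  4. Problem: after going through the reverse_nested agg, I get a higher doc_count than I expect.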

    Here are the mapping and the documents I am indexing:

    PUT /my_index
    {
       "mappings": {
          "my_type": {
             "properties": {
                "nested_path": {
                   "type": "nested",
                   "properties": {
                      "nested_field": {
                         "type": "string"
                      }
                   }
                },
                "non_nested_field": {
                   "type": "string"
                }
             }
          }
       }
    }
    
    POST /my_index/my_type/1
    {
      "non_nested_field": "whoray",
      "nested_path": [
        {
          "nested_field": "yes"
        },
        {
          "nested_field": "yes"
        },
        {
          "nested_field": "no"
        }
      ]
    }
    
    POST /my_index/my_type/2
    {
      "non_nested_field": "yay",
      "nested_path": [
        {
          "nested_field": "maybe"
        },
        {
          "nested_field": "no"
        }
      ]
    }
    

    Request body:

    POST my_index/my_type/_search
    {
       "aggs": {
          "nested_option": {
             "nested": {
                "path": "nested_path"
             },
             "aggs": {
                "group_list": {
                   "terms": {
                      "field": "nested_path.nested_field",
                      "size": 100
                   },
                   "aggs": {
                      "level_1": {
                         "reverse_nested": {},
                         "aggs": {
                            "level_2": {
                               "filter": {
                                  "term": {
                                     "non_nested_field": "yay"
                                  }
                               },
                               "aggs": {
                                  "level_3": {
                                     "nested": {
                                        "path": "nested_path"
                                     },
                                     "aggs": {
                                        "stat": {
                                           "value_count": {
                                              "field": "nested_path.nested_field"
                                           }
                                        }
                                     }
                                  }
                               }
                            }
                         }
                      }
                   }
                }
             }
          }
       },
       "size": 0
    }
    

    Part of the response I get:

    {  
      "aggregations": {
        "nested_option": {
          "doc_count": 5,
          "group_list": {
            "buckets": [
              {
                "key": "no",
                "doc_count": 2,
                "level_1": {
                  "doc_count": 2,
                  "level_2": {
                    "doc_count": 1,
                    "level_3": {
                      "doc_count": 2,
                      "stat": {
                        "value": 2
                      }
                    }
                  }
                }
              }
              //.... 
            ]
          }
        }
      }
    }
    

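    In the first element of the buckets array in the response, level_1.level_2.doc_count is 1, which is correct, because only one of the two indexed documents has both nested_path.nested_field == "no" and non_nested_field == "yay". But level_1.level_2.level_3.doc_count in the response is 2. It should be only 1. This looks like a bug to me.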
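
    A small model of the aggregation chain may help pin down where the 2 comes from. The Python sketch below is not Elasticsearch code. It replays nested -> terms -> reverse_nested -> filter -> nested over the two documents above, under the assumption that the inner nested agg visits every nested object of each parent that survives the filter, not only the objects that fell into the current terms bucket. The function name `simulate` and the flag `restrict_to_bucket_key` are hypothetical names introduced for illustration.

```python
from collections import defaultdict

# The two documents from the question, as plain dicts.
DOCS = [
    {"non_nested_field": "whoray",
     "nested_path": [{"nested_field": "yes"},
                     {"nested_field": "yes"},
                     {"nested_field": "no"}]},
    {"non_nested_field": "yay",
     "nested_path": [{"nested_field": "maybe"},
                     {"nested_field": "no"}]},
]


def simulate(docs, wanted="yay", restrict_to_bucket_key=False):
    """Toy model of nested -> terms -> reverse_nested -> filter -> nested.

    Assumption: the inner nested step re-enters ALL nested objects of each
    kept parent, regardless of which terms bucket we came from.
    """
    # nested + terms: record the parent index of every nested object, by value.
    buckets = defaultdict(list)
    for parent_id, doc in enumerate(docs):
        for obj in doc["nested_path"]:
            buckets[obj["nested_field"]].append(parent_id)

    result = {}
    for key, parent_ids in buckets.items():
        # reverse_nested: join back to the distinct parent documents.
        parents = sorted(set(parent_ids))
        # filter: keep parents whose non_nested_field matches.
        kept = [p for p in parents if docs[p]["non_nested_field"] == wanted]
        # nested again: collect the nested objects of the kept parents.
        children = [obj for p in kept for obj in docs[p]["nested_path"]]
        if restrict_to_bucket_key:
            # Hypothetical extra filter on the bucket's own key.
            children = [c for c in children if c["nested_field"] == key]
        result[key] = {
            "doc_count": len(parent_ids),
            "level_1": len(parents),
            "level_2": len(kept),
            "level_3": len(children),
        }
    return result


if __name__ == "__main__":
    print(simulate(DOCS)["no"])
    print(simulate(DOCS, restrict_to_bucket_key=True)["no"])
```

    Under that assumption the model gives level_3 = 2 for the "no" bucket, the same number as in the response: document 2 is the only parent that passes the filter, and it has two nested objects ("maybe" and "no"). With the hypothetical restriction to the bucket key switched on, the model gives 1, which is the number I expected.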

0 answers:

No answers