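Elasticsearch: aggregating on inner_hits

Time: 2016-02-24 06:17:44

Tags: elasticsearch

I'm trying to run some aggregations on the inner_hits of a nested object ("queries"), where the inner hits are filtered by query date. The aggregation in the block below covers the main documents and all of the objects in "queries", not just the objects returned in the inner hits.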

GET /networkcollection/branch_routers/_search/
{
  "_source": false,
  "query": {
    "filtered": {
      "query": {
        "match": {
          "mh": 123
        }
      },
      "filter": {
        "nested": {
          "path": "queries",
          "filter": {
            "range": {
              "queries.dateQuery": {
                "gt": "20160101T200000.000Z",
                "lte": "now"
              }
            }
          },
          "inner_hits": {}
        }
      }
    }
  },
  "aggs": {
    "queries": {
      "filter": {
        "nested": {
          "path": "queries",
          "filter": {
            "range": {
              "queries.dateQuery": {
                "gte": "20160101T200000.000Z",
                "lte": "now"
              }
            }
          }
        }
      },
      "aggs": {
        "minDateQuery": {
          "min": {
            "field": "queries.dateQuery"
          }
        }
      }
    }
  }
}

How can I write this aggregation so that it only aggregates the "queries" objects returned in inner_hits?

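1 answer:

Answer 0 (score: 0)

I'm quite late with this answer, but it is very much possible to aggregate over only the nested objects that inner_hits would return.

My ES version: 6.2.3

I'm providing a detailed response, including the index mapping, some dummy documents, and the search query plus its response.

The basic idea is to use a "filter" aggregation. Unless you need a very complex query to narrow down the set of documents being aggregated, you don't need the "query" part of the search request at all. Simple queries can easily be specified in the aggregation's "filter".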
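
As a minimal sketch of that idea, the request below puts everything into the aggregation tree: a "filter" aggregation selects the parent documents, a "nested" aggregation steps into "queries", and a second "filter" aggregation keeps only the nested objects in the date range before "min" runs. The aggregation names (matching_docs, queries_nested, in_range, min_date) are made up for illustration, and the ISO date string assumes that dateQuery uses the default date format, so adjust it to your own mapping. The full query further down uses a "date_range" aggregation for the range step instead.

```
# Sketch only: aggregation names are illustrative, and the date string
# assumes the default date format for queries.dateQuery.
GET networkcollection/branch_routers/_search
{
  "size": 0,
  "aggs": {
    "matching_docs": {
      "filter": { "match": { "mh": 123 } },
      "aggs": {
        "queries_nested": {
          "nested": { "path": "queries" },
          "aggs": {
            "in_range": {
              "filter": {
                "range": {
                  "queries.dateQuery": {
                    "gt": "2016-01-01T20:00:00.000Z",
                    "lte": "now"
                  }
                }
              },
              "aggs": {
                "min_date": { "min": { "field": "queries.dateQuery" } }
              }
            }
          }
        }
      }
    }
  }
}
```

Setting "size" to 0 suppresses the hits so that only the aggregation result comes back.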

Index setup:

PUT networkcollection
{
  "mappings": { 
    "branch_routers" : {
      "properties" : {
        "mh" : {
          "type" : "text"
        },
        "queries" : {
          "type" : "nested",
          "properties" : {
            "dateQuery" : {
              "type" : "date"
            }
          }
        }
      }
    }
  }
}

PUT networkcollection/branch_routers/1
{
  "mh" : "corona",
  "queries" : [
    {
      "dateQuery" : "2012-04-23"
    },
    {
      "dateQuery" : "2013-04-23"
    },
    {
      "dateQuery" : "2014-04-23"
    },
    {
      "dateQuery" : "2015-04-23"
    },
    {
      "dateQuery" : "2016-04-23"
    },
    {
      "dateQuery" : "2017-04-23"
    },
    {
      "dateQuery" : "2018-04-23"
    },
    {
      "dateQuery" : "2019-04-23"
    },
    {
      "dateQuery" : "2020-04-23"
    }
  ]
}

PUT networkcollection/branch_routers/2
{
  "mh" : "happy",
  "queries" : [
    {
      "dateQuery" : "2009-04-23"
    },
    {
      "dateQuery" : "2008-04-23"
    },
    {
      "dateQuery" : "2007-04-23"
    },
    {
      "dateQuery" : "2015-04-23"
    },
    {
      "dateQuery" : "2016-04-23"
    },
    {
      "dateQuery" : "2017-04-23"
    },
    {
      "dateQuery" : "2018-04-23"
    },
    {
      "dateQuery" : "2019-04-23"
    },
    {
      "dateQuery" : "2020-04-23"
    }
  ]
}

PUT networkcollection/branch_routers/3
{
  "mh" : "happy",
  "queries" : [
    {
      "dateQuery" : "2001-04-23"
    },
    {
      "dateQuery" : "2008-04-23"
    },
    {
      "dateQuery" : "2007-04-23"
    },
    {
      "dateQuery" : "2015-04-23"
    },
    {
      "dateQuery" : "2016-04-23"
    },
    {
      "dateQuery" : "2017-04-23"
    },
    {
      "dateQuery" : "2018-04-23"
    },
    {
      "dateQuery" : "2019-04-23"
    },
    {
      "dateQuery" : "2020-04-23"
    }
  ]
}

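We have added three basic documents. Now we filter on "mh" = "happy", and we want the minimum dateQuery among the nested objects whose date falls between 2016 and now (I'm writing this during the coronavirus lockdown, so you know the year :)).

Search query: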

GET networkcollection/branch_routers/_search
{
  "_source": false, 
  "query": {
    "match": {
      "mh": "happy"
    }
  },
  "aggs": {
    "filtered_agg": {
      "filter": {
        "match" : {
          "mh" : "happy"
        }
      },
      "aggs": {
        "filtered_nested": {
          "nested": {
            "path": "queries"
          },
          "aggs": {
            "dateQuery_agg": {
              "date_range": {
                "field": "queries.dateQuery",
                "ranges": [
                  {
                    "from": "now-4y/y",
                    "to": "now"
                  }
                ]
              },
              "aggs": {
                "min_date": {
                  "min": {
                    "field": "queries.dateQuery"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Response:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "networkcollection",
        "_type": "branch_routers",
        "_id": "2",
        "_score": 0.2876821
      },
      {
        "_index": "networkcollection",
        "_type": "branch_routers",
        "_id": "3",
        "_score": 0.2876821
      }
    ]
  },
  "aggregations": {
    "filtered_agg": {
      "doc_count": 2,
      "filtered_nested": {
        "doc_count": 18,
        "dateQuery_agg": {
          "buckets": [
            {
              "key": "2016-01-01T00:00:00.000Z-2020-05-14T23:02:31.611Z",
              "from": 1451606400000,
              "from_as_string": "2016-01-01T00:00:00.000Z",
              "to": 1589497351611,
              "to_as_string": "2020-05-14T23:02:31.611Z",
              "doc_count": 10,
              "min_date": {
                "value": 1461369600000,
                "value_as_string": "2016-04-23T00:00:00.000Z"
              }
            }
          ]
        }
      }
    }
  }
}

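As you can see, it correctly filters out the document with "mh" = "corona" and keeps only the two documents with "mh" = "happy". It then restricts the "queries" objects to those that fall within the date range I specified, and finally returns min_date. In the response, filtered_nested.doc_count is 18 (9 nested objects in each of the two documents), and the date_range bucket keeps the 10 of them dated 2016 through 2020 (5 per document).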
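
If you also want the inner hits themselves in the response, one possible variant is to keep the nested query with inner_hits and repeat the same date range inside the aggregation. This is a sketch under assumptions rather than part of the original answer: the aggregation names are illustrative, and the range uses "gte"/"lt" to mirror the date_range bucket above, whose "from" is inclusive and whose "to" is exclusive.

```
# Sketch only: aggregation names are illustrative, not from the original answer.
GET networkcollection/branch_routers/_search
{
  "_source": false,
  "query": {
    "bool": {
      "must": { "match": { "mh": "happy" } },
      "filter": {
        "nested": {
          "path": "queries",
          "query": {
            "range": {
              "queries.dateQuery": { "gte": "now-4y/y", "lt": "now" }
            }
          },
          "inner_hits": {}
        }
      }
    }
  },
  "aggs": {
    "queries_nested": {
      "nested": { "path": "queries" },
      "aggs": {
        "in_range": {
          "filter": {
            "range": {
              "queries.dateQuery": { "gte": "now-4y/y", "lt": "now" }
            }
          },
          "aggs": {
            "min_date": { "min": { "field": "queries.dateQuery" } }
          }
        }
      }
    }
  }
}
```

Note that inner_hits returns only three nested hits per parent document by default (its "size" option), while the aggregation covers every nested object that matches the range, so the two can differ in count unless you raise that size.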