When I search with the following aggregation:
"aggregations": {
  "codes": {
    "terms": {
      "field": "code"
    },
    "aggs": {
      "dates": {
        "date_range": {
          "field": "created_time",
          "ranges": [
            {
              "from": "2017-12-06T00:00:00.000",
              "to": "2017-12-06T16:00:00.000"
            },
            {
              "from": "2017-12-07T00:00:00.000",
              "to": "2017-12-07T23:59:59.999"
            }
          ]
        }
      }
    }
  }
}
I get the following result:
"aggregations": {
  "codes": {
    "buckets": [
      {
        "key": "123456",
        "doc_count": 104005499,
        "dates": {
          "buckets": [
            {
              "key": "2017-12-05T20:00:00.000Z-2017-12-06T12:00:00.000Z",
              "from_as_string": "2017-12-05T20:00:00.000Z",
              "to_as_string": "2017-12-06T12:00:00.000Z",
              "doc_count": 156643
            },
            {
              "key": "2017-12-06T20:00:00.000Z-2017-12-07T19:59:59.999Z",
              "from_as_string": "2017-12-06T20:00:00.000Z",
              "to_as_string": "2017-12-07T19:59:59.999Z",
              "doc_count": 11874
            }
          ]
        }
      },
      ...
    ]
  }
}
So now I have a list of buckets. For each bucket I need a total count value, which is the sum of the doc_counts of its inner buckets. For example, the total for my first bucket should be 156643 + 11874 = 168517.
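(That total is just the sum of the inner doc_count values, so it can also be computed client-side once the aggregation response is parsed. A minimal Python sketch, with the response structure hard-coded from the example above and trimmed to the fields it uses:)

```python
# Sum the inner date_range doc_counts for each "codes" bucket,
# using a trimmed-down copy of the aggregation response above.
response = {
    "aggregations": {
        "codes": {
            "buckets": [
                {
                    "key": "123456",
                    "dates": {
                        "buckets": [
                            {"doc_count": 156643},
                            {"doc_count": 11874},
                        ]
                    },
                }
            ]
        }
    }
}

for bucket in response["aggregations"]["codes"]["buckets"]:
    total = sum(b["doc_count"] for b in bucket["dates"]["buckets"])
    print(bucket["key"], total)  # 123456 168517
```

This works, but it means post-processing every response; the rest of the question is about getting Elasticsearch to return the total directly.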
I tried using a Sum Bucket aggregation:
"totalcount": {
  "sum_bucket": {
    "buckets_path": "dates"
  }
}
but this does not work, failing with: "buckets_path must reference either a number value or a single value numeric metric aggregation, got: org.elasticsearch.search.aggregations.bucket.range.date.InternalDateRange.Bucket". Any ideas how I can do this?
Answer 0 (score: 0)
It looks like this is a known issue. There is a discussion about it on the Elastic forum, where I found a hack that works around it (credits to Ruslan_Didyk, its author, by the way):
POST my_aggs/my_doc/_search
{
  "size": 0,
  "aggregations": {
    "codes": {
      "terms": {
        "field": "code"
      },
      "aggs": {
        "dates": {
          "date_range": {
            "field": "created_time",
            "ranges": [
              {
                "from": "2017-12-06T00:00:00.000",
                "to": "2017-12-06T16:00:00.000"
              },
              {
                "from": "2017-12-07T00:00:00.000",
                "to": "2017-12-07T23:59:59.999"
              }
            ]
          },
          "aggs": {
            "my_cnt": {
              "value_count": {
                "field": "created_time"
              }
            }
          }
        },
        "totalcount": {
          "stats_bucket": {
            "buckets_path": "dates>my_cnt"
          }
        }
      }
    }
  }
}
The reason you cannot simply compute totalcount is that date_range implicitly creates sub-buckets, and the pipeline aggregation cannot deal with that (I would call this a bug in Elasticsearch).

So the hack is to add another sub-aggregation to dates: my_cnt, which simply counts the number of documents in each bucket. (Note that I used a value_count aggregation on the created_time field, under the assumption that it is present in every document and holds exactly one value.)
Given a set of documents like this:
{"code":"1234","created_time":"2017-12-06T01:00:00"}
{"code":"1234","created_time":"2017-12-06T17:00:00"}
{"code":"1234","created_time":"2017-12-07T01:00:00"}
{"code":"1234","created_time":"2017-12-06T02:00:00"}
{"code":"1235","created_time":"2017-12-07T18:00:00"}
{"code":"1234","created_time":"2017-12-07T18:00:00"}
the result of the aggregation will be:
"aggregations": {
  "codes": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "1234",
        "doc_count": 5,
        "dates": {
          "buckets": [
            {
              "key": "2017-12-06T00:00:00.000Z-2017-12-06T16:00:00.000Z",
              "from": 1512518400000,
              "from_as_string": "2017-12-06T00:00:00.000Z",
              "to": 1512576000000,
              "to_as_string": "2017-12-06T16:00:00.000Z",
              "doc_count": 2,
              "my_cnt": {
                "value": 2
              }
            },
            {
              "key": "2017-12-07T00:00:00.000Z-2017-12-07T23:59:59.999Z",
              "from": 1512604800000,
              "from_as_string": "2017-12-07T00:00:00.000Z",
              "to": 1512691199999,
              "to_as_string": "2017-12-07T23:59:59.999Z",
              "doc_count": 2,
              "my_cnt": {
                "value": 2
              }
            }
          ]
        },
        "totalcount": {
          "count": 2,
          "min": 2,
          "max": 2,
          "avg": 2,
          "sum": 4
        }
      },
      {
        "key": "1235",
        "doc_count": 1,
        "dates": {
          "buckets": [
            {
              "key": "2017-12-06T00:00:00.000Z-2017-12-06T16:00:00.000Z",
              "from": 1512518400000,
              "from_as_string": "2017-12-06T00:00:00.000Z",
              "to": 1512576000000,
              "to_as_string": "2017-12-06T16:00:00.000Z",
              "doc_count": 0,
              "my_cnt": {
                "value": 0
              }
            },
            {
              "key": "2017-12-07T00:00:00.000Z-2017-12-07T23:59:59.999Z",
              "from": 1512604800000,
              "from_as_string": "2017-12-07T00:00:00.000Z",
              "to": 1512691199999,
              "to_as_string": "2017-12-07T23:59:59.999Z",
              "doc_count": 1,
              "my_cnt": {
                "value": 1
              }
            }
          ]
        },
        "totalcount": {
          "count": 1,
          "min": 1,
          "max": 1,
          "avg": 1,
          "sum": 1
        }
      }
    ]
  }
}
The value you need is under totalcount.sum.
As I already said, this only works as long as the assumption that created_time is always present and holds exactly one value per document holds. In a different situation, where the field under the date_range aggregation can have multiple values (e.g. an update_time tracking all updates of a document), the sum will no longer equal the actual number of matching documents if those dates overlap.
In that case you can always fall back to a filter aggregation with a range query.
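(A minimal sketch of that fallback, reusing the field and ranges from the question; the aggregation name in_ranges is illustrative, not from the original. Each parent bucket's in_ranges.doc_count then counts every document matching at least one range exactly once, even when the ranges overlap:)

```json
"aggs": {
  "codes": {
    "terms": { "field": "code" },
    "aggs": {
      "in_ranges": {
        "filter": {
          "bool": {
            "should": [
              { "range": { "created_time": { "gte": "2017-12-06T00:00:00.000", "lt": "2017-12-06T16:00:00.000" } } },
              { "range": { "created_time": { "gte": "2017-12-07T00:00:00.000", "lt": "2017-12-07T23:59:59.999" } } }
            ]
          }
        }
      }
    }
  }
}
```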
Hope that helps!