How do I run a top_hits aggregation on the latest record of each unique ID per day under a date_histogram aggregation?

Time: 2021-04-22 08:39:45

Tags: elasticsearch

I have a set of about 10K IDs and documents of the following type (in practice I have about 500K documents, so this is a simplified sample):

{"id":"Peter","sales":12679,"timestamp":"2021-04-22 13:03:46.972"}
{"id":"Peter","sales":12375,"timestamp":"2021-04-21 13:03:46.972"}
{"id":"Peter","sales":32124,"timestamp":"2021-04-20 17:03:46.972"}
{"id":"Peter","sales":12472,"timestamp":"2021-04-20 13:03:46.972"}
{"id":"Peter","sales":42679,"timestamp":"2021-04-18 14:03:46.972"}
{"id":"Peter","sales":12379,"timestamp":"2021-04-18 13:03:46.972"}
....
{"id":"John","sales":2256679,"timestamp":"2021-04-2 13:03:46.972"}
{"id":"John","sales":752375,"timestamp":"2021-04-1 13:03:46.972"}
{"id":"John","sales":85124,"timestamp":"2021-04-10 17:03:46.972"}
{"id":"John","sales":1472,"timestamp":"2021-04-10 13:03:46.972"}
{"id":"John","sales":4279,"timestamp":"2021-04-18 14:03:46.972"}
{"id":"John","sales":2379,"timestamp":"2021-04-18 13:03:46.972"}
....

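I want to write a query that does the following:

  1. Find the latest record of each ID for each day, and
  2. Count, for each ID, the number of days on which it has no documents (no "doc_count")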
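
To make the two tasks concrete, here is a minimal client-side sketch in Python over a few of the sample documents. It is not an Elasticsearch query, only a reference for what the result should look like. The function names `latest_per_day` and `missing_day_counts` are made up for illustration, and the sketch assumes well-formed timestamps and that "any day" means every calendar day between the earliest and latest document.

```python
from collections import defaultdict
from datetime import date, datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

# A subset of the sample documents from the question.
SAMPLE_DOCS = [
    {"id": "Peter", "sales": 12679, "timestamp": "2021-04-22 13:03:46.972"},
    {"id": "Peter", "sales": 12375, "timestamp": "2021-04-21 13:03:46.972"},
    {"id": "Peter", "sales": 32124, "timestamp": "2021-04-20 17:03:46.972"},
    {"id": "Peter", "sales": 12472, "timestamp": "2021-04-20 13:03:46.972"},
    {"id": "Peter", "sales": 42679, "timestamp": "2021-04-18 14:03:46.972"},
    {"id": "Peter", "sales": 12379, "timestamp": "2021-04-18 13:03:46.972"},
    {"id": "John", "sales": 4279, "timestamp": "2021-04-18 14:03:46.972"},
    {"id": "John", "sales": 2379, "timestamp": "2021-04-18 13:03:46.972"},
]


def latest_per_day(docs):
    """Return {(day, id): doc} with the newest document per id per calendar day."""
    latest = {}
    for doc in docs:
        ts = datetime.strptime(doc["timestamp"], TS_FORMAT)
        key = (ts.date(), doc["id"])
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, doc)
    return {key: doc for key, (_, doc) in latest.items()}


def missing_day_counts(docs, ids):
    """Count, per id, the days in the overall date range that have no document."""
    days_seen = defaultdict(set)
    for doc in docs:
        day = datetime.strptime(doc["timestamp"], TS_FORMAT).date()
        days_seen[doc["id"]].add(day)
    all_days = [d for days in days_seen.values() for d in days]
    if not all_days:
        return {i: 0 for i in ids}
    total_days = (max(all_days) - min(all_days)).days + 1
    return {i: total_days - len(days_seen.get(i, set())) for i in ids}


latest = latest_per_day(SAMPLE_DOCS)
missing = missing_day_counts(SAMPLE_DOCS, ["Peter", "John"])
```

With this subset, the range runs from 2021-04-18 to 2021-04-22, so Peter is missing one day (2021-04-19) and John is missing four.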

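I want to combine a date histogram, a top_hits aggregation and uniqueness per ID to find the latest sales figure of each ID for each day, and to check whether any ID has no doc count on any given day. I have tried many queries, but none of them returns what I want,

for example: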

{
        "size": 0,
        "sort": {"timestamp": "desc"},
        "query": {
            "bool": {
                "must":
                 {
                     "terms": {
                         "id": ["Peter","John"]
                     }
                 }
            }
        },
        "aggs": {
            "sales_over_time": {
                "date_histogram": {
                    "field": "timestamp",
                    "calendar_interval": "1d"
                },
                "aggs": {
                    "id": {
                        "terms": {
                            "field": "id.keyword"
                        }
                    }
                }
            }
        }
}

which returns something like:

 {
                    "key": 1615852800000,
                    "doc_count": 6,
                    "id": {
                        "doc_count_error_upper_bound": 0,
                        "sum_other_doc_count": 0,
                        "buckets": [
                            {
                                "key": "624232532",
                                "doc_count": 4
                            },
                            {
                                "key": "656625970",
                                "doc_count": 2
                            }
                        ]
                    }
                },

After this query, I still have to check whether some of the IDs have no doc_count on a given day.
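
One possible way to do that check is on the client, by walking the day buckets of the response. The sketch below assumes the response shape shown above, with the aggregation names `sales_over_time` and `id` taken from the query. The function name `count_missing_days` and the second day bucket in the sample are made up for illustration. Only days that appear as buckets in the response are counted.

```python
def count_missing_days(response, ids):
    """For each id, count the day buckets in which it has no documents."""
    day_buckets = response["aggregations"]["sales_over_time"]["buckets"]
    missing = {i: 0 for i in ids}
    for day in day_buckets:
        # IDs that have at least one document on this day.
        present = {b["key"] for b in day["id"]["buckets"] if b["doc_count"] > 0}
        for i in ids:
            if i not in present:
                missing[i] += 1
    return missing


# First bucket copied from the response above; the second one is hypothetical.
sample_response = {
    "aggregations": {
        "sales_over_time": {
            "buckets": [
                {
                    "key": 1615852800000,
                    "doc_count": 6,
                    "id": {"buckets": [
                        {"key": "624232532", "doc_count": 4},
                        {"key": "656625970", "doc_count": 2},
                    ]},
                },
                {
                    "key": 1615939200000,
                    "doc_count": 1,
                    "id": {"buckets": [
                        {"key": "624232532", "doc_count": 1},
                    ]},
                },
            ]
        }
    }
}

missing = count_missing_days(sample_response, ["624232532", "656625970"])
```

Note that the terms aggregation returns only the top 10 buckets by default, so with many IDs its `size` would need to be raised (or the IDs queried in batches) for this count to be reliable.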

To get the latest record of the day for each ID, I also tried:

{
        "size": 0,
        "sort": {"timestamp": "desc"},
        "query": {
            "bool": {
                "must":
                 {
                     "terms": {
                         "oneNetDevieId": [656625970,624232532,624232499]
                     }
                 }
            }
        },
        "aggs": {
            "sales_over_time": {
                "date_histogram": {
                    "field": "timestamp",
                    "calendar_interval": "1d"
                },
                "aggs": {
                    "name": {
                        "terms": { "field": "oneNetDevieId.keyword" },
                        "aggs": {
                            "latest_comment": {
                                "top_hits": {
                                    "sort": [ {"timestamp": { "order": "desc" } } ],
                                    "size": 1
                                }
                            }
                        }
                    }
                }
            }
        }
}

How can I compute a sum for each ID for each day?
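
If "sum" here means the total `sales` of each ID within each day, one option is a `sum` sub-aggregation placed next to `top_hits` inside the per-ID terms bucket. The sketch below only builds the request body as a Python dict and does not contact a cluster. The function name `build_daily_query` and the aggregation names `per_id`, `latest` and `sales_sum` are made up for illustration, and it assumes that `id` has a `.keyword` subfield, as the aggregation in the question suggests.

```python
import json


def build_daily_query(ids, id_field="id.keyword",
                      time_field="timestamp", value_field="sales"):
    """Build a search body: per day, per id, the latest hit and the sum of sales."""
    ids = list(ids)
    return {
        "size": 0,
        "query": {"bool": {"filter": [{"terms": {id_field: ids}}]}},
        "aggs": {
            "sales_over_time": {
                "date_histogram": {"field": time_field, "calendar_interval": "1d"},
                "aggs": {
                    "per_id": {
                        # Ask for one bucket per requested id (the default is 10).
                        "terms": {"field": id_field, "size": max(len(ids), 1)},
                        "aggs": {
                            # Newest document of this id on this day.
                            "latest": {
                                "top_hits": {
                                    "size": 1,
                                    "sort": [{time_field: {"order": "desc"}}],
                                }
                            },
                            # Total sales of this id on this day.
                            "sales_sum": {"sum": {"field": value_field}},
                        },
                    }
                },
            }
        },
    }


body = build_daily_query(["Peter", "John"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request. Summing only the latest value of every ID per day is a different problem and is not covered by this sketch.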

0 answers:

No answers yet