I'm collecting data from devices, and I'd like to know when new devices come online. The documents look like this:
{
  "device_id": "ue-0000"
}
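I can see active devices per time bucket by using a date histogram aggregation with a nested terms aggregation, but I don't know how to express the logic "exclude a device_id from a bucket if a matching device_id appears earlier in the index".
Here is my current query: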
{
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "timestamp": {
            "gte": "2015/12/08",
            "lte": "2016/01/08"
          }
        }
      }
    }
  },
  "aggregations": {
    "over_time": {
      "aggregations": {
        "app_count": {
          "terms": {
            "field": "app"
          }
        }
      },
      "date_histogram": {
        "field": "timestamp",
        "interval": "day",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2015/12/08",
          "max": "2016/01/08"
        }
      }
    }
  }
}
I have documents like these:
{
  "timestamp": "2015/12/15",
  "device_id": "1"
}
{
  "timestamp": "2015/12/16",
  "device_id": "2"
}
{
  "timestamp": "2015/12/20",
  "device_id": "1"
}
I would like to get a response like this:
{
  "aggregations": {
    "over_time": {
      "buckets": [
        {
          "key_as_string": "2015/12/15 00:00:00",
          "key": 1450137600000,
          "doc_count": 1,
          "new_devices": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [{"device_id": "1"}]
          }
        },
        {
          "key_as_string": "2015/12/16 00:00:00",
          "key": 1450224000000,
          "doc_count": 1,
          "new_devices": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [{"device_id": "2"}]
          }
        },
        // [[ SNIP ]]
        {
          "key_as_string": "2015/12/20 00:00:00",
          "key": 1450569600000,
          "doc_count": 0, // there are no new device_ids on this date
          "new_devices": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": []
          }
        }
      ]
    }
  }
}
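To pin down the semantics being asked for, here is a minimal client-side sketch in Python, not an Elasticsearch query. It takes the three sample documents above and reports each device_id only on the day it first appears. The function name new_devices_by_day is made up for illustration, and the sketch assumes timestamps are "yyyy/MM/dd" strings, which sort chronologically.

```python
from collections import defaultdict

# The three sample documents from the question.
docs = [
    {"timestamp": "2015/12/15", "device_id": "1"},
    {"timestamp": "2015/12/16", "device_id": "2"},
    {"timestamp": "2015/12/20", "device_id": "1"},
]


def new_devices_by_day(documents):
    """Map each day to the sorted device_ids first seen on that day.

    Hypothetical helper: assumes "yyyy/MM/dd" timestamps, so plain
    string comparison orders them chronologically.
    """
    first_seen = {}
    for doc in documents:
        device, day = doc["device_id"], doc["timestamp"]
        # Keep only the earliest day on which each device appears.
        if device not in first_seen or day < first_seen[device]:
            first_seen[device] = day

    by_day = defaultdict(list)
    for device, day in first_seen.items():
        by_day[day].append(device)
    return {day: sorted(devices) for day, devices in sorted(by_day.items())}


result = new_devices_by_day(docs)
print(result)
```

Device "1" is counted on 2015/12/15 and not again on 2015/12/20, which is the behavior the desired response describes. Days with no new devices are simply absent here, whereas the desired response lists them as empty buckets.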
Answer 0 (score: 3)
I think you need one more terms aggregation on timestamp, so that each unique device is reported only at its first appearance. Try something like this:
{
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "timestamp": {
            "gte": "2015/12/08",
            "lte": "2016/01/08"
          }
        }
      }
    }
  },
  "size": 0,
  "aggs": {
    "unique_device": {
      "terms": {
        "field": "device_id",
        "size": 10
      },
      "aggs": {
        "unique_date": {
          "terms": {
            "field": "timestamp",
            "size": 1,
            "order": {
              "_term": "asc"
            }
          },
          "aggs": {
            "latest_device": {
              "date_histogram": {
                "field": "timestamp",
                "interval": "day",
                "min_doc_count": 0,
                "extended_bounds": {
                  "min": "2015/12/08",
                  "max": "2016/01/08"
                }
              }
            }
          }
        }
      }
    }
  }
}
The size and order settings in the timestamp terms aggregation keep only the earliest timestamp per device, so the date histogram receives only new devices.
Does this help?
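The query above returns one bucket per device rather than one bucket per day, so the result still has to be regrouped on the client. The following Python sketch shows one way that might be done. The sample_response dict is hand-written for illustration, not output from a real cluster, and the function name new_devices_per_day is hypothetical. It assumes each unique_date bucket key is an epoch-millisecond value and that days are bucketed in UTC.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hand-written, hypothetical response fragment shaped like the nested
# unique_device -> unique_date aggregations; not real cluster output.
sample_response = {
    "aggregations": {
        "unique_device": {
            "buckets": [
                {"key": "1", "doc_count": 2,
                 "unique_date": {"buckets": [
                     {"key": 1450137600000, "doc_count": 1}]}},
                {"key": "2", "doc_count": 1,
                 "unique_date": {"buckets": [
                     {"key": 1450224000000, "doc_count": 1}]}},
            ]
        }
    }
}


def new_devices_per_day(response):
    """Group device ids by the UTC day of their earliest timestamp."""
    per_day = defaultdict(list)
    for device in response["aggregations"]["unique_device"]["buckets"]:
        dates = device["unique_date"]["buckets"]
        if not dates:
            continue  # no timestamp bucket came back for this device
        # With size 1 and ascending order, the only bucket is the earliest.
        millis = dates[0]["key"]
        day = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
        per_day[day.strftime("%Y/%m/%d")].append(device["key"])
    return dict(per_day)


print(new_devices_per_day(sample_response))
```

Note that the outer terms aggregation uses "size": 10, so at most ten device buckets are returned; a larger size would be needed to cover every device. The range filter also limits "first appearance" to the queried window, so a device seen before 2015/12/08 would still look new.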