How to build a date histogram query that includes a "new" attribute

Date: 2016-01-10 00:35:42

Tags: elasticsearch histogram rollup nosql

I'm collecting data from devices, and I want to find out when new devices come online. The documents look like this:

{
  "device_id": "ue-0000"
}

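I can query for the active devices in each time bucket with a date histogram aggregation that contains a nested terms aggregation, but I don't know how to express the logic "the device_id from this bucket has a match earlier in the index".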
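To pin down the logic itself, here is a brute-force sketch in Python over a hypothetical in-memory list of documents. It is not an Elasticsearch query. A device counts as new in a bucket if it is active in that bucket and has no earlier match. The dates mirror the sample documents in this question, and the names DOCS, has_earlier_match and new_devices_in_bucket are illustrative only.

```python
from datetime import date

# Hypothetical in-memory stand-in for the index; timestamps mirror the
# sample documents in the question, parsed as calendar dates.
DOCS = [
    {"timestamp": date(2015, 12, 15), "device_id": "1"},
    {"timestamp": date(2015, 12, 16), "device_id": "2"},
    {"timestamp": date(2015, 12, 20), "device_id": "1"},
]


def has_earlier_match(device_id, bucket_start, docs):
    """True if some document for device_id is timestamped before bucket_start."""
    return any(
        d["device_id"] == device_id and d["timestamp"] < bucket_start
        for d in docs
    )


def new_devices_in_bucket(bucket_start, bucket_end, docs):
    """Devices active in [bucket_start, bucket_end) that have no earlier match."""
    active = {
        d["device_id"]
        for d in docs
        if bucket_start <= d["timestamp"] < bucket_end
    }
    return sorted(
        dev for dev in active if not has_earlier_match(dev, bucket_start, docs)
    )
```

For example, the bucket starting on 2015/12/20 is empty because device "1" already has a match on 2015/12/15. The sketch scans every document for every candidate device, so it only illustrates the semantics and is not meant for real data volumes.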

Here is my current query:

{
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "timestamp": {
            "gte": "2015/12/08",
            "lte": "2016/01/08"
          }
        }
      }
    }
  },
  "aggregations": {
    "over_time": {
      "aggregations": {
        "app_count": {
          "terms": {
            "field": "app"
          }
        }
      },
      "date_histogram": {
        "field": "timestamp",
        "interval": "day",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2015/12/08",
          "max": "2016/01/08"
        }
      }
    }
  }
}

I have documents like this:

{
    "timestamp": "2015/12/15",
    "device_id": "1"
}
{
    "timestamp": "2015/12/16",
    "device_id": "2"
}
{
    "timestamp": "2015/12/20",
    "device_id": "1"
}

I'd like to get a response like this:

{
  "aggregations": {
    "over_time": {
      "buckets": [
        {
          "key_as_string":"2015/12/15 00:00:00",
          "key":1450137600000,
          "doc_count":1,
          "new_devices":{
            "doc_count_error_upper_bound":0,
            "sum_other_doc_count":0,
            "buckets":[{"device_id": "1"}]}
        },
        {
          "key_as_string":"2015/12/16 00:00:00",
          "key":1450224000000,
          "doc_count":1,
          "new_devices":{
            "doc_count_error_upper_bound":0,
            "sum_other_doc_count":0,
            "buckets":[{"device_id": "2"}]}
        },
        // [[ SNIP ]]
        {
          "key_as_string":"2015/12/20 00:00:00",
          "key":1450569600000,
          "doc_count":0, // there are no new device_ids on this date
          "new_devices":{
            "doc_count_error_upper_bound":0,
            "sum_other_doc_count":0,
            "buckets":[]}
        }
      ]
    }
  }
}
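For reference, the desired per-day result can be computed client-side from the sample documents with a first-seen map. This is a minimal Python sketch, assuming dates are UTC calendar days (UTC is Elasticsearch's default time zone for date histograms). DOCS, parse_day and new_device_histogram are hypothetical names. Each bucket's key is its UTC midnight in epoch milliseconds, and days with no new devices are kept, mirroring "min_doc_count": 0 with "extended_bounds".

```python
from datetime import datetime, timedelta, timezone

# The sample documents from the question, with "yyyy/MM/dd" date strings.
DOCS = [
    {"timestamp": "2015/12/15", "device_id": "1"},
    {"timestamp": "2015/12/16", "device_id": "2"},
    {"timestamp": "2015/12/20", "device_id": "1"},
]


def parse_day(s):
    # Assumption: each date is a calendar day in UTC.
    return datetime.strptime(s, "%Y/%m/%d").replace(tzinfo=timezone.utc)


def new_device_histogram(docs, start, end):
    """One bucket per day in [start, end], listing devices first seen that day."""
    # Earliest day each device appears in the documents.
    first_seen = {}
    for d in docs:
        day = parse_day(d["timestamp"])
        dev = d["device_id"]
        if dev not in first_seen or day < first_seen[dev]:
            first_seen[dev] = day

    buckets = []
    day, last = parse_day(start), parse_day(end)
    while day <= last:  # empty days are kept, like extended_bounds + min_doc_count 0
        new = sorted(dev for dev, seen in first_seen.items() if seen == day)
        buckets.append({
            "key_as_string": day.strftime("%Y/%m/%d %H:%M:%S"),
            "key": int(day.timestamp() * 1000),  # epoch milliseconds, UTC
            "new_devices": new,
        })
        day += timedelta(days=1)
    return buckets
```

For example, new_device_histogram(DOCS, "2015/12/08", "2016/01/08") returns 32 daily buckets. The 2015/12/15 bucket has key 1450137600000 and new_devices ["1"], and the 2015/12/20 bucket has an empty list because device "1" was already seen.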

1 answer:

Answer 0 (score: 3)

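I think you need one more terms aggregation, on timestamp. That way each unique device is returned only with its earliest timestamp, that is, only when it is new. Try something like this: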

{
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "timestamp": {
            "gte": "2015/12/08",
            "lte": "2016/01/08"
          }
        }
      }
    }
  },
  "size": 0,
  "aggs": {
    "unique_device": {
      "terms": {
        "field": "device_id",
        "size": 10
      },
      "aggs": {
        "unique_date": {
          "terms": {
            "field": "timestamp",
            "size": 1,                   
            "order": {
              "_term": "asc"
            }
          },
          "aggs": {
            "latest_device": {
              "date_histogram": {
                "field": "timestamp",
                "interval": "day",
                "min_doc_count": 0,
                "extended_bounds": {
                  "min": "2015/12/08",
                  "max": "2016/01/08"
                }
              }
            }
          }
        }
      }
    }
  }
}

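With "size": 1 and "order": {"_term": "asc"}, the timestamp terms aggregation keeps only each device's earliest timestamp, so the date histogram only sees new devices.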
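Because the unique_date aggregation sits under the per-device terms aggregation, the response is grouped by device rather than by day. Below is a minimal Python sketch of pivoting it client-side. RESPONSE is a hand-written example in the shape of a terms aggregation response for the sample documents, not output captured from a real cluster. It assumes the timestamp field's date format renders key_as_string as "yyyy/MM/dd HH:mm:ss". new_devices_by_day is a hypothetical helper.

```python
from collections import defaultdict

# Hand-written response in the shape the aggregation above would return for
# the sample documents (not captured from a real cluster). The nested
# "latest_device" histograms are omitted because the pivot does not need them.
RESPONSE = {
    "aggregations": {
        "unique_device": {
            "buckets": [
                {"key": "1", "doc_count": 2, "unique_date": {"buckets": [
                    {"key": 1450137600000,
                     "key_as_string": "2015/12/15 00:00:00", "doc_count": 1}]}},
                {"key": "2", "doc_count": 1, "unique_date": {"buckets": [
                    {"key": 1450224000000,
                     "key_as_string": "2015/12/16 00:00:00", "doc_count": 1}]}},
            ]
        }
    }
}


def new_devices_by_day(response):
    """Pivot device -> first-seen bucket into first-seen day -> [device_ids]."""
    by_day = defaultdict(list)
    for device in response["aggregations"]["unique_device"]["buckets"]:
        # size 1 + ascending order leaves one bucket: the earliest timestamp.
        for first in device["unique_date"]["buckets"]:
            by_day[first["key_as_string"]].append(device["key"])
    # This key_as_string format sorts chronologically as plain strings.
    return dict(sorted(by_day.items()))
```

Keep two limits in mind. First, the outer terms aggregation's "size": 10 caps how many devices come back. Second, the aggregations only see documents that match the range filter, so a device first seen before 2015/12/08 would be reported as new on its first day inside the window.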

Does this help?