How to group by a datetime field, or by its date part, in Elasticsearch

Asked: 2015-03-06 09:19:14

Tags: elasticsearch, group-by

I use Elasticsearch to store and retrieve data. I indexed the following sample documents:

curl http://localhost:9200/test/test -X POST -H "Content-type: application/json" -d '{"id":1, "created_at": "2015-03-02T12:00:00", "name": "test1"}'
curl http://localhost:9200/test/test/ -X POST -H "Content-type: application/json" -d '{"id":2, "created_at": "2015-03-03T12:00:00", "name": "test2"}'
curl http://localhost:9200/test/test/ -X POST -H "Content-type: application/json" -d '{"id":3, "created_at": "2015-03-03T12:00:00", "name": "test3"}'
curl http://localhost:9200/test/test/ -X POST -H "Content-type: application/json" -d '{"id":3, "created_at": "2015-03-03T12:01:00", "name": "test3"}'
curl http://localhost:9200/test/test/ -X POST -H "Content-type: application/json" -d '{"id":3, "created_at": "2015-03-03T12:02:00", "name": "test3"}'
curl http://localhost:9200/test/test/ -X POST -H "Content-type: application/json" -d '{"id":4, "created_at": "2015-03-02T12:00:00", "name": "test4"}'
curl http://localhost:9200/test/test/ -X POST -H "Content-type: application/json" -d '{"id":5, "created_at": "2015-03-02T12:00:00", "name": "test5"}'
curl http://localhost:9200/test/test/ -X POST -H "Content-type: application/json" -d '{"id":6, "created_at": "2015-03-03T12:00:00", "name": "test6"}'

When I try to group by created_at with a terms aggregation, it runs fine:

curl http://localhost:9200/test/test/_search -X POST -d '{"size": "0", "aggs": {"group_by_created_at":{"terms":{"field": "created_at"}}}}' | python -m json.tool
{
    "_shards": {
        "failed": 0,
        "successful": 5,
        "total": 5
    },
    "aggregations": {
        "group_by_created_at": {
            "buckets": [
                {
                    "doc_count": 3,
                    "key": 1425297600000,
                    "key_as_string": "2015-03-02"
                },
                {
                    "doc_count": 5,
                    "key": 1425384000000,
                    "key_as_string": "2015-03-03"
                },
                {
                    "doc_count": 1,
                    "key": 1425384060000,
                    "key_as_string": "2015-03-03T12:01:00.000Z"
                },
                {
                    "doc_count": 1,
                    "key": 1425384120000,
                    "key_as_string": "2015-03-03T12:02:00.000Z"
                }
            ]
        }
    },
    "hits": {
        "hits": [],
        "max_score": 0.0,
        "total": 8
    },
    "timed_out": false,
    "took": 3
}

In the output above, the documents from 2015-03-03 that carry different timestamps end up in separate buckets; I want everything from the same day counted together.

The output I want looks like this:

{
    "_shards": {
        "failed": 0,
        "successful": 5,
        "total": 5
    },
    "aggregations": {
        "group_by_created_at": {
            "buckets": [
                {
                    "doc_count": 3,
                    "key": 1425297600000,
                    "key_as_string": "2015-03-02"
                },
                {
                    "doc_count": 5,
                    "key": 1425384000000,
                    "key_as_string": "2015-03-03"
                }
            ]
        }
    },
    "hits": {
        "hits": [],
        "max_score": 0.0,
        "total": 8
    },
    "timed_out": false,
    "took": 3
}
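As a quick sanity check (a local sketch, not an Elasticsearch call), grouping the eight sample documents by the date part of created_at reproduces the counts I expect in the buckets above:

```python
from collections import Counter

# created_at values of the eight sample documents indexed above
created_at = [
    "2015-03-02T12:00:00",  # id 1
    "2015-03-03T12:00:00",  # id 2
    "2015-03-03T12:00:00",  # id 3
    "2015-03-03T12:01:00",  # id 3
    "2015-03-03T12:02:00",  # id 3
    "2015-03-02T12:00:00",  # id 4
    "2015-03-02T12:00:00",  # id 5
    "2015-03-03T12:00:00",  # id 6
]

# Group by the date part (everything before the "T")
counts = Counter(ts.split("T")[0] for ts in created_at)
print(dict(sorted(counts.items())))  # {'2015-03-02': 3, '2015-03-03': 5}
```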

I tried using a range aggregation:

curl http://localhost:9200/test/test/_search -X POST -d '{"size": "0", "aggs": {"group_by_created_at":{"range":{"field": "created_at", "ranges": [{"gte": "2015-03-02T00:00:00", "lte": "2015-03-02T23:59:59"}, {"gte": "2015-03-03T00:00:00", "lte": "2015-03-03T23:59:59"}]}}}}' | python -m json.tool
{
    "_shards": {
        "failed": 0,
        "successful": 5,
        "total": 5
    },
    "aggregations": {
        "group_by_created_at": {
            "buckets": [
                {
                    "doc_count": 8,
                    "key": "*-*"
                },
                {
                    "doc_count": 8,
                    "key": "*-*"
                }
            ]
        }
    },
    "hits": {
        "hits": [],
        "max_score": 0.0,
        "total": 8
    },
    "timed_out": false,
    "took": 2
}

But it puts all 8 documents into both buckets. Yet if I use the same bounds in a filtered query, it works fine:

curl http://localhost:9200/test/test/_search -X POST -d '{"query": {"filtered": {"filter": {"range": {"created_at": {"gte": "2015-03-03T00:00:00", "lte": "2015-03-03T23:59:59"}}}}}}' | python -m json.tool
{
    "_shards": {
        "failed": 0,
        "successful": 5,
        "total": 5
    },
    "hits": {
        "hits": [
            {
                "_id": "mJs0WKiPTByQ6dLwJnKO8Q",
                "_index": "test",
                "_score": 1.0,
                "_source": {
                    "created_at": "2015-03-03T12:00:00",
                    "id": 2,
                    "name": "test2"
                },
                "_type": "test"
            },
            {
                "_id": "49a3pQX2TYa_KV029c0NLQ",
                "_index": "test",
                "_score": 1.0,
                "_source": {
                    "created_at": "2015-03-03T12:02:00",
                    "id": 3,
                    "name": "test3"
                },
                "_type": "test"
            },
            {
                "_id": "qWtAgCwSR_CTKsV1ibYVMg",
                "_index": "test",
                "_score": 1.0,
                "_source": {
                    "created_at": "2015-03-03T12:01:00",
                    "id": 3,
                    "name": "test3"
                },
                "_type": "test"
            },
            {
                "_id": "VoxSH6tXQmuugOVOmmrD2g",
                "_index": "test",
                "_score": 1.0,
                "_source": {
                    "created_at": "2015-03-03T12:00:00",
                    "id": 6,
                    "name": "test6"
                },
                "_type": "test"
            },
            {
                "_id": "oQmTxr5YRFaa3q7bvFOQLg",
                "_index": "test",
                "_score": 1.0,
                "_source": {
                    "created_at": "2015-03-03T12:00:00",
                    "id": 3,
                    "name": "test3"
                },
                "_type": "test"
            }
        ],
        "max_score": 1.0,
        "total": 5
    },
    "timed_out": false,
    "took": 2
}

I'm missing something, and I can't see what :(

1 Answer:

Answer 0 (score: 2):

Your range attempt matched everything because the range aggregation takes "from"/"to" bounds, not "gte"/"lte"; the unrecognized keys are ignored, which is why both buckets report the unbounded key "*-*". But for grouping by day, what you actually want is the date_histogram aggregation, which buckets documents on any given interval:

"date_histogram":{
    "field" : "created_at",
    "interval" : "1d"
}
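Putting it together, the full request body against the test index would be along these lines (a sketch; exact bucket formatting can vary by Elasticsearch version):

```json
{
    "size": 0,
    "aggs": {
        "group_by_created_at": {
            "date_histogram": {
                "field": "created_at",
                "interval": "1d"
            }
        }
    }
}
```

Against the sample data this should return two buckets: 2015-03-02 with doc_count 3 and 2015-03-03 with doc_count 5. Note that on recent Elasticsearch versions, "interval" has been split into "calendar_interval" and "fixed_interval", so "calendar_interval": "day" would be used instead.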