Elasticsearch: time-range aggregation not working as expected

Asked: 2016-11-03 03:09:54

Tags: python elasticsearch aggregate aggregate-functions elasticsearch-aggregation

I'm new to the Elasticsearch world. I'm learning it and trying to check whether it fits my needs.

Right now I'm studying aggregations in Elasticsearch, and I wrote the following Python script to ingest some time-series data into it.

Every 5 seconds I create a new message with:

  1. A timestamp (ISO8601 format)
  2. A counter
  3. A random number between 0 and 100

For each new day I create a new index, using logs_Y-m-d as the index name.

I index every message using the counter as its _id. The counter is reset for each new index (i.e., daily).

    import csv
    import time
    import random
    from datetime import datetime
    from elasticsearch import Elasticsearch
    
    
    class ElasticSearchDB:
        def __init__(self):
            self.es = Elasticsearch()
    
        def run(self):
            print("Started: {}".format(datetime.now().isoformat()))
            print("<Ctrl + c> for exit!")
    
            with open("..\\out\\logs.csv", "w", newline='') as f:
                writer = csv.writer(f)
                counter = 0
                try:
                    while True:
                        i_name = "logs_" + time.strftime("%Y-%m-%d")
                        if not self.es.indices.exists([i_name]):
                            self.es.indices.create(i_name, ignore=400)
                            print("New index created: {}".format(i_name))
                            counter = 0
    
                        message = {"counter": counter, "@timestamp": datetime.now().isoformat(), "value": random.randint(0, 100)}
                        # Write to file
                        writer.writerow(message.values())
                        # Write to elasticsearch index
                        self.es.index(index=i_name, doc_type="logs", id=counter, body=message)
                        # Waste some time
                        time.sleep(5)
                        counter += 1
    
                except KeyboardInterrupt:
                    print("Stopped: {}".format(datetime.now().isoformat()))
    
    
    test_es = ElasticSearchDB()
    test_es.run()
    

I ran this script for 30 minutes. Then, using Sense, I queried Elasticsearch with the following aggregation queries.

Query #1: Fetch everything.

Query #2: Aggregate the logs of the last 1 hour and generate stats for them. This shows the correct result.

Query #3: Aggregate the logs of the last 1 minute and generate stats for them. The aggregated doc count is the same as in the 1-hour aggregation; ideally it should only aggregate about 12-13 logs.

Query #4: Aggregate the logs of the last 15 seconds and generate stats for them. The aggregated doc count is again the same as in the 1-hour aggregation; ideally it should only aggregate about 3-4 logs.

My questions:

1. Why doesn't Elasticsearch understand the 1-minute and 15-second ranges?
2. I understand mappings, but I don't know how to write one, so I didn't write one. Could that be what's causing this problem?
3. Please help!

Query #1: Fetch everything.

      GET /_search
      

Output:

      {
         "took": 3,
         "timed_out": false,
         "_shards": {
            "total": 5,
            "successful": 5,
            "failed": 0
         },
         "hits": {
            "total": 314,
            "max_score": 1,
            "hits": [
               {
                  "_index": "logs_2016-11-03",
                  "_type": "logs",
                  "_id": "19",
                  "_score": 1,
                  "_source": {
                     "counter": 19,
                     "value": 62,
                     "@timestamp": "2016-11-03T07:40:35.981395"
                  }
               },
               {
                  "_index": "logs_2016-11-03",
                  "_type": "logs",
                  "_id": "22",
                  "_score": 1,
                  "_source": {
                     "counter": 22,
                     "value": 95,
                     "@timestamp": "2016-11-03T07:40:51.066395"
                  }
               },
               {
                  "_index": "logs_2016-11-03",
                  "_type": "logs",
                  "_id": "25",
                  "_score": 1,
                  "_source": {
                     "counter": 25,
                     "value": 18,
                     "@timestamp": "2016-11-03T07:41:06.140395"
                  }
               },
               {
                  "_index": "logs_2016-11-03",
                  "_type": "logs",
                  "_id": "26",
                  "_score": 1,
                  "_source": {
                     "counter": 26,
                     "value": 58,
                     "@timestamp": "2016-11-03T07:41:11.164395"
                  }
               },
               {
                  "_index": "logs_2016-11-03",
                  "_type": "logs",
                  "_id": "29",
                  "_score": 1,
                  "_source": {
                     "counter": 29,
                     "value": 73,
                     "@timestamp": "2016-11-03T07:41:26.214395"
                  }
               },
               {
                  "_index": "logs_2016-11-03",
                  "_type": "logs",
                  "_id": "41",
                  "_score": 1,
                  "_source": {
                     "counter": 41,
                     "value": 59,
                     "@timestamp": "2016-11-03T07:42:26.517395"
                  }
               },
               {
                  "_index": "logs_2016-11-03",
                  "_type": "logs",
                  "_id": "14",
                  "_score": 1,
                  "_source": {
                     "counter": 14,
                     "value": 9,
                     "@timestamp": "2016-11-03T07:40:10.857395"
                  }
               },
               {
                  "_index": "logs_2016-11-03",
                  "_type": "logs",
                  "_id": "40",
                  "_score": 1,
                  "_source": {
                     "counter": 40,
                     "value": 9,
                     "@timestamp": "2016-11-03T07:42:21.498395"
                  }
               },
               {
                  "_index": "logs_2016-11-03",
                  "_type": "logs",
                  "_id": "24",
                  "_score": 1,
                  "_source": {
                     "counter": 24,
                     "value": 41,
                     "@timestamp": "2016-11-03T07:41:01.115395"
                  }
               },
               {
                  "_index": "logs_2016-11-03",
                  "_type": "logs",
                  "_id": "0",
                  "_score": 1,
                  "_source": {
                     "counter": 0,
                     "value": 79,
                     "@timestamp": "2016-11-03T07:39:00.302395"
                  }
               }
            ]
         }
      }
      

Query #2: Get stats for the last 1 hour.

      GET /logs_2016-11-03/logs/_search?search_type=count
      {
          "aggs": {
              "time_range": {
                  "filter": {
                      "range": {
                          "@timestamp": {
                              "from": "now-1h"
                          }
                      }
                  },
                  "aggs": {
                      "just_stats": {
                          "stats": {
                              "field": "value"
                          }
                      }
                  }
              }
          }
      }
      

Output:

      {
         "took": 5,
         "timed_out": false,
         "_shards": {
            "total": 5,
            "successful": 5,
            "failed": 0
         },
         "hits": {
            "total": 366,
            "max_score": 0,
            "hits": []
         },
         "aggregations": {
            "time_range": {
               "doc_count": 366,
               "just_stats": {
                  "count": 366,
                  "min": 0,
                  "max": 100,
                  "avg": 53.17213114754098,
                  "sum": 19461
               }
            }
         }
      }
      

I get 366 entries, which is correct.

Query #3: Get stats for the last 1 minute.

      GET /logs_2016-11-03/logs/_search?search_type=count
      {
          "aggs": {
              "time_range": {
                  "filter": {
                      "range": {
                          "@timestamp": {
                              "from": "now-1m"
                          }
                      }
                  },
                  "aggs": {
                      "just_stats": {
                          "stats": {
                              "field": "value"
                          }
                      }
                  }
              }
          }
      }
      

Output:

      {
         "took": 15,
         "timed_out": false,
         "_shards": {
            "total": 5,
            "successful": 5,
            "failed": 0
         },
         "hits": {
            "total": 407,
            "max_score": 0,
            "hits": []
         },
         "aggregations": {
            "time_range": {
               "doc_count": 407,
               "just_stats": {
                  "count": 407,
                  "min": 0,
                  "max": 100,
                  "avg": 53.152334152334156,
                  "sum": 21633
               }
            }
         }
      }
      

This is wrong: there can't be 407 entries from the last 1 minute; it should only be around 12-13 logs.

Query #4: Get stats for the last 15 seconds.

      GET /logs_2016-11-03/logs/_search?search_type=count
      {
          "aggs": {
              "time_range": {
                  "filter": {
                      "range": {
                          "@timestamp": {
                              "from": "now-15s"
                          }
                      }
                  },
                  "aggs": {
                      "just_stats": {
                          "stats": {
                              "field": "value"
                          }
                      }
                  }
              }
          }
      }
      

Output:

      {
         "took": 15,
         "timed_out": false,
         "_shards": {
            "total": 5,
            "successful": 5,
            "failed": 0
         },
         "hits": {
            "total": 407,
            "max_score": 0,
            "hits": []
         },
         "aggregations": {
            "time_range": {
               "doc_count": 407,
               "just_stats": {
                  "count": 407,
                  "min": 0,
                  "max": 100,
                  "avg": 53.152334152334156,
                  "sum": 21633
               }
            }
         }
      }
      

This is also wrong: there can't be 407 entries from the last 15 seconds; it should only be around 3-4 logs.

1 Answer:

Answer 0 (score: 2)

Your queries are correct, but ES stores dates in UTC, which is why you are getting everything back. Since your timestamps carry no timezone offset, Elasticsearch parses them as UTC; if your local clock runs ahead of UTC (as it presumably does here), every document effectively sits in the future, so any lower bound such as now-1m matches all of them. From the documentation:

> In JSON documents, dates are represented as strings. Elasticsearch uses a set of preconfigured formats to recognize and parse these strings into a long value representing milliseconds-since-the-epoch in UTC.
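
To make the failure mode concrete, here is a minimal sketch (the offsets are illustrative; it assumes a local clock running ahead of UTC, e.g. UTC+05:30):

    from datetime import datetime

    # What the script stores: naive local wall-clock time, no offset information.
    print(datetime.now().isoformat())     # e.g. '2016-11-03T07:40:35.981395'

    # What Elasticsearch evaluates "now" against: the current moment in UTC.
    print(datetime.utcnow().isoformat())  # e.g. '2016-11-03T02:10:35.981395'

    # Elasticsearch parses the stored string as UTC, so every document appears
    # to lie hours in the future; a lower bound such as "now-1m" or "now-15s"
    # therefore matches all of them.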

You can use the pytz module and store your dates in UTC in ES. See this SO question.
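
For example, a minimal sketch of the change to the ingestion script, assuming the pytz package is installed:

    import random
    import pytz
    from datetime import datetime

    counter = 0  # stands in for the script's running counter

    # A timezone-aware UTC timestamp; the +00:00 offset is serialized into the
    # ISO string, so Elasticsearch parses it unambiguously.
    message = {
        "counter": counter,
        "@timestamp": datetime.now(pytz.utc).isoformat(),  # e.g. '2016-11-03T02:10:35.981395+00:00'
        "value": random.randint(0, 100),
    }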

You can also use the time_zone param in the range query. It is also better to aggregate over filtered results rather than fetch all results and then filter everything:

    GET /logs_2016-11-03/logs/_search
    {
      "query": {
        "bool": {
          "filter": {
            "range": {
              "@timestamp": {
                "gte": "2016-11-03T07:15:35",   <----- you would need an absolute value
                "time_zone": "-01:00"           <----- time_zone setting
              }
            }
          }
        }
      },
      "aggs": {
        "just_stats": {
          "stats": {
            "field": "value"
          }
        }
      },
      "size": 0
    }

You would have to convert the desired times (now-1m, now-15s) into the format yyyy-MM-dd'T'HH:mm:ss for the time_zone param to apply, because now is not affected by time_zone. So the best option is to convert your dates to UTC and store them that way.
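
For illustration, a hypothetical helper (not part of the original answer) that computes such an absolute bound for "the last N seconds":

    from datetime import datetime, timedelta

    def window_start(seconds):
        # Absolute local time 'seconds' ago, formatted as yyyy-MM-dd'T'HH:mm:ss;
        # use it as the "gte" value together with a "time_zone" offset that
        # matches how the timestamps were originally written.
        return (datetime.now() - timedelta(seconds=seconds)).strftime("%Y-%m-%dT%H:%M:%S")

    print(window_start(60))  # bound for "the last 1 minute"
    print(window_start(15))  # bound for "the last 15 seconds"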