Question

I'm running Elasticsearch v1.2.2.

I have a collection of documents like this:

[
  { id: 1, source: { host: 'test.localhost', timestamp: 1407937236, loading_time: 2.841917 } },
  { id: 2, source: { host: 'test.localhost', timestamp: 1407937262, loading_time: 2.009191 } },
  { id: 3, source: { host: 'test.localhost', timestamp: 1407937322, loading_time: 2.084986 } },
  { id: 4, source: { host: 'test.localhost', timestamp: 1407937382, loading_time: 2.869245 } },
  { id: 5, source: { host: 'test.localhost', timestamp: 1407937442, loading_time: 2.559648 } },
  ...
]

(Basically, every so often I run a test against an internal host, and that gives me a loading time.)

Now I want to generate an overview chart:

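Group timestamps into 30-minute buckets
Return the maximum loading_time (for each bucket)
Where host is a specific value
Within a specific timestamp range
Ordered by timestamp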

Since I'm new to Elasticsearch, I don't even know whether all of this is possible.

In MySQL, this would look something like this:

SELECT (FLOOR(`timestamp` / 1800) * 1800) AS timestamp,
       MAX(`loading_time`) AS loading_time
FROM `elasticsearch_table`
WHERE `host` = 'test.localhost'
AND `timestamp` BETWEEN 1407937236 AND 1407937442
GROUP BY (FLOOR(`timestamp` / 1800) * 1800)
ORDER BY `timestamp` ASC

(I'm not sure this MySQL query is valid, but it should give you an idea of what I'm trying to achieve.)

Answer 1

The feature for this kind of computation is called aggregations in Elasticsearch.

Here are the aggregations that fit each of your steps:

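Group timestamps into 30-minute buckets => date_histogram aggregation with an interval of 30m
Return the maximum loading_time (for each bucket) => max aggregation on loading_time
Where host is a specific value => filter aggregation with a term filter on host
Within a specific timestamp range => another filter aggregation, this time with a range filter
Ordered by timestamp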

The tricky part is nesting the different aggregations correctly. You should run some tests on a small dataset to check it.
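One way to check the nesting on a small dataset is to compute the expected buckets by hand first. The following is a minimal sketch in plain Python, not part of the original answer: the function name `max_loading_time_per_bucket` and its parameters are hypothetical, and it works on timestamps in seconds, exactly as they appear in the question's sample documents.

```python
# Hypothetical reference computation: same grouping as the MySQL query
# in the question, done in memory on the five sample documents.
DOCS = [
    {"id": 1, "source": {"host": "test.localhost", "timestamp": 1407937236, "loading_time": 2.841917}},
    {"id": 2, "source": {"host": "test.localhost", "timestamp": 1407937262, "loading_time": 2.009191}},
    {"id": 3, "source": {"host": "test.localhost", "timestamp": 1407937322, "loading_time": 2.084986}},
    {"id": 4, "source": {"host": "test.localhost", "timestamp": 1407937382, "loading_time": 2.869245}},
    {"id": 5, "source": {"host": "test.localhost", "timestamp": 1407937442, "loading_time": 2.559648}},
]


def max_loading_time_per_bucket(docs, host, start, end, bucket_seconds=1800):
    """Return [(bucket_start, max_loading_time), ...] sorted by bucket_start."""
    maxima = {}
    for doc in docs:
        src = doc["source"]
        # Keep only the requested host and the inclusive timestamp range.
        if src["host"] != host or not (start <= src["timestamp"] <= end):
            continue
        # Same arithmetic as FLOOR(timestamp / 1800) * 1800.
        bucket = src["timestamp"] // bucket_seconds * bucket_seconds
        maxima[bucket] = max(maxima.get(bucket, float("-inf")), src["loading_time"])
    return sorted(maxima.items())


expected = max_loading_time_per_bucket(DOCS, "test.localhost", 1407937236, 1407937442)
print(expected)  # [(1407936600, 2.869245)]
```

All five sample documents fall into the same 30-minute bucket, so a smaller `bucket_seconds` (for example 120) is more useful when comparing against a test query.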

Try something like this:

GET test/collection/_search?search_type=count
{
  "aggs": {
    "filter_by_host":{
      "filter": {
        "and": {
          "filters": [
            {"term": {"host": "test.localhost"}},
            {"range": {"timestamp": {
              "from": 1407937230000,               
              "to": 1407937400000
            }}}
          ]
        }
      },
      "aggs": {
        "date": {
          "date_histogram": {
            "field": "timestamp",
            "interval": "2m"
          },
          "aggs": {
            "max_loading_time": {
              "max": {"field": "loading_time"}
            }
          }
        }
      }
    }
  }
}

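The interval here is only 2 minutes, and the range bounds were chosen just to exclude the fifth document from the dataset, so that the filter visibly takes effect. For 30-minute buckets, set the interval to 30m. Note that the range bounds are given in epoch milliseconds, whereas the sample documents in the question store seconds; this presumably assumes that timestamp is indexed as a date in milliseconds.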
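To reuse the request for other hosts, ranges, or intervals, the body can be built programmatically. This is only a sketch: the helper name `build_max_loading_time_query` and its parameters are hypothetical, and the structure is copied from the query above (including the `and` filter and the `from`/`to` range keys that the answer uses with Elasticsearch 1.x). It only builds the JSON body; sending it to the cluster is left out.

```python
import json


def build_max_loading_time_query(host, start_ms, end_ms, interval="30m"):
    """Hypothetical helper: build the aggregation body shown above as a dict."""
    return {
        "aggs": {
            "filter_by_host": {
                # Restrict the documents to one host and one timestamp range.
                "filter": {
                    "and": {
                        "filters": [
                            {"term": {"host": host}},
                            {"range": {"timestamp": {"from": start_ms, "to": end_ms}}},
                        ]
                    }
                },
                "aggs": {
                    "date": {
                        # Bucket the remaining documents by time interval.
                        "date_histogram": {"field": "timestamp", "interval": interval},
                        # Compute the maximum loading_time inside each bucket.
                        "aggs": {
                            "max_loading_time": {"max": {"field": "loading_time"}}
                        },
                    }
                },
            }
        }
    }


body = build_max_loading_time_query("test.localhost", 1407937230000, 1407937400000)
print(json.dumps(body, indent=2))
```

Serializing with `json.dumps` also guarantees balanced braces, which is easy to get wrong when nesting aggregations by hand.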

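The only missing piece is the sorting: you cannot sort the results of a count search. The date_histogram buckets in the output below do come back in ascending key order, though.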

Output of the request:

{
    ...
   "aggregations": {
      "filter_by_host": {
         "doc_count": 4,
         "date": {
            "buckets": [
               {
                  "key_as_string": "2014-08-13T13:40:00.000Z",
                  "key": 1407937200000,
                  "doc_count": 2,
                  "max_loading_time": {
                     "value": 2.841917
                  }
               },
               {
                  "key_as_string": "2014-08-13T13:42:00.000Z",
                  "key": 1407937320000,
                  "doc_count": 2,
                  "max_loading_time": {
                     "value": 2.869245
                  }
               }
            ]
         }
      }
   }
}
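For the overview chart, the buckets can be flattened into (timestamp, max loading time) pairs and sorted on the client. The following is a minimal sketch that assumes the response has already been parsed into a Python dict with the shape shown above; the function name `max_loading_time_series` is hypothetical, and the sample dict is a trimmed copy of that output.

```python
def max_loading_time_series(response):
    """Hypothetical helper: return [(bucket_key_ms, max_loading_time), ...] by ascending key."""
    buckets = response["aggregations"]["filter_by_host"]["date"]["buckets"]
    points = [(b["key"], b["max_loading_time"]["value"]) for b in buckets]
    # Explicit client-side sort, so the chart does not rely on bucket order.
    return sorted(points, key=lambda point: point[0])


# Trimmed copy of the response shown above.
sample_response = {
    "aggregations": {
        "filter_by_host": {
            "doc_count": 4,
            "date": {
                "buckets": [
                    {"key": 1407937200000, "doc_count": 2,
                     "max_loading_time": {"value": 2.841917}},
                    {"key": 1407937320000, "doc_count": 2,
                     "max_loading_time": {"value": 2.869245}},
                ]
            },
        }
    }
}

series = max_loading_time_series(sample_response)
print(series)  # [(1407937200000, 2.841917), (1407937320000, 2.869245)]
```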

Group by modified timestamp and return the max value with Elasticsearch
