I am running Elasticsearch v1.2.2.
I have a collection of documents like this:
[
{ id: 1, source: { host: 'test.localhost', timestamp: 1407937236, loading_time: 2.841917 } },
{ id: 2, source: { host: 'test.localhost', timestamp: 1407937262, loading_time: 2.009191 } },
{ id: 3, source: { host: 'test.localhost', timestamp: 1407937322, loading_time: 2.084986 } },
{ id: 4, source: { host: 'test.localhost', timestamp: 1407937382, loading_time: 2.869245 } },
{ id: 5, source: { host: 'test.localhost', timestamp: 1407937442, loading_time: 2.559648 } },
...
]
(Basically, I am continuously testing an internal host, and each test gives me a loading time.)
Now I want to generate an overview graph from this data: the maximum loading time per time interval, for a given host, over a given time range.
Since I am new to Elasticsearch, I do not even know whether all of this is possible.
In MySQL, this would look something like this:
SELECT (FLOOR(`timestamp` / 1800) * 1800) AS `timestamp`,
       MAX(`loading_time`) AS `loading_time`
FROM `elasticsearch_table`
WHERE `host` = 'test.localhost'
  AND `timestamp` BETWEEN 1407937236 AND 1407937442
GROUP BY (FLOOR(`timestamp` / 1800) * 1800)
ORDER BY `timestamp` ASC
(Not sure whether this MySQL query is entirely valid, but it should give you an idea of what I am trying to achieve.)
Answer (score: 2)
The feature you need for this kind of computation is called aggregations in Elasticsearch.
Each step of your query maps to an aggregation: a filter aggregation for the host and the timestamp range, a date_histogram aggregation for the time buckets, and a max aggregation for the loading time.
The tricky part is nesting the different aggregations correctly. You should run a few tests against a small dataset to check it.
Try something like this:
GET test/collection/_search?search_type=count
{
  "aggs": {
    "filter_by_host": {
      "filter": {
        "and": {
          "filters": [
            { "term": { "host": "test.localhost" } },
            { "range": { "timestamp": {
              "from": 1407937230000,
              "to": 1407937400000
            } } }
          ]
        }
      },
      "aggs": {
        "date": {
          "date_histogram": {
            "field": "timestamp",
            "interval": "2m"
          },
          "aggs": {
            "max_loading_time": {
              "max": { "field": "loading_time" }
            }
          }
        }
      }
    }
  }
}
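One thing not shown above, and which is my assumption rather than part of the original answer: this query relies on a suitable mapping. For the term filter to match "test.localhost" exactly, host should be not_analyzed (otherwise the standard analyzer splits it into tokens and the term filter will not match), and for the date_histogram and the millisecond range bounds to work, timestamp should be a date field indexed as epoch milliseconds (your sample documents show seconds, so they would have to be multiplied by 1000 at indexing time). A minimal mapping sketch for ES 1.x could look like this:

PUT test
{
  "mappings": {
    "collection": {
      "properties": {
        "host":         { "type": "string", "index": "not_analyzed" },
        "timestamp":    { "type": "date" },
        "loading_time": { "type": "double" }
      }
    }
  }
}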
The interval is only 2 minutes here, and the range boundaries were chosen simply so that the fifth document is excluded from the dataset and the filter actually does something.
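For your actual use case, the 30-minute buckets from the MySQL query (FLOOR(`timestamp` / 1800) * 1800) are just a different interval; only the date_histogram part changes, and the range filter bounds become your full time range (in milliseconds, assuming the mapping sketched above). A sketch of the changed part, with the rest of the request staying the same:

"date": {
  "date_histogram": {
    "field": "timestamp",
    "interval": "30m"
  },
  "aggs": {
    "max_loading_time": {
      "max": { "field": "loading_time" }
    }
  }
}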
The only missing piece is the ordering (the ORDER BY part): you cannot sort the results of this count search.
Output of the request:
{
  ...
  "aggregations": {
    "filter_by_host": {
      "doc_count": 4,
      "date": {
        "buckets": [
          {
            "key_as_string": "2014-08-13T13:40:00.000Z",
            "key": 1407937200000,
            "doc_count": 2,
            "max_loading_time": {
              "value": 2.841917
            }
          },
          {
            "key_as_string": "2014-08-13T13:42:00.000Z",
            "key": 1407937320000,
            "doc_count": 2,
            "max_loading_time": {
              "value": 2.869245
            }
          }
        ]
      }
    }
  }
}
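As a side note (not part of the original answer, just an equivalent variant under the same assumptions): instead of a filter aggregation, the host/timestamp filtering can also be expressed as a filtered query, leaving only the date_histogram and max aggregations under aggs:

GET test/collection/_search?search_type=count
{
  "query": {
    "filtered": {
      "filter": {
        "and": [
          { "term": { "host": "test.localhost" } },
          { "range": { "timestamp": { "from": 1407937230000, "to": 1407937400000 } } }
        ]
      }
    }
  },
  "aggs": {
    "date": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "2m"
      },
      "aggs": {
        "max_loading_time": {
          "max": { "field": "loading_time" }
        }
      }
    }
  }
}

The buckets come back the same way, just without the filter_by_host wrapper in the response.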