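We have a set of logs in Elasticsearch, organized into groups. Each group contains 1-7 logs that share a unique ID (named transactionId), and every log within a group has a unique timestamp (eventTimestamp).
For example: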
{
  "transactionId": "id111",
  "eventTimestamp": "1505864112047",
  "otherfieldA": "fieldAvalue",
  "otherfieldB": "fieldBvalue"
}
{
  "transactionId": "id111",
  "eventTimestamp": "1505864112051",
  "otherfieldA": "fieldAvalue",
  "otherfieldB": "fieldBvalue"
}
{
  "transactionId": "id222",
  "eventTimestamp": "1505863719467",
  "otherfieldA": "fieldAvalue",
  "otherfieldB": "fieldBvalue"
}
{
  "transactionId": "id222",
  "eventTimestamp": "1505863719478",
  "otherfieldA": "fieldAvalue",
  "otherfieldB": "fieldBvalue"
}
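I need to write a query that returns, for every transactionId within a given date range, the log with the latest timestamp.
Continuing my simple example, the query should return these logs: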
{
  "transactionId": "id111",
  "eventTimestamp": "1505864112051",
  "otherfieldA": "fieldAvalue",
  "otherfieldB": "fieldBvalue"
}
{
  "transactionId": "id222",
  "eventTimestamp": "1505863719478",
  "otherfieldA": "fieldAvalue",
  "otherfieldB": "fieldBvalue"
}
Any ideas on how to build a query that accomplishes this?
Answer 0 (score: 1)
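You can get the desired result not with the query itself, but with a combination of a terms aggregation and a nested top hits aggregation.
The terms aggregation is responsible for building buckets, where all items with the same term end up in the same bucket. This produces your groups by transactionId. The top hits aggregation is then a metric aggregation that can be configured to return the top x hits of a bucket according to a given sort order. This lets you retrieve the log event with the largest timestamp in each bucket.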
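To make the bucket-then-pick-the-top idea concrete, here is a minimal plain-Python sketch of what the two aggregations do conceptually. It does not talk to Elasticsearch; the function name latest_per_transaction is made up for illustration, and it compares timestamps as integers, which is an assumption (the sample data stores them as strings).

```python
from collections import defaultdict

# Sample documents copied from the question.
logs = [
    {"transactionId": "id111", "eventTimestamp": "1505864112047",
     "otherfieldA": "fieldAvalue", "otherfieldB": "fieldBvalue"},
    {"transactionId": "id111", "eventTimestamp": "1505864112051",
     "otherfieldA": "fieldAvalue", "otherfieldB": "fieldBvalue"},
    {"transactionId": "id222", "eventTimestamp": "1505863719467",
     "otherfieldA": "fieldAvalue", "otherfieldB": "fieldBvalue"},
    {"transactionId": "id222", "eventTimestamp": "1505863719478",
     "otherfieldA": "fieldAvalue", "otherfieldB": "fieldBvalue"},
]


def latest_per_transaction(docs, gte, lte):
    """Hypothetical helper: mimic range filter + terms buckets + top hit."""
    buckets = defaultdict(list)
    for doc in docs:
        # Range filter on the timestamp (compared as an integer here).
        if gte <= int(doc["eventTimestamp"]) <= lte:
            # "terms" step: one bucket per distinct transactionId.
            buckets[doc["transactionId"]].append(doc)
    # "top_hits" step with size 1, sorted by timestamp descending.
    return {
        tid: max(group, key=lambda d: int(d["eventTimestamp"]))
        for tid, group in buckets.items()
    }


result = latest_per_transaction(logs, 1500000000000, 1507000000000)
for tid, doc in sorted(result.items()):
    print(tid, doc["eventTimestamp"])
```

In Elasticsearch the same grouping and selection happen on the server side, so only one document per bucket is sent back.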
Assuming the default mapping for your sample data (where strings are indexed as key (as text) and key.keyword (as not-analyzed text)), this query:
GET so-logs/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "eventTimestamp.keyword": {
              "gte": 1500000000000,
              "lte": 1507000000000
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "by_transaction_id": {
      "terms": {
        "field": "transactionId.keyword",
        "size": 10
      },
      "aggs": {
        "latest": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "eventTimestamp.keyword": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
produces the following output:
{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "by_transaction_id": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "id111",
          "doc_count": 2,
          "latest": {
            "hits": {
              "total": 2,
              "max_score": null,
              "hits": [
                {
                  "_index": "so-logs",
                  "_type": "entry",
                  "_id": "AV6z9Yj4QYbhNp_FoXa1",
                  "_score": null,
                  "_source": {
                    "transactionId": "id111",
                    "eventTimestamp": "1505864112051",
                    "otherfieldA": "fieldAvalue",
                    "otherfieldB": "fieldBvalue"
                  },
                  "sort": [
                    "1505864112051"
                  ]
                }
              ]
            }
          }
        },
        {
          "key": "id222",
          "doc_count": 2,
          "latest": {
            "hits": {
              "total": 2,
              "max_score": null,
              "hits": [
                {
                  "_index": "so-logs",
                  "_type": "entry",
                  "_id": "AV6z9ZlOQYbhNp_FoXa4",
                  "_score": null,
                  "_source": {
                    "transactionId": "id222",
                    "eventTimestamp": "1505863719478",
                    "otherfieldA": "fieldAvalue",
                    "otherfieldB": "fieldBvalue"
                  },
                  "sort": [
                    "1505863719478"
                  ]
                }
              ]
            }
          }
        }
      ]
    }
  }
}
You can find the desired results in the aggregation results under by_transaction_id.latest, following the aggregation names defined in the query.
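As a rough illustration of walking that path, the sketch below pulls the latest document out of each bucket of an already-parsed response. The helper name extract_latest is hypothetical, and the response dict is a trimmed copy of the output above, not a live result.

```python
def extract_latest(response):
    """Hypothetical helper: return the top hit's _source for every bucket."""
    buckets = response["aggregations"]["by_transaction_id"]["buckets"]
    latest = []
    for bucket in buckets:
        hits = bucket["latest"]["hits"]["hits"]
        if hits:  # a bucket always has docs, but guard against empty hits
            latest.append(hits[0]["_source"])
    return latest


# Trimmed copy of the response shown above (only the fields used here).
response = {
    "aggregations": {
        "by_transaction_id": {
            "buckets": [
                {"key": "id111", "doc_count": 2, "latest": {"hits": {"hits": [
                    {"_source": {"transactionId": "id111",
                                 "eventTimestamp": "1505864112051"}}]}}},
                {"key": "id222", "doc_count": 2, "latest": {"hits": {"hits": [
                    {"_source": {"transactionId": "id222",
                                 "eventTimestamp": "1505863719478"}}]}}},
            ]
        }
    }
}

for doc in extract_latest(response):
    print(doc["transactionId"], doc["eventTimestamp"])
```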
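Please note that the terms aggregation limits the number of buckets it returns, and setting size to more than 10,000 is probably not a smart idea from a performance perspective. For details, see the section on size of the terms aggregation. If you need to handle a large number of distinct transaction IDs, I would suggest keeping some kind of "top" entry per transaction ID instead.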
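One possible way around the bucket limit, which the answer itself does not mention, is to page through the buckets instead of requesting them all at once. The sketch below only builds request bodies; it assumes the composite aggregation, which to my knowledge exists only in newer Elasticsearch versions (6.x and later) and may not be available on the cluster shown here. The function name build_composite_page and the page size are illustrative, and the range query from the original request would still need to be added under "query".

```python
def build_composite_page(after_key=None, page_size=1000):
    """Hypothetical helper: request body for one page of transaction buckets."""
    composite = {
        "size": page_size,
        "sources": [
            # One composite source: bucket on the transaction ID.
            {"transactionId": {"terms": {"field": "transactionId.keyword"}}}
        ],
    }
    if after_key is not None:
        # Resume after the last bucket of the previous page; the value is
        # assumed to be the "after_key" object from the previous response.
        composite["after"] = after_key
    return {
        "size": 0,
        "aggs": {
            "by_transaction_id": {
                "composite": composite,
                "aggs": {
                    "latest": {
                        "top_hits": {
                            "size": 1,
                            "sort": [
                                {"eventTimestamp.keyword": {"order": "desc"}}
                            ],
                        }
                    }
                },
            }
        },
    }


first_page = build_composite_page()
next_page = build_composite_page(after_key={"transactionId": "id222"})
```

Each response would be read the same way as before, and paging stops once a page comes back with no buckets.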
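In addition, you should switch the eventTimestamp field to the date type for better performance and a wider set of query possibilities. As long as it is stored as a string, range filters and sorting compare the values lexicographically, which only gives correct results while all timestamps have the same number of digits.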
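A small sketch of what that switch could look like, under assumptions: the mapping fragment uses the date type with the epoch_millis format, and its exact nesting (for example under a type name such as "entry") depends on the Elasticsearch version. The helper epoch_millis_to_datetime is hypothetical and only shows what the stored string values represent.

```python
from datetime import datetime, timedelta, timezone

# Assumed mapping fragment: eventTimestamp as a date in epoch milliseconds.
# Older versions expect this nested under the document type name.
date_mapping = {
    "properties": {
        "eventTimestamp": {"type": "date", "format": "epoch_millis"}
    }
}


def epoch_millis_to_datetime(value):
    """Hypothetical helper: convert an epoch-millis string or int to UTC."""
    seconds, millis = divmod(int(value), 1000)
    # Integer arithmetic avoids floating-point rounding of the milliseconds.
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base + timedelta(milliseconds=millis)


print(epoch_millis_to_datetime("1505864112051").isoformat())
```

Changing the type of an existing field generally means creating a new index with the new mapping and reindexing the data into it.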