Return the log with the latest timestamp from each group of logs that share a unique ID

Time: 2017-09-24 12:14:52

Tags: elasticsearch elasticsearch-aggregation

Our Elasticsearch holds groups of logs. Each group contains 1-7 logs that share a unique ID (named transactionId), and every log within a group has a unique timestamp (eventTimestamp).

For example:

{
  "transactionId": "id111",
  "eventTimestamp": "1505864112047",
  "otherfieldA": "fieldAvalue",
  "otherfieldB": "fieldBvalue"
}

{
  "transactionId": "id111",
  "eventTimestamp": "1505864112051",
  "otherfieldA": "fieldAvalue",
  "otherfieldB": "fieldBvalue"
}

{
  "transactionId": "id222",
  "eventTimestamp": "1505863719467",
  "otherfieldA": "fieldAvalue",
  "otherfieldB": "fieldBvalue"
}

{
  "transactionId": "id222",
  "eventTimestamp": "1505863719478",
  "otherfieldA": "fieldAvalue",
  "otherfieldB": "fieldBvalue"
}

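I need to write a query that returns, for every transactionId within a specific date range, the log with the latest timestamp.

Continuing my simple example, the query should return these logs: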

{
  "transactionId": "id111",
  "eventTimestamp": "1505864112051",
  "otherfieldA": "fieldAvalue",
  "otherfieldB": "fieldBvalue"
}

{
  "transactionId": "id222",
  "eventTimestamp": "1505863719478",
  "otherfieldA": "fieldAvalue",
  "otherfieldB": "fieldBvalue"
}

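Any ideas on how to build a query that accomplishes this?

1 answer:

Answer 0: (score: 1)

You can get the desired result not with the query itself, but with a combination of a terms aggregation and a top hits aggregation nested inside it.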

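The terms aggregation builds the buckets: all items that share the same term end up in the same bucket. Applied to transactionId, this produces your groups. The top hits aggregation is a metric aggregation that can be configured to return the top x hits of a bucket according to a given sort order. This lets you retrieve the log event with the largest timestamp in each bucket.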
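
As a rough mental model of what these two aggregations compute together, here is a minimal in-memory sketch in Python. It does not talk to Elasticsearch; the function name `latest_per_transaction` is made up for illustration, and the sample data is the four logs from the question.

```python
def latest_per_transaction(logs):
    """Group logs by transactionId and keep the one with the largest eventTimestamp."""
    latest = {}
    for log in logs:
        tid = log["transactionId"]  # plays the role of the terms bucket key
        # Timestamps are epoch milliseconds stored as strings, so compare them as integers.
        if tid not in latest or int(log["eventTimestamp"]) > int(latest[tid]["eventTimestamp"]):
            latest[tid] = log  # plays the role of top_hits with size 1, sorted descending
    return list(latest.values())


logs = [
    {"transactionId": "id111", "eventTimestamp": "1505864112047"},
    {"transactionId": "id111", "eventTimestamp": "1505864112051"},
    {"transactionId": "id222", "eventTimestamp": "1505863719467"},
    {"transactionId": "id222", "eventTimestamp": "1505863719478"},
]

result = latest_per_transaction(logs)
for log in result:
    print(log["transactionId"], log["eventTimestamp"])
```

The dictionary keyed by transactionId corresponds to the buckets, and the "keep the larger timestamp" step corresponds to a top hits aggregation of size 1 sorted in descending order.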

Assuming the default mapping for your sample data (where each string is indexed as key (analyzed text) and as key.keyword (not analyzed)), this query:

GET so-logs/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "eventTimestamp.keyword": {
              "gte": 1500000000000,
              "lte": 1507000000000
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "by_transaction_id": {
      "terms": {
        "field": "transactionId.keyword",
        "size": 10
      },
      "aggs": {
        "latest": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "eventTimestamp.keyword": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}

will produce the following output:

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "by_transaction_id": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "id111",
          "doc_count": 2,
          "latest": {
            "hits": {
              "total": 2,
              "max_score": null,
              "hits": [
                {
                  "_index": "so-logs",
                  "_type": "entry",
                  "_id": "AV6z9Yj4QYbhNp_FoXa1",
                  "_score": null,
                  "_source": {
                    "transactionId": "id111",
                    "eventTimestamp": "1505864112051",
                    "otherfieldA": "fieldAvalue",
                    "otherfieldB": "fieldBvalue"
                  },
                  "sort": [
                    "1505864112051"
                  ]
                }
              ]
            }
          }
        },
        {
          "key": "id222",
          "doc_count": 2,
          "latest": {
            "hits": {
              "total": 2,
              "max_score": null,
              "hits": [
                {
                  "_index": "so-logs",
                  "_type": "entry",
                  "_id": "AV6z9ZlOQYbhNp_FoXa4",
                  "_score": null,
                  "_source": {
                    "transactionId": "id222",
                    "eventTimestamp": "1505863719478",
                    "otherfieldA": "fieldAvalue",
                    "otherfieldB": "fieldBvalue"
                  },
                  "sort": [
                    "1505863719478"
                  ]
                }
              ]
            }
          }
        }
      ]
    }
  }
}

You can find the desired results inside the aggregation result under by_transaction_id.latest, following the aggregation names defined in the query.
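
If the response is handled in client code, pulling the documents out of that path might look like the following sketch. It assumes the response has already been parsed into a Python dict with the shape shown above; the helper name `extract_latest_logs` is hypothetical, and the sample response is a trimmed copy of the output.

```python
def extract_latest_logs(response, terms_name="by_transaction_id", top_hits_name="latest"):
    """Collect the _source of the first top hit from every terms bucket."""
    logs = []
    for bucket in response["aggregations"][terms_name]["buckets"]:
        hits = bucket[top_hits_name]["hits"]["hits"]
        if hits:  # top_hits was requested with size 1, so at most one entry is expected
            logs.append(hits[0]["_source"])
    return logs


# Trimmed copy of the response above, keeping only the fields the helper reads.
response = {
    "aggregations": {
        "by_transaction_id": {
            "buckets": [
                {
                    "key": "id111",
                    "doc_count": 2,
                    "latest": {"hits": {"hits": [
                        {"_source": {"transactionId": "id111", "eventTimestamp": "1505864112051"}}
                    ]}},
                },
                {
                    "key": "id222",
                    "doc_count": 2,
                    "latest": {"hits": {"hits": [
                        {"_source": {"transactionId": "id222", "eventTimestamp": "1505863719478"}}
                    ]}},
                },
            ]
        }
    }
}

latest_logs = extract_latest_logs(response)
```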

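Note that the terms aggregation limits the number of buckets it returns, and setting that limit above 10,000 is probably not a smart idea from a performance point of view. For details, see the section on size of the terms aggregation. If you need to handle a huge number of distinct transaction IDs, I would suggest maintaining some kind of "top" entry per transaction ID.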

Also, you should switch the eventTimestamp field to the date type to get better results and a wider set of query possibilities.
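
One reason a string field is fragile here: keyword values are compared lexicographically, so sorting and range filters only behave like numeric comparisons while all timestamps have the same number of digits. The plain-Python sketch below illustrates the difference; the 12-digit value is a hypothetical older timestamp, not part of the sample data.

```python
# A 13-digit timestamp from the sample data and a hypothetical 12-digit (older) one.
timestamps = ["1505864112051", "999999999999"]

# String comparison goes character by character, so "9..." sorts after "1...".
latest_as_string = max(timestamps)

# Numeric comparison gives the answer you actually want.
latest_as_number = max(timestamps, key=int)

print(latest_as_string)
print(latest_as_number)
```

With a date (or numeric) field the comparison is done on the value itself, so this mismatch cannot occur.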