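Elasticsearch: joining data within the same index

Date: 2019-10-05 09:09:33

Tags: elasticsearch kibana

I am fairly new to Elasticsearch, and I am collecting application logs with the following format in a single index: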

{
    "_index" : "app_logs",
    "_type" : "_doc",
    "_id" : "JVMYi20B0a2qSId4rt12",
    "_source" : {
      "username" : "mapred",
      "app_id" : "application_1569623930006_490200",
      "event_type" : "STARTED",
      "ts" : "2019-10-02T08:11:53Z"
    }
}

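I can have different event types. In this case I am interested in STARTED and FINISHED. I would like to query ES to get all the applications that started on a given day and enrich them with their end time. Basically, I want to build start/end pairs (the end may also be missing, which is fine).

I have realized that SQL-style join relations cannot be used in ES, and I was wondering whether I can exploit some other feature to get this result in a single query.

Edit: these are the details of the index mapping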

{ 
 "app_logs" : {
  "mappings" : {
   "_doc" : {
    "properties" : {
      "event_type" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "app_id" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "ts" : {
        "type" : "date"
      },
      "event_type" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      }
    }
  }}}}

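1 answer:

Answer 0 (score: 1)

What I understand is that you want to collate the documents that share the same app_id and whose status is either STARTED or FINISHED.

I do not think Elasticsearch is meant to perform JOIN operations. I mean, you can, but you would then have to design your documents specifically for it, as described in the linked documentation.

What you need is an aggregation query.

Below are a sample mapping, sample documents, the aggregation query, and the response it returns, which should help you get the desired result.

Mapping: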

PUT mystatusindex
{
  "mappings": {
    "properties": {
      "username":{
        "type": "keyword"
      },
      "app_id":{
        "type": "keyword"
      },
      "event_type":{
        "type":"keyword"
      },
      "ts":{
        "type": "date"
      }
    }
  }
}

Sample documents:

POST mystatusindex/_doc/1
{
    "username" : "mapred",
    "app_id" : "application_1569623930006_490200",
    "event_type" : "STARTED",
    "ts" : "2019-10-02T08:11:53Z"
}

POST mystatusindex/_doc/2
{
    "username" : "mapred",
    "app_id" : "application_1569623930006_490200",
    "event_type" : "FINISHED",
    "ts" : "2019-10-02T08:12:53Z"
}

POST mystatusindex/_doc/3
{
    "username" : "mapred",
    "app_id" : "application_1569623930006_490201",
    "event_type" : "STARTED",
    "ts" : "2019-10-02T09:30:53Z"
}

POST mystatusindex/_doc/4
{
    "username" : "mapred",
    "app_id" : "application_1569623930006_490202",
    "event_type" : "STARTED",
    "ts" : "2019-10-02T09:45:53Z"
}

POST mystatusindex/_doc/5
{
    "username" : "mapred",
    "app_id" : "application_1569623930006_490202",
    "event_type" : "FINISHED",
    "ts" : "2019-10-02T09:45:53Z"
}

POST mystatusindex/_doc/6
{
  "username" : "mapred",
  "app_id" : "application_1569623930006_490203",
  "event_type" : "STARTED",
  "ts" : "2019-10-03T09:30:53Z"
}

POST mystatusindex/_doc/7
{
  "username" : "mapred",
  "app_id" : "application_1569623930006_490203",
  "event_type" : "FINISHED",
  "ts" : "2019-10-03T09:45:53Z"
}

Query:

POST mystatusindex/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "ts": {
              "gte": "2019-10-02T00:00:00Z",
              "lte": "2019-10-02T23:59:59Z"
            }
          }
        }
      ],
      "should": [
        {
          "match": {
            "event_type": "STARTED"
          }
        },
        {
          "match": {
            "event_type": "FINISHED"
          }
        }
      ],
      "minimum_should_match": 1
    }
  },
  "aggs": {
    "application_IDs": {
      "terms": {
        "field": "app_id"
      },
      "aggs": {
        "ids": {
          "top_hits": {
            "size": 10,
            "_source": ["event_type", "app_id"],
            "sort": [
              { "event_type": { "order": "desc"}}
              ]
          }
        }
      }
    }
  }
}

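Note that I have used a Range Query because you only want the documents for that date, and I have added bool should logic to filter on STARTED and FINISHED.

Once I have the documents, I use a Terms Aggregation and a Top Hits Aggregation to get the desired result.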
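The range filter plus the terms and top_hits aggregations can also be generated for any day. The following is only a minimal Python sketch: the function name `build_day_query` is hypothetical, the field names assume the keyword mapping of `mystatusindex` above, and `minimum_should_match` is an addition of this sketch so that the should clauses actually restrict the results (when a must clause is present, should clauses are optional by default).

```python
import json


def build_day_query(day):
    """Build the search body for one UTC day, given as 'YYYY-MM-DD'.

    Hypothetical helper; field names assume the keyword mapping above.
    """
    return {
        "size": 0,  # only the aggregation is needed, not the top-level hits
        "query": {
            "bool": {
                "must": [
                    {
                        "range": {
                            "ts": {
                                "gte": f"{day}T00:00:00Z",
                                "lte": f"{day}T23:59:59Z",
                            }
                        }
                    }
                ],
                "should": [
                    {"match": {"event_type": "STARTED"}},
                    {"match": {"event_type": "FINISHED"}},
                ],
                # Assumption of this sketch: require one of the should clauses.
                "minimum_should_match": 1,
            }
        },
        "aggs": {
            "application_IDs": {
                "terms": {"field": "app_id"},  # one bucket per application
                "aggs": {
                    "ids": {
                        "top_hits": {
                            "size": 10,
                            "_source": ["event_type", "app_id"],
                            # "STARTED" sorts after "FINISHED", so desc lists it first
                            "sort": [{"event_type": {"order": "desc"}}],
                        }
                    }
                },
            }
        },
    }


# The resulting dict serializes to the JSON body of the _search request.
print(json.dumps(build_day_query("2019-10-02"), indent=2))
```

The printed JSON can be pasted as the body of POST mystatusindex/_search.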

Result:

{
  "took" : 12,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : { "value" : 5, "relation" : "eq" },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "application_IDs" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "application_1569623930006_490200",      <----- APP ID
          "doc_count" : 2,
          "ids" : {
            "hits" : {
              "total" : { "value" : 2, "relation" : "eq" },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "mystatusindex",
                  "_type" : "_doc",
                  "_id" : "1",                             <--- Document with STARTED status
                  "_score" : null,
                  "_source" : {
                    "event_type" : "STARTED",
                    "app_id" : "application_1569623930006_490200"
                  },
                  "sort" : [ "STARTED" ]
                },
                {
                  "_index" : "mystatusindex",
                  "_type" : "_doc",
                  "_id" : "2",                             <--- Document with FINISHED status
                  "_score" : null,
                  "_source" : {
                    "event_type" : "FINISHED",
                    "app_id" : "application_1569623930006_490200"
                  },
                  "sort" : [ "FINISHED" ]
                }
              ]
            }
          }
        },
        {
          "key" : "application_1569623930006_490202",
          "doc_count" : 2,
          "ids" : {
            "hits" : {
              "total" : { "value" : 2, "relation" : "eq" },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "mystatusindex",
                  "_type" : "_doc",
                  "_id" : "4",
                  "_score" : null,
                  "_source" : {
                    "event_type" : "STARTED",
                    "app_id" : "application_1569623930006_490202"
                  },
                  "sort" : [ "STARTED" ]
                },
                {
                  "_index" : "mystatusindex",
                  "_type" : "_doc",
                  "_id" : "5",
                  "_score" : null,
                  "_source" : {
                    "event_type" : "FINISHED",
                    "app_id" : "application_1569623930006_490202"
                  },
                  "sort" : [ "FINISHED" ]
                }
              ]
            }
          }
        },
        {
          "key" : "application_1569623930006_490201",
          "doc_count" : 1,
          "ids" : {
            "hits" : {
              "total" : { "value" : 1, "relation" : "eq" },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "mystatusindex",
                  "_type" : "_doc",
                  "_id" : "3",
                  "_score" : null,
                  "_source" : {
                    "event_type" : "STARTED",
                    "app_id" : "application_1569623930006_490201"
                  },
                  "sort" : [ "STARTED" ]
                }
              ]
            }
          }
        }
      ]
    }
  }
}

Note that the last application, which only has a STARTED document, also appears in the aggregation result.
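To turn the buckets into the start/end pairs the question asks for, the response can be flattened on the client side. This is only a sketch under stated assumptions: the function name `pair_events` is hypothetical, and it assumes "ts" has been added to the `_source` list of the top_hits aggregation (the query above returns only event_type and app_id). The sample response is trimmed to the keys the function reads, with timestamps taken from the sample documents.

```python
def pair_events(response):
    """Map each app_id to its STARTED and FINISHED timestamps.

    A missing event is reported as None. If an application has several
    events of the same type, the last one in the bucket wins.
    """
    pairs = {}
    buckets = response["aggregations"]["application_IDs"]["buckets"]
    for bucket in buckets:
        times = {"STARTED": None, "FINISHED": None}
        for hit in bucket["ids"]["hits"]["hits"]:
            source = hit["_source"]
            if source["event_type"] in times:
                # Assumes "ts" was requested in the top_hits "_source" list.
                times[source["event_type"]] = source.get("ts")
        pairs[bucket["key"]] = times
    return pairs


# Trimmed response built from sample documents 1, 2 and 3.
sample_response = {
    "aggregations": {
        "application_IDs": {
            "buckets": [
                {
                    "key": "application_1569623930006_490200",
                    "doc_count": 2,
                    "ids": {"hits": {"hits": [
                        {"_source": {"event_type": "STARTED",
                                     "app_id": "application_1569623930006_490200",
                                     "ts": "2019-10-02T08:11:53Z"}},
                        {"_source": {"event_type": "FINISHED",
                                     "app_id": "application_1569623930006_490200",
                                     "ts": "2019-10-02T08:12:53Z"}},
                    ]}},
                },
                {
                    "key": "application_1569623930006_490201",
                    "doc_count": 1,
                    "ids": {"hits": {"hits": [
                        {"_source": {"event_type": "STARTED",
                                     "app_id": "application_1569623930006_490201",
                                     "ts": "2019-10-02T09:30:53Z"}},
                    ]}},
                },
            ]
        }
    }
}

pairs = pair_events(sample_response)
for app_id, times in pairs.items():
    print(app_id, times["STARTED"], times["FINISHED"])
```

Because the range filter applies to every event, a FINISHED event logged after the end of the selected day would not be matched, and that application would show up with a missing end time.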

Updated answer

{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "ts": {
              "gte": "2019-10-02T00:00:00Z",
              "lte": "2019-10-02T23:59:59Z"
            }
          }
        }
      ],
      "should": [
        {
          "term": {
            "event_type.keyword": "STARTED"      <----- Changed this
          }
        },
        {
          "term": {
            "event_type.keyword": "FINISHED"     <----- Changed this
          }
        }
      ],
      "minimum_should_match": 1
    }
  },
  "aggs": {
    "application_IDs": {
      "terms": {
        "field": "app_id.keyword"                <----- Changed this
      },
      "aggs": {
        "ids": {
          "top_hits": {
            "size": 10,
            "_source": ["event_type", "app_id"],
            "sort": [
              {
                "event_type.keyword": {          <----- Changed this
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}

Note the changes I have made. Whenever you need exact matches or want to use aggregations, you need to use the keyword type.

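In the mapping you shared, there is no username field, but there are two event_type fields. I am assuming it is just a human error and that one of those fields should be username.

Now, if you look carefully, the field event_type has the type text together with a sibling keyword sub-field. I have just modified the query to use the keyword sub-field, and in doing so I switched to a Term Query.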
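The choice between event_type and event_type.keyword can be derived from the mapping itself. The helper below is a small illustrative sketch, not part of any Elasticsearch client: the name `exact_field` is hypothetical, and it only inspects the "properties" section of a mapping such as the two shown in this thread.

```python
def exact_field(properties, name):
    """Return the field path to use for term queries, sorting and aggregations.

    `properties` is the "properties" dict of an index mapping.
    """
    field = properties[name]
    if field.get("type") == "keyword":
        return name  # already an exact-value field
    if field.get("type") == "text":
        # Look for a sub-field of type keyword, e.g. "event_type.keyword".
        for sub_name, sub_field in field.get("fields", {}).items():
            if sub_field.get("type") == "keyword":
                return f"{name}.{sub_name}"
    raise ValueError(f"no exact-value (keyword) variant found for {name!r}")


# Mapping from the question: text with a keyword sub-field.
question_properties = {
    "event_type": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    },
    "ts": {"type": "date"},
}

# Mapping from the answer: plain keyword fields.
answer_properties = {
    "event_type": {"type": "keyword"},
    "app_id": {"type": "keyword"},
}

print(exact_field(question_properties, "event_type"))
print(exact_field(answer_properties, "event_type"))
```

With the question's mapping this resolves to the .keyword sub-field, and with the sample mapping from this answer it resolves to the plain field name, which matches the difference between the two queries.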

Give it a try and let me know if it helps!