Elasticsearch post_filter aggregation query

Date: 2019-06-24 15:30:14

Tags: elasticsearch

I am interested in all APIs that never returned a 200 response within a given time interval.

Essentially, I need this:

     select url from api_log
      except/minus 
     select url from api_log where status='200'

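I am trying to translate this to ES:

  1. First, compute an aggregation:
     select url, status, count(*) from api_log
     group by url, status
  2. From that result, drop every URL that has a child record with status 200.

Sample ES data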

{
    "_index": "api_log",
    "_type": "_doc",
    "_id": "1",
    "_version": 1,
    "_score": 1,
    "_source": {
        "in_time": "2019-05-13T17:20:51.108945",
        "out_time": "2019-05-13T17:20:51.145549",
        "duration": 36.6041660308838,
        "status": "200",
        "url": "/api/myFirstAPI"
    }
}
,
{
    "_index": "api_log",
    "_type": "_doc",
    "_id": "2",
    "_version": 1,
    "_score": 1,
    "_source": {
        "in_time": "2019-05-13T17:20:57.915694",
        "out_time": "2019-05-13T17:20:57.941989",
        "duration": 26.2949466705322,
        "status": "403",
        "url": "/api/mySecondAPI"
    }
},
{
    "_index": "api_log",
    "_type": "_doc",
    "_id": "3",
    "_version": 1,
    "_score": 1,
    "_source": {
        "in_time": "2019-05-13T17:22:35.274372",
        "out_time": "2019-05-13T17:22:35.288944",
        "duration": 14.5719051361084,
        "status": "400",
        "url": "/api/myFirstAPI"
    }
}

For the data above, I expect the resulting set of URLs to be {'/api/mySecondAPI'}.
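To make the intent concrete, here is a minimal Python sketch of the two steps described above (group by url and status, then drop every URL that has a 200 group), run against the three sample documents. The function name `urls_without_200` is made up for illustration, and this is plain client-side logic, not an Elasticsearch query.

```python
from collections import Counter

# The three sample documents above, reduced to the two fields that matter.
docs = [
    {"url": "/api/myFirstAPI", "status": "200"},
    {"url": "/api/mySecondAPI", "status": "403"},
    {"url": "/api/myFirstAPI", "status": "400"},
]


def urls_without_200(docs):
    """Return the URLs that have no document with status 200."""
    # Step 1: group by (url, status) and count, like the SQL GROUP BY.
    counts = Counter((d["url"], d["status"]) for d in docs)
    # Step 2: collect the URLs that have a group with status 200 ...
    urls_with_200 = {url for (url, status) in counts if status == "200"}
    # ... and subtract them from the set of all URLs (the EXCEPT/MINUS).
    return {url for (url, _status) in counts} - urls_with_200


result = urls_without_200(docs)
print(result)
```

The same set difference is what the ES query ultimately has to express.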

Request/response with aggregations only

POST /api_log/_search
{
  "size": 0,
  "aggs": {
    "url": {
      "terms": {
        "field": "url.keyword"
      },
      "aggregations": {
        "status": {
          "terms": {
            "field": "status.keyword"
          }
        }
      }
    }
  }
}

Response to the above request

{
  "took" : 880,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "url" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 394668,
      "buckets" : [
        {
          "key" : "/api/myFirstRequest",
          "doc_count" : 1352845,
          "status" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "200",
                "doc_count" : 1187611
              },
              {
                "key" : "302",
                "doc_count" : 139932
              },
              {
                "key" : "401",
                "doc_count" : 22615
              },
              {
                "key" : "500",
                "doc_count" : 2250
              },
              {
                "key" : "403",
                "doc_count" : 437
              }
            ]
          }
        },
...
...
...

From the above, I need to keep only the buckets (URLs) that do not contain a sub-bucket with status "200".
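As a fallback, that selection can be done on the client once the aggregation response has been parsed. The sketch below assumes the response shape shown above (a `url` terms aggregation with a `status` sub-aggregation); the helper name `urls_never_200` and the second bucket in the sample are hypothetical, added only so the example has something to keep.

```python
def urls_never_200(response, url_agg="url", status_agg="status"):
    """Return the keys of URL buckets that have no '200' status sub-bucket."""
    kept = []
    for bucket in response["aggregations"][url_agg]["buckets"]:
        statuses = {sub["key"] for sub in bucket[status_agg]["buckets"]}
        if "200" not in statuses:
            kept.append(bucket["key"])
    return kept


# Trimmed response: the first bucket comes from the response above,
# the second bucket is hypothetical.
sample_response = {
    "aggregations": {
        "url": {
            "buckets": [
                {
                    "key": "/api/myFirstRequest",
                    "doc_count": 1352845,
                    "status": {
                        "buckets": [
                            {"key": "200", "doc_count": 1187611},
                            {"key": "302", "doc_count": 139932},
                        ]
                    },
                },
                {
                    "key": "/api/hypotheticalRequest",  # hypothetical bucket
                    "doc_count": 12,
                    "status": {
                        "buckets": [
                            {"key": "403", "doc_count": 7},
                            {"key": "500", "doc_count": 5},
                        ]
                    },
                },
            ]
        }
    }
}

print(urls_never_200(sample_response))
```

One caveat: a terms aggregation returns only the top 10 buckets unless `size` is raised, so with the request above both the URL list and the status list may be truncated (note the non-zero `sum_other_doc_count` in the response), and a client-side filter only sees what was returned.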

This is as far as I have gotten. It looks close, and yet it is far off... I can't figure out what should go in the type field.

Request with filter

POST /api_log/_search
{
  "size": 0,
  "aggs": {
    "page_name": {
      "terms": {
        "field": "url.keyword"
      },
      "aggregations": {
        "status": {
          "terms": {
            "field": "status.keyword"
          }
        }
      }
    }
  },
  "post_filter": {
    "bool": {
      "must_not": [
        {
          "has_child": {
            "type": "?????",
            "query": {
              "term": { "status": "200" }
            }
          }
        }
      ]
    }
  }
}

Sample input (from Apache logs):

t1 /api/FirstAPI 200  <-- Eliminate First API completely
t2 /api/FirstAPI 400
t3 /api/FirstAPI 403
t4 /api/SecondAPI 403
t5 /api/SecondAPI 400
t6 /api/ThirdAPI 500
t7 /api/ThirdAPI 500
t8 /api/SecondAPI 200   <---Eliminate Second API completely
t9 /api/ThirdAPI 500
t10 /api/ThirdAPI 403

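Given the above input, I only want the pages that never gave a 200 response in the time range t1-t10.

Expected result

So the output should be /api/ThirdAPI.

If I filter out the 200s first and then apply the aggregation, I get all three APIs. That is not what I want.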
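The difference between the two approaches can be checked on the sample log lines with a small Python sketch. It is client-side logic only, with variable names chosen for illustration: `filter_first` mimics "drop the 200 rows, then group by URL", while `never_200` is the set difference I am actually after.

```python
# The ten sample log lines above as (time, url, status) tuples.
log = [
    ("t1", "/api/FirstAPI", "200"),
    ("t2", "/api/FirstAPI", "400"),
    ("t3", "/api/FirstAPI", "403"),
    ("t4", "/api/SecondAPI", "403"),
    ("t5", "/api/SecondAPI", "400"),
    ("t6", "/api/ThirdAPI", "500"),
    ("t7", "/api/ThirdAPI", "500"),
    ("t8", "/api/SecondAPI", "200"),
    ("t9", "/api/ThirdAPI", "500"),
    ("t10", "/api/ThirdAPI", "403"),
]

# Approach 1: filter out the 200 rows first, then collect the URLs.
# Every URL still has at least one non-200 row, so all three survive.
filter_first = {url for _t, url, status in log if status != "200"}

# Approach 2: all URLs minus the URLs that ever returned 200.
all_urls = {url for _t, url, _status in log}
urls_with_200 = {url for _t, url, status in log if status == "200"}
never_200 = all_urls - urls_with_200

print(sorted(filter_first))
print(sorted(never_200))
```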

1 Answer:

Answer 0: (score: 0)

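If I understand correctly, you just want to exclude 200 from the aggregation. I don't see a reason to use post_filter here. You can use the terms aggregation to exclude or filter the status value in aggregations.

The documents with status 200 are still counted in the URL bucket's doc_count, but the 200 bucket itself is excluded from the aggregation response and will not be shown.

POST /api_log/_search
{
  "size": 0,
  "aggs": {
    "url": {
      "terms": {
        "field": "url.keyword"
      },
      "aggregations": {
        "status": {
          "terms": {
            "field": "status.keyword",
            "exclude": "200"
          }
        }
      }
    }
  }
}

Alternative:

Judging by your input, it looks like you want the 200 documents as part of the result set (since you are using post_filter). If that is not the case, here is another approach. Aggregations are computed over the documents matched by the query, so if you use a bool query to exclude status 200 from the result set, there will be no buckets with status 200.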
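The bool-query alternative could look roughly like the request body built below. This is a sketch under assumptions, not a request taken from the answer: the field names `url.keyword` and `status.keyword` are copied from the question's aggregation, and the helper name `build_non_200_request` is hypothetical.

```python
import json


def build_non_200_request(url_field="url.keyword", status_field="status.keyword"):
    """Build a search body that drops status-200 documents before aggregating."""
    return {
        "size": 0,
        # The query runs before the aggregations, so documents with
        # status 200 never reach the buckets.
        "query": {
            "bool": {
                "must_not": [{"term": {status_field: "200"}}]
            }
        },
        "aggs": {
            "url": {
                "terms": {"field": url_field},
                "aggs": {
                    "status": {"terms": {"field": status_field}}
                },
            }
        },
    }


body = build_non_200_request()
# The printed JSON is what would be sent to POST /api_log/_search.
print(json.dumps(body, indent=2))
```

Note that this removes only the 200 documents, not the URLs they belong to: a URL that also returned other statuses still gets a bucket, which matches what the question reports when filtering before aggregating.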