Grouping with Elasticsearch (aggs) and joining a field into a list of values

Date: 2018-05-08 21:17:54

Tags: elasticsearch elasticsearch-aggregation

I have an index that contains multiple types. The data in each record includes fields such as "Customer ID", "Device Name", "url", and so on.

Elasticsearch is v5.6.8.

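What I would like is a single document for each combination of "Customer ID", "Device Name", and the document's _type. The single document for each grouping should have its 'url' values joined into one field named 'urls'.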

I tried the following, but it does not do what I thought it would, and I don't know what else to try:

GET _search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "_index": "safebrowsing"
          }
        },
        {
          "range": {
            "eventtime": {
              "gte": "now-5d/d"
            }
          }
        }
      ],
      "must_not": [
        {
          "term": {
            "reported_to_client": true
          }
        }
      ]
    }
  },
  "size": 0,
  "aggs": {
    "Customer ID": {
      "terms": {
        "field": "Customer ID.keyword"
      },
      "aggs": {
        "Device Name": {
          "terms": {
            "field": "Device Name.keyword"
          },
          "aggs": {
            "documenttype": {
              "terms": {
                "field": "_type"
              },
              "aggs": {
                "urls": {
                  "terms": {
                    "script": "_doc['url'].values"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

This is the error I get:

{
  "error": {
    "root_cause": [
      {
        "type": "circuit_breaking_exception",
        "reason": "[script] Too many dynamic script compilations within one minute, max: [15/min]; please use on-disk, indexed, or scripts with parameters instead; this limit can be changed by the [script.max_compilations_per_minute] setting",
        "bytes_wanted": 0,
        "bytes_limit": 0
      },
      {
        "type": "script_exception",
        "reason": "compile error",
        "script_stack": [
          "_doc['url'].values",
          "^---- HERE"
        ],
        "script": "_doc['url'].values",
        "lang": "painless"
      }
    ],
...etc
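
The compile error points at `_doc`. Painless scripts normally read doc values through `doc`, so that name may be what the compiler rejects. One script-free way to get a list of URLs per group is to make the innermost level a plain terms aggregation on the URL field, so that the bucket keys of each group are its distinct URLs. The Python sketch below only builds that request body. The helper name `nested_terms`, the bucket size of 10, and the `url.keyword` sub-field are assumptions that do not come from the post.

```python
import json
from typing import Any, Dict, List, Tuple


def nested_terms(levels: List[Tuple[str, str]], size: int = 10) -> Dict[str, Any]:
    """Build nested terms aggregations from (name, field) pairs, outermost first."""
    aggs: Dict[str, Any] = {}
    # Work from the innermost level outwards so each level wraps the previous one.
    for name, field in reversed(levels):
        node: Dict[str, Any] = {"terms": {"field": field, "size": size}}
        if aggs:
            node["aggs"] = aggs
        aggs = {name: node}
    return aggs


aggs = nested_terms([
    ("Customer ID", "Customer ID.keyword"),
    ("Device Name", "Device Name.keyword"),
    ("documenttype", "_type"),
    ("urls", "url.keyword"),  # assumed keyword sub-field, not shown in the post
])

# Body to send with GET _search, alongside the original "query" section.
print(json.dumps({"size": 0, "aggs": aggs}, indent=2))
```

Each terms level returns only its top `size` buckets, so the URL list for a group is capped at that number.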

1 Answer:

Answer 0: (score: 0)

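I figured this out... Basically, what we need is an aggregation type called top_hits, which returns the actual hits inside each bucket of the higher-level aggregations (as many hits per bucket as "size" specifies).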

GET /_search
{
  "query": {
    "bool": {
      "must": [
        {"term": {"_index": "safebrowsing"}},
        {"range": {"eventtime": {"gte": "now-2d/d"}}}
      ],
      "must_not": [
        {"term": {"reported_to_client": true}}
      ]
    }
  },
  "aggs": {
    "Customer ID": {
      "terms": {
        "field": "Customer ID.keyword"
      },
      "aggs": {
        "Device Name": {
          "terms": {
            "field": "Device Name.keyword"
          },
          "aggs": {
            "thetype": {
              "terms": {
                "field": "_type"
              },
              "aggs": {
                "thedocs": {
                  "top_hits": {
                    "sort": [{"eventtime": {"order": "desc"}}],
                    "_source": {
                      "includes": [ "ip", "type", "eventtime", "url" ]
                    },
                    "size": 2
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}

Each hit in the aggregation I named thedocs looks like this:

{
  "_index": "safebrowsing",
  "_type": "SOCIAL_ENGINEERING",
  "_id": "7ffe641xxxyyydc3536189ce33d5dfb9",
  "_score": null,
  "_source": {
    "ip": "xxx.xxx.7.88",
    "eventtime": "2018-05-08T23:34:03-07:00",
    "type": "SOCIAL_ENGINEERING",
    "url": "http://xyz-domainname.tld/bankofwhatever/"
  },
  "sort": [
    1525847643000
  ]
}
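
To get from these hits to the single 'urls' field the question asks for, the response still has to be flattened on the client side. A minimal sketch in Python, assuming the response body has already been parsed into a dict and that its "aggregations" section uses the aggregation names from the query above; the function name `collect_urls` and the output keys are hypothetical:

```python
from typing import Any, Dict, List


def collect_urls(aggregations: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one record per (customer, device, type) bucket with its URLs."""
    records: List[Dict[str, Any]] = []
    for customer in aggregations["Customer ID"]["buckets"]:
        for device in customer["Device Name"]["buckets"]:
            for doc_type in device["thetype"]["buckets"]:
                hits = doc_type["thedocs"]["hits"]["hits"]
                # Keep each URL once, in the order top_hits returned it.
                urls = list(dict.fromkeys(h["_source"]["url"] for h in hits))
                records.append({
                    "customer_id": customer["key"],
                    "device_name": device["key"],
                    "type": doc_type["key"],
                    "urls": urls,
                })
    return records
```

Because top_hits returns at most "size" hits per bucket (2 in the query above), each 'urls' list holds at most that many entries; raising "size" widens the list.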