How can I use copy_to for a multi-field aggregation?

Date: 2016-01-26 09:54:39

Tags: elasticsearch

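I put some data into ES, then used the copy_to feature to group two fields (SRC and DST) into one field, SRC_AND_DST. The point of doing this is to run a multi-field aggregation. Here are my steps.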

Create the index

curl -XPOST "localhost:9200/test?pretty" -d '{
"mappings" : {
    "type9k" : {
        "properties" : {
            "SRC" : { "type" : "string", "index" : "not_analyzed" ,"copy_to": "SRC_AND_DST"},
            "DST" : { "type" : "string", "index" : "not_analyzed" ,"copy_to": "SRC_AND_DST"},
            "BITS" : { "type" : "long", "index" : "not_analyzed" },
            "TIME" : { "type" : "long", "index" : "not_analyzed" }
        }
    }
}

}'
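This mapping sends both SRC and DST into SRC_AND_DST via copy_to. copy_to copies each source value into the target field as a separate value. It does not concatenate them, so the target becomes a multi-valued field such as ["BJ", "DL"], and a terms aggregation on it counts each value on its own.

The Python sketch below only mimics that behaviour locally on three of the sample documents. It does not talk to Elasticsearch, the helper names are hypothetical, and it assumes the target field is not analyzed. It shows why grouping by SRC_AND_DST gives per-value buckets rather than SRC/DST pairs:

```python
from collections import Counter

# Three of the sample documents from the bulk request (local copy for illustration).
DOCS = [
    {"SRC": "BJ", "DST": "DL", "BITS": 10},
    {"SRC": "DL", "DST": "SH", "BITS": 10},
    {"SRC": "SH", "DST": "BJ", "BITS": 10},
]


def apply_copy_to(doc, sources=("SRC", "DST"), target="SRC_AND_DST"):
    """Hypothetical helper mimicking copy_to: each source value becomes
    one more value of a multi-valued target field (no concatenation)."""
    out = dict(doc)
    out[target] = [doc[field] for field in sources if field in doc]
    return out


def terms_doc_counts(docs, field):
    """Hypothetical helper mimicking a terms aggregation's doc_count: a document
    is counted once in the bucket of every distinct value it holds in `field`."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc[field]))
    return dict(counts)


indexed = [apply_copy_to(d) for d in DOCS]
print(indexed[0]["SRC_AND_DST"])  # ['BJ', 'DL']
print(sorted(terms_doc_counts(indexed, "SRC_AND_DST").items()))
# [('BJ', 2), ('DL', 2), ('SH', 2)]  -> one bucket per city, not per (SRC, DST) pair
```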

Load the data into ES

curl -X POST "http://localhost:9200/test/type9k/_bulk?pretty" -d '
{"index":{}}
{"SRC":"BJ","DST":"DL","PROTOCOL":"ip","BITS":10,"TIME":1453360000}
{"index":{}}
{"SRC":"BJ","DST":"DL","PROTOCOL":"tcp","BITS":10,"TIME":1453360000}
{"index":{}}
{"SRC":"DL","DST":"SH","PROTOCOL":"UDP","BITS":10,"TIME":1453360000}
{"index":{}}
{"SRC":"SH","DST":"BJ","PROTOCOL":"ip","BITS":10,"TIME":1453360000}
{"index":{}}
{"SRC":"BJ","DST":"DL","PROTOCOL":"ip","BITS":20,"TIME":1453360300}
{"index":{}}
{"SRC":"BJ","DST":"SH","PROTOCOL":"tcp","BITS":20,"TIME":1453360300}
{"index":{}}
{"SRC":"DL","DST":"SH","PROTOCOL":"UDP","BITS":20,"TIME":1453360300}
{"index":{}}
{"SRC":"SH","DST":"BJ","PROTOCOL":"ip","BITS":20,"TIME":1453360300}
{"index":{}}
{"SRC":"BJ","DST":"DL","PROTOCOL":"ip","BITS":30,"TIME":1453360600}
{"index":{}}
{"SRC":"BJ","DST":"SH","PROTOCOL":"tcp","BITS":30,"TIME":1453360600}
{"index":{}}
{"SRC":"DL","DST":"SH","PROTOCOL":"UDP","BITS":30,"TIME":1453360600}
{"index":{}}
{"SRC":"SH","DST":"BJ","PROTOCOL":"ip","BITS":30,"TIME":1453360600}
'
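The _bulk body is newline-delimited JSON. Each document is preceded by an action line ({"index":{}} here, so the index and type come from the URL), and the whole body must end with a newline.

If you generate this payload instead of typing it, a minimal Python sketch could look like the following. The helper name build_bulk_body is hypothetical, and only the first two sample records are included:

```python
import json

# First two sample records from the bulk request above.
RECORDS = [
    {"SRC": "BJ", "DST": "DL", "PROTOCOL": "ip", "BITS": 10, "TIME": 1453360000},
    {"SRC": "BJ", "DST": "DL", "PROTOCOL": "tcp", "BITS": 10, "TIME": 1453360000},
]

COMPACT = (",", ":")  # no spaces, matching the hand-written body above


def build_bulk_body(records):
    """Return an NDJSON _bulk payload: one action line plus one source line per record."""
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {}}, separators=COMPACT))  # action/metadata line
        lines.append(json.dumps(record, separators=COMPACT))         # document source line
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


body = build_bulk_body(RECORDS)
print(body, end="")
```

If you write the string to a file, send it with curl's --data-binary @file rather than -d @file. The -d form strips newlines from file input, which breaks the bulk format.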

The problem

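I want to aggregate on SRC and DST together, with a sum aggregation on BITS, and then return the top 3 results. Translated into SQL, my requirement is: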

SELECT SRC, DST, SUM(BITS) FROM table GROUP BY SRC, DST ORDER BY SUM(BITS) DESC LIMIT 3;
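To make the target concrete, here is what that SQL would return for the 12 documents loaded above. The Python below only replays GROUP BY SRC, DST, SUM(BITS), ORDER BY ... DESC and LIMIT 3 in memory. It is a local check of the expected result, not Elasticsearch code.

Ties are broken by the (SRC, DST) key. That is an assumption of this sketch, because the SQL statement leaves tie order unspecified.

```python
from collections import defaultdict

# (SRC, DST, BITS) for the 12 documents in the bulk request above.
ROWS = [
    ("BJ", "DL", 10), ("BJ", "DL", 10), ("DL", "SH", 10), ("SH", "BJ", 10),
    ("BJ", "DL", 20), ("BJ", "SH", 20), ("DL", "SH", 20), ("SH", "BJ", 20),
    ("BJ", "DL", 30), ("BJ", "SH", 30), ("DL", "SH", 30), ("SH", "BJ", 30),
]


def top_pairs(rows, limit):
    """In-memory equivalent of:
    SELECT SRC, DST, SUM(BITS) ... GROUP BY SRC, DST ORDER BY SUM(BITS) DESC LIMIT n"""
    sums = defaultdict(int)
    for src, dst, bits in rows:
        sums[(src, dst)] += bits
    # Descending sum; ties broken by the (SRC, DST) key so the result is deterministic.
    ranked = sorted(sums.items(), key=lambda item: (-item[1], item[0]))
    return [(src, dst, total) for (src, dst), total in ranked[:limit]]


print(top_pairs(ROWS, 3))
# [('BJ', 'DL', 70), ('DL', 'SH', 60), ('SH', 'BJ', 60)]
```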

I know I can do this with a script, like this:

curl -XPOST "localhost:9200/_all/_search?pretty" -d '
{
  "_source": [ "SRC", "DST","BITS"],
  "size":0,
  "query": {  "match_all": {} },
  "aggs":
    {
      "SRC_DST": 
        {
          "terms": {"script": "[doc.SRC.value, doc.DST.value].join(\"-\")","size": 2,"shard_size":0, "order": {"sum_bits": "desc"}},
          "aggs": { "sum_bits": { "sum": {"field": "BITS"} } }
        }
    }
}
'
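The script builds one string key per document by joining SRC and DST with "-". Each document therefore lands in exactly one bucket, which is what makes it behave like GROUP BY SRC, DST. Note that "size": 2 returns only two buckets, while the SQL asks for three.

The Python sketch below mimics that bucketing locally on the 12 sample documents, including how sum_other_doc_count counts the documents left out of the returned buckets. It approximates the shape of the aggregation result, not how Elasticsearch computes it across shards, and breaking ties by key is an assumption:

```python
from collections import defaultdict

# (SRC, DST, BITS) for the 12 documents in the bulk request above.
ROWS = [
    ("BJ", "DL", 10), ("BJ", "DL", 10), ("DL", "SH", 10), ("SH", "BJ", 10),
    ("BJ", "DL", 20), ("BJ", "SH", 20), ("DL", "SH", 20), ("SH", "BJ", 20),
    ("BJ", "DL", 30), ("BJ", "SH", 30), ("DL", "SH", 30), ("SH", "BJ", 30),
]


def script_terms(rows, size):
    """Mimic a terms agg on the key SRC + "-" + DST, ordered by sum_bits desc."""
    doc_count = defaultdict(int)
    sum_bits = defaultdict(int)
    for src, dst, bits in rows:
        key = "-".join([src, dst])  # same key the Groovy script builds
        doc_count[key] += 1
        sum_bits[key] += bits
    # Descending sum_bits; tie-break on the key is an assumption of this sketch.
    ranked = sorted(doc_count, key=lambda k: (-sum_bits[k], k))
    buckets = [
        {"key": k, "doc_count": doc_count[k], "sum_bits": sum_bits[k]}
        for k in ranked[:size]
    ]
    # Documents that fell into buckets not returned because of `size`.
    other = len(rows) - sum(b["doc_count"] for b in buckets)
    return {"sum_other_doc_count": other, "buckets": buckets}


print(script_terms(ROWS, size=2))
```

On these 12 documents this gives BJ-DL (4 docs, 70) and DL-SH (3 docs, 60), with sum_other_doc_count 5. The response reported below shows exactly twice those numbers (8, 140, 6, 120 and 10). That suggests the _all search matched every document twice, for example because the data was loaded twice. This is an inference; the post does not say.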

The result I get with the script is:

"aggregations" : {
"SRC_DST" : {
  "doc_count_error_upper_bound" : 0,
  "sum_other_doc_count" : 10,
  "buckets" : [ {
    "key" : "BJ-DL",
    "doc_count" : 8,
    "sum_bits" : {
      "value" : 140.0
    }
  }, {
    "key" : "DL-SH",
    "doc_count" : 6,
    "sum_bits" : {
      "value" : 120.0
    }
  } ]
}
}

But I would like to do this with the copy_to feature instead, because I think the script may take too much time.

1 Answer:

Answer 0 (score: 0)

I'm not sure, but I don't think you need the copy_to feature. If I've understood your SQL query correctly, you can do what you need with a terms aggregation plus a sum aggregation:

{
  "size": 0,
  "aggs": {
    "unique_src": {
      "terms": {
        "field": "SRC",
        "size": 10
      },
      "aggs": {
        "unique_dst": {
          "terms": {
            "field": "DST",
            "size": 3,
            "order": {
              "bits_sum": "desc"
            }
          },
          "aggs": {
            "bits_sum": {
              "sum": {
                "field": "BITS"
              }
            }
          }
        }
      }
    }
  }
}

The query above gives me this output:

"aggregations": {
      "unique_src": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "BJ",
               "doc_count": 6,
               "unique_dst": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                     {
                        "key": "DL",
                        "doc_count": 4,
                        "bits_sum": {
                           "value": 70
                        }
                     },
                     {
                        "key": "SH",
                        "doc_count": 2,
                        "bits_sum": {
                           "value": 50
                        }
                     }
                  ]
               }
            },
            {
               "key": "DL",
               "doc_count": 3,
               "unique_dst": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                     {
                        "key": "SH",
                        "doc_count": 3,
                        "bits_sum": {
                           "value": 60
                        }
                     }
                  ]
               }
            },
            {
               "key": "SH",
               "doc_count": 3,
               "unique_dst": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                     {
                        "key": "BJ",
                        "doc_count": 3,
                        "bits_sum": {
                           "value": 60
                        }
                     }
                  ]
               }
            }
         ]
      }
   }
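One caveat: with this nesting, "size": 3 on unique_dst limits the DST buckets inside each SRC bucket. The response therefore lists the top 3 destinations per source, not the overall top 3 (SRC, DST) pairs that LIMIT 3 asks for. To get the global top 3, you can flatten the buckets on the client.

The following is a minimal Python sketch. It assumes the response has the shape shown above, and the helper name top_pairs_from_response is hypothetical:

```python
# Trimmed copy of the "aggregations" part of the response shown above.
AGGS = {
    "unique_src": {
        "buckets": [
            {"key": "BJ", "unique_dst": {"buckets": [
                {"key": "DL", "doc_count": 4, "bits_sum": {"value": 70}},
                {"key": "SH", "doc_count": 2, "bits_sum": {"value": 50}},
            ]}},
            {"key": "DL", "unique_dst": {"buckets": [
                {"key": "SH", "doc_count": 3, "bits_sum": {"value": 60}},
            ]}},
            {"key": "SH", "unique_dst": {"buckets": [
                {"key": "BJ", "doc_count": 3, "bits_sum": {"value": 60}},
            ]}},
        ]
    }
}


def top_pairs_from_response(aggs, limit):
    """Flatten SRC -> DST buckets into (SRC, DST, sum) rows and keep the largest sums."""
    rows = [
        (src["key"], dst["key"], dst["bits_sum"]["value"])
        for src in aggs["unique_src"]["buckets"]
        for dst in src["unique_dst"]["buckets"]
    ]
    # Highest sum first; ties broken by (SRC, DST) so the output is deterministic.
    rows.sort(key=lambda row: (-row[2], row[0], row[1]))
    return rows[:limit]


print(top_pairs_from_response(AGGS, 3))
# [('BJ', 'DL', 70), ('DL', 'SH', 60), ('SH', 'BJ', 60)]
```

Keeping 3 DST buckets per source is enough for a global top 3. Any pair ranked lower within its source already has three pairs above it, so it cannot be in the overall top 3.

The outer "size": 10 on unique_src is the limit that could drop candidates if there were more than ten sources. Those buckets are ordered by doc_count, not by the sum of BITS.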

Hope this helps!