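I put some data into ES. Then I used the copy_to feature to copy two fields into one combined field. The reason for doing this is to run a multi-field aggregation. Here are my steps.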
curl -XPOST "localhost:9200/test?pretty" -d '{
"mappings" : {
"type9k" : {
"properties" : {
"SRC" : { "type" : "string", "index" : "not_analyzed" ,"copy_to": "SRC_AND_DST"},
"DST" : { "type" : "string", "index" : "not_analyzed" ,"copy_to": "SRC_AND_DST"},
"BITS" : { "type" : "long", "index" : "not_analyzed" },
"TIME" : { "type" : "long", "index" : "not_analyzed" }
}
}
}
}'
curl -X POST "http://localhost:9200/test/type9k/_bulk?pretty" -d '
{"index":{}}
{"SRC":"BJ","DST":"DL","PROTOCOL":"ip","BITS":10,"TIME":1453360000}
{"index":{}}
{"SRC":"BJ","DST":"DL","PROTOCOL":"tcp","BITS":10,"TIME":1453360000}
{"index":{}}
{"SRC":"DL","DST":"SH","PROTOCOL":"UDP","BITS":10,"TIME":1453360000}
{"index":{}}
{"SRC":"SH","DST":"BJ","PROTOCOL":"ip","BITS":10,"TIME":1453360000}
{"index":{}}
{"SRC":"BJ","DST":"DL","PROTOCOL":"ip","BITS":20,"TIME":1453360300}
{"index":{}}
{"SRC":"BJ","DST":"SH","PROTOCOL":"tcp","BITS":20,"TIME":1453360300}
{"index":{}}
{"SRC":"DL","DST":"SH","PROTOCOL":"UDP","BITS":20,"TIME":1453360300}
{"index":{}}
{"SRC":"SH","DST":"BJ","PROTOCOL":"ip","BITS":20,"TIME":1453360300}
{"index":{}}
{"SRC":"BJ","DST":"DL","PROTOCOL":"ip","BITS":30,"TIME":1453360600}
{"index":{}}
{"SRC":"BJ","DST":"SH","PROTOCOL":"tcp","BITS":30,"TIME":1453360600}
{"index":{}}
{"SRC":"DL","DST":"SH","PROTOCOL":"UDP","BITS":30,"TIME":1453360600}
{"index":{}}
{"SRC":"SH","DST":"BJ","PROTOCOL":"ip","BITS":30,"TIME":1453360600}
'
I want to aggregate on SRC and DST with the sum aggregator, then return the top 3 results. In SQL terms, my requirement looks like this:
SELECT sum(BITS) FROM table GROUP BY src,dst ORDER BY sum(BITS) DESC LIMIT 3
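To make the expected result concrete, here is a small Python sketch that applies the same GROUP BY / ORDER BY / LIMIT logic to the twelve sample documents above, outside Elasticsearch. The `DOCS` list and the `top_pairs` helper are illustrative names only, not part of any Elasticsearch API.

```python
from collections import defaultdict

# (SRC, DST, BITS) for the twelve sample documents indexed above.
DOCS = [
    ("BJ", "DL", 10), ("BJ", "DL", 10), ("DL", "SH", 10), ("SH", "BJ", 10),
    ("BJ", "DL", 20), ("BJ", "SH", 20), ("DL", "SH", 20), ("SH", "BJ", 20),
    ("BJ", "DL", 30), ("BJ", "SH", 30), ("DL", "SH", 30), ("SH", "BJ", 30),
]

def top_pairs(docs, limit=3):
    """Sum BITS per (SRC, DST) pair and return the `limit` largest sums."""
    totals = defaultdict(int)
    for src, dst, bits in docs:
        totals[(src, dst)] += bits
    # Sort by sum descending; ties are broken by the pair itself for determinism.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:limit]

if __name__ == "__main__":
    for (src, dst), total in top_pairs(DOCS):
        print(f"{src}-{dst}: {total}")
```

On this sample data the sketch gives 70 for BJ-DL and 60 each for DL-SH and SH-BJ, which is the shape of result the aggregation should return.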
I know I can do this with the script feature, as follows:
curl -XPOST "localhost:9200/_all/_search?pretty" -d '
{
"_source": [ "SRC", "DST","BITS"],
"size":0,
"query": { "match_all": {} },
"aggs":
{
"SRC_DST":
{
"terms": {"script": "[doc.SRC.value, doc.DST.value].join(\"-\")","size": 2,"shard_size":0, "order": {"sum_bits": "desc"}},
"aggs": { "sum_bits": { "sum": {"field": "BITS"} } }
}
}
}
'
The result I get with the script is as follows:
"aggregations" : {
"SRC_DST" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 10,
"buckets" : [ {
"key" : "BJ-DL",
"doc_count" : 8,
"sum_bits" : {
"value" : 140.0
}
}, {
"key" : "DL-SH",
"doc_count" : 6,
"sum_bits" : {
"value" : 120.0
}
} ]
But I would like to achieve this with the copy_to feature, because I think running a script may take too much time.
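For context, copy_to copies the values of SRC and DST into SRC_AND_DST, so that field holds two separate values per document rather than one combined pair. The Python sketch below imitates a terms aggregation with a sum sub-aggregation over such a multi-valued field. It assumes the combined field keeps the values unanalyzed, and `terms_sum_on_multivalued` is a made-up helper, not an Elasticsearch call.

```python
from collections import defaultdict

# Three of the sample documents, with SRC_AND_DST filled the way copy_to would:
# each source value is copied as its own value (assumption: no analysis applied).
docs = [
    {"SRC": "BJ", "DST": "DL", "BITS": 10},
    {"SRC": "DL", "DST": "SH", "BITS": 10},
    {"SRC": "SH", "DST": "BJ", "BITS": 10},
]
for doc in docs:
    doc["SRC_AND_DST"] = [doc["SRC"], doc["DST"]]

def terms_sum_on_multivalued(docs, field, metric):
    """Imitate terms + sum: a document lands in one bucket per distinct value."""
    buckets = defaultdict(int)
    for doc in docs:
        for value in set(doc[field]):
            buckets[value] += doc[metric]
    return dict(buckets)

result = terms_sum_on_multivalued(docs, "SRC_AND_DST", "BITS")
print(result)
```

Under these assumptions the buckets are keyed by single locations (BJ, DL, SH) rather than by SRC/DST pairs, which suggests that copy_to by itself would not reproduce the GROUP BY src,dst grouping.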
Answer 0 (score: 0)
I'm not sure, but I guess you don't need the copy_to feature. Going by your SQL query, you can use a terms aggregation together with a sum aggregation to meet your requirement:
{
"size": 0,
"aggs": {
"unique_src": {
"terms": {
"field": "SRC",
"size": 10
},
"aggs": {
"unique_dst": {
"terms": {
"field": "DST",
"size": 3,
"order": {
"bits_sum": "desc"
}
},
"aggs": {
"bits_sum": {
"sum": {
"field": "BITS"
}
}
}
}
}
}
}
}
The above query gives me output like this:
"aggregations": {
"unique_src": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "BJ",
"doc_count": 6,
"unique_dst": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "DL",
"doc_count": 4,
"bits_sum": {
"value": 70
}
},
{
"key": "SH",
"doc_count": 2,
"bits_sum": {
"value": 50
}
}
]
}
},
{
"key": "DL",
"doc_count": 3,
"unique_dst": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "SH",
"doc_count": 3,
"bits_sum": {
"value": 60
}
}
]
}
},
{
"key": "SH",
"doc_count": 3,
"unique_dst": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "BJ",
"doc_count": 3,
"bits_sum": {
"value": 60
}
}
]
}
}
]
}
}
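Note that the nested aggregation orders destinations within each source bucket, so it does not by itself yield the global top 3 that the SQL LIMIT 3 asks for. One possible way to get there is to flatten the response on the client side. The Python sketch below does that for the output above, trimmed to the fields it needs; the `flatten_pairs` name is illustrative.

```python
# The "aggregations" part of the response above, trimmed to the fields used here.
aggregations = {
    "unique_src": {"buckets": [
        {"key": "BJ", "unique_dst": {"buckets": [
            {"key": "DL", "doc_count": 4, "bits_sum": {"value": 70}},
            {"key": "SH", "doc_count": 2, "bits_sum": {"value": 50}},
        ]}},
        {"key": "DL", "unique_dst": {"buckets": [
            {"key": "SH", "doc_count": 3, "bits_sum": {"value": 60}},
        ]}},
        {"key": "SH", "unique_dst": {"buckets": [
            {"key": "BJ", "doc_count": 3, "bits_sum": {"value": 60}},
        ]}},
    ]}
}

def flatten_pairs(aggs):
    """Turn nested SRC -> DST buckets into flat (src, dst, sum) tuples."""
    pairs = []
    for src_bucket in aggs["unique_src"]["buckets"]:
        for dst_bucket in src_bucket["unique_dst"]["buckets"]:
            pairs.append((src_bucket["key"], dst_bucket["key"],
                          dst_bucket["bits_sum"]["value"]))
    return pairs

# Order all pairs by summed BITS, descending; sorted() is stable,
# so pairs with equal sums keep the order they had in the response.
top3 = sorted(flatten_pairs(aggregations), key=lambda p: p[2], reverse=True)[:3]
print(top3)
```

This is only complete when the `size` values in both terms aggregations are large enough to return every SRC/DST pair; with more distinct values than that, some pairs would be missing before the client-side sort.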
Hope this helps!