I'm streaming a large (20 GB) CSV file:
date,ip,dev_type,env,time,cpu_usage
2015-11-09,10.241.121.172,M2,production,11:01,8
2015-11-09,10.241.121.172,M2,production,11:02,9
2015-11-09,10.241.121.243,C1,preproduction,11:01,4
2015-11-09,10.241.121.243,C1,preproduction,11:02,8
2015-11-10,10.241.121.172,M2,production,11:01,3
2015-11-10,10.241.121.172,M2,production,11:02,9
2015-11-10,10.241.121.243,C1,preproduction,11:01,4
2015-11-10,10.241.121.243,C1,preproduction,11:02,8
and importing it into Elasticsearch, where each row becomes a document in the following format:
{
  "_index": "cpuusage",
  "_type": "logs",
  "_id": "AVFOkMS7Q4jUWMFNfSrZ",
  "_score": 1,
  "_source": {
    "date": "2015-11-10",
    "ip": "10.241.121.172",
    "dev_type": "M2",
    "env": "production",
    "time": "11:02",
    "cpu_usage": "9"
  },
  "fields": {
    "date": [
      1447113600000
    ]
  }
}
...
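For reference, a minimal sketch of how a file this size can be streamed in without holding it in memory, using the Python client of that era (the client choice, the mapping, and the file name cpu_usage.csv are all my assumptions; Logstash's csv filter would work just as well):

import csv

from elasticsearch import Elasticsearch
from elasticsearch.helpers import streaming_bulk

es = Elasticsearch()  # assumes a local node on localhost:9200

# Assumed mapping (the question doesn't show one): cpu_usage has to be
# numeric and date a real date for the aggregations below to work.
es.indices.create(index="cpuusage", ignore=400, body={
    "mappings": {"logs": {"properties": {
        "date": {"type": "date", "format": "yyyy-MM-dd"},
        "ip": {"type": "string", "index": "not_analyzed"},
        "cpu_usage": {"type": "integer"}
    }}}
})

def generate_actions(path):
    # csv.DictReader reads one row at a time, so the 20 GB file
    # never has to fit in memory.
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield {"_index": "cpuusage", "_type": "logs", "_source": row}

# streaming_bulk batches the actions into bulk requests and yields
# an (ok, result) tuple per indexed document.
for ok, result in streaming_bulk(es, generate_actions("cpu_usage.csv")):
    if not ok:
        print("failed:", result)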
So when I look for the maximum cpu_usage of each IP per day, how can I output all the fields (date, ip, dev_type, env, cpu_usage)?
curl -XGET localhost:9200/cpuusage/_search?pretty -d '{
  "size": 0,
  "aggs": {
    "by_date": {
      "date_histogram": {
        "field": "date",
        "interval": "day"
      },
      "aggs": {
        "genders": {
          "terms": {
            "field": "ip",
            "size": 100000,
            "order": { "_count": "asc" }
          },
          "aggs": {
            "cpu_usage": { "max": { "field": "cpu_usage" } }
          }
        }
      }
    }
  }
}'
---- output ----
"aggregations" : {
"events_by_date" : {
"buckets" : [ {
"key_as_string" : "2015-11-09T00:00:00.000Z",
"key" : 1447027200000,
"doc_count" : 4,
"genders" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "10.241.121.172",
"doc_count" : 2,
"cpu_usage" : {
"value" : 9.0
}
}, {
"key" : "10.241.121.243",
"doc_count" : 2,
"cpu_usage" : {
"value" : 8.0
}
} ]
}
},
Answer (score: 6)
You can do this with a top hits aggregation. Note the sort on cpu_usage inside top_hits: without it the single returned hit is arbitrary rather than the document that produced the max. Try this:
{
  "size": 0,
  "aggs": {
    "by_date": {
      "date_histogram": {
        "field": "date",
        "interval": "day"
      },
      "aggs": {
        "genders": {
          "terms": {
            "field": "ip",
            "size": 100000,
            "order": { "_count": "asc" }
          },
          "aggs": {
            "cpu_usage": {
              "max": { "field": "cpu_usage" }
            },
            "include_source": {
              "top_hits": {
                "size": 1,
                "sort": [ { "cpu_usage": { "order": "desc" } } ],
                "_source": {
                  "include": [ "date", "ip", "dev_type", "env", "cpu_usage" ]
                }
              }
            }
          }
        }
      }
    }
  }
}
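Each per-IP bucket then carries the max value plus one full source document, so printing all the fields is just a walk over the buckets. A minimal sketch with the Python client (an assumption; `query` stands for the request body above):

from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a local node on localhost:9200

resp = es.search(index="cpuusage", body=query)  # query = the body shown above

for day in resp["aggregations"]["by_date"]["buckets"]:
    for ip_bucket in day["genders"]["buckets"]:
        # the single top hit holds the requested _source fields
        source = ip_bucket["include_source"]["hits"]["hits"][0]["_source"]
        print(source["date"], source["ip"], source["dev_type"],
              source["env"], ip_bucket["cpu_usage"]["value"])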
Does this help?