I have the following JSON:
[
{"firstname": "john", "lastname": "doe"},
{"firstname": "john", "lastname": "smith"},
{"firstname": "jane", "lastname": "smith"},
{"firstname": "jane", "lastname": "doe"},
{"firstname": "joe", "lastname": "smith"},
{"firstname": "joe", "lastname": "doe"},
{"firstname": "steve", "lastname": "smith"},
{"firstname": "jack", "lastname": "doe"}
]
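I want to count the first names that occur more than once:
Duplicate count: 3
and the first names that occur exactly once:
Non-duplicate count: 2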
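To make the expected numbers concrete, here is a minimal plain-Python sketch (not Elasticsearch) that derives them from the sample data. The helper name `count_names` is made up for illustration.

```python
from collections import Counter

# Sample documents from the question.
docs = [
    {"firstname": "john", "lastname": "doe"},
    {"firstname": "john", "lastname": "smith"},
    {"firstname": "jane", "lastname": "smith"},
    {"firstname": "jane", "lastname": "doe"},
    {"firstname": "joe", "lastname": "smith"},
    {"firstname": "joe", "lastname": "doe"},
    {"firstname": "steve", "lastname": "smith"},
    {"firstname": "jack", "lastname": "doe"},
]


def count_names(records):
    """Return (duplicate, non_duplicate) counts of distinct first names."""
    counts = Counter(r["firstname"] for r in records)
    # A name is "duplicate" when it appears in two or more documents.
    duplicate = sum(1 for n in counts.values() if n >= 2)
    # A name is "non-duplicate" when it appears in exactly one document.
    non_duplicate = sum(1 for n in counts.values() if n == 1)
    return duplicate, non_duplicate


print(count_names(docs))  # (3, 2)
```

john, jane and joe each appear twice, while steve and jack appear once, which gives the 3 and 2 above.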
I tried counting the buckets, but the query below seems to count all buckets, whether they are duplicates or not:
GET mynames/_search
{
  "aggs": {
    "name_count": {
      "terms": {
        "field": "firstname.keyword",
        "min_doc_count": 2
      }
    },
    "count": {
      "cardinality": {
        "field": "firstname.keyword"
      }
    }
  }
}
Answer 0 (score: 1)
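I have used several aggregations here. They are listed below in the order in which they execute.
Duplicates: terms aggregation, then stats_bucket aggregation
Non-duplicates: terms aggregation, then bucket_selector aggregation (a child of the terms aggregation), then sum_bucket aggregation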
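As a rough illustration of that execution order, the sketch below mimics each step in plain Python on the sample first names. It is only a stand-in for what the aggregations compute, not how Elasticsearch implements them, and the helper `stats_bucket` is a hypothetical local function named after the aggregation it imitates.

```python
from collections import Counter

first_names = ["john", "john", "jane", "jane", "joe", "joe", "steve", "jack"]

# terms: one bucket per distinct value, holding its document count (_count).
buckets = Counter(first_names)


def stats_bucket(values):
    """Imitate stats_bucket: summary statistics over per-bucket values."""
    values = list(values)
    if not values:
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0}
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
    }


# Duplicates: terms with min_doc_count = 2, then stats_bucket over _count.
duplicate_buckets = {k: v for k, v in buckets.items() if v >= 2}
duplicate_stats = stats_bucket(duplicate_buckets.values())

# Non-duplicates: terms, then bucket_selector keeps buckets with _count == 1,
# then sum_bucket adds up the _count of the surviving buckets.
nonduplicate_buckets = {k: v for k, v in buckets.items() if v == 1}
nonduplicate_sum = sum(nonduplicate_buckets.values())

print(duplicate_stats["count"])  # 3
print(nonduplicate_sum)          # 2
```

Summing `_count` works as a bucket counter on the non-duplicate side only because every bucket that survives the selector has a count of exactly 1. On the duplicate side the counts vary, so the `count` field of the stats is the number to read, not `sum`.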
POST <your_index_name>/_search
{
  "size": 0,
  "aggs": {
    "duplicate_aggs": {
      "terms": {
        "field": "firstname.keyword",
        "min_doc_count": 2
      }
    },
    "duplicate_bucketcount": {
      "stats_bucket": {
        "buckets_path": "duplicate_aggs._count"
      }
    },
    "nonduplicate_aggs": {
      "terms": {
        "field": "firstname.keyword"
      },
      "aggs": {
        "equal_one": {
          "bucket_selector": {
            "buckets_path": {
              "count": "_count"
            },
            "script": "params.count == 1"
          }
        }
      }
    },
    "nonduplicate_bucketcount": {
      "sum_bucket": {
        "buckets_path": "nonduplicate_aggs._count"
      }
    }
  }
}
{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 8,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "duplicate_aggs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "jane",
          "doc_count": 2
        },
        {
          "key": "joe",
          "doc_count": 2
        },
        {
          "key": "john",
          "doc_count": 2
        }
      ]
    },
    "nonduplicate_aggs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "jack",
          "doc_count": 1
        },
        {
          "key": "steve",
          "doc_count": 1
        }
      ]
    },
    "duplicate_bucketcount": {
      "count": 3,
      "min": 2,
      "max": 2,
      "avg": 2,
      "sum": 6
    },
    "nonduplicate_bucketcount": {
      "value": 2
    }
  }
}
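Note that in the response above, the key duplicate_bucketcount.count has the value 3. That is the number of buckets, in other words the number of duplicated first names. Likewise, nonduplicate_bucketcount.value is 2, the number of first names that occur only once.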
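If the response is handled in client code, the two numbers can be pulled out along these lines. This is a minimal sketch that assumes the response has already been parsed into a Python dict; `extract_counts` and `sample_response` are illustrative names, and the sample dict is trimmed to the two relevant aggregations from the response above.

```python
def extract_counts(response):
    """Return (duplicate, non_duplicate) name counts from a parsed response."""
    aggs = response["aggregations"]
    # stats_bucket reports how many buckets it saw in its "count" field.
    duplicate = aggs["duplicate_bucketcount"]["count"]
    # sum_bucket reports a single numeric "value"; convert it to an int.
    non_duplicate = int(aggs["nonduplicate_bucketcount"]["value"])
    return duplicate, non_duplicate


# Trimmed copy of the aggregations section shown in the response above.
sample_response = {
    "aggregations": {
        "duplicate_bucketcount": {"count": 3, "min": 2, "max": 2, "avg": 2, "sum": 6},
        "nonduplicate_bucketcount": {"value": 2},
    }
}

print(extract_counts(sample_response))  # (3, 2)
```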
Let me know if this helps!