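The distinct counts of PARTY_ID from the two aggregations should be the same. In one case it is 3000, and in the other it is the sum of all bucket values (2675 + 244 + 41 + 6 + 2 = 2968); they are not equal. What could be the reason?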
GET /test/data/_search
{
  "size": 0,
  "aggs": {
    "ASSET_CLASS": {
      "terms": {
        "field": "ASSET_CLASS_WORST"
      },
      "aggs": {
        "ASSET_CLASS": {
          "cardinality": {
            "field": "PARTY_ID"
          }
        }
      }
    },
    "Total count": {
      "cardinality": {
        "field": "PARTY_ID"
      }
    }
  }
}
Result:
{
  "took": 9,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 51891,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "Total count": {
      "value": 3000
    },
    "ASSET_CLASS": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "NPA",
          "doc_count": 49252,
          "ASSET_CLASS": {
            "value": 2675
          }
        },
        {
          "key": "RESTRUCTURED",
          "doc_count": 2275,
          "ASSET_CLASS": {
            "value": 244
          }
        },
        {
          "key": "SMA2",
          "doc_count": 308,
          "ASSET_CLASS": {
            "value": 41
          }
        },
        {
          "key": "SMA1",
          "doc_count": 42,
          "ASSET_CLASS": {
            "value": 6
          }
        },
        {
          "key": "SMA0",
          "doc_count": 14,
          "ASSET_CLASS": {
            "value": 2
          }
        }
      ]
    }
  }
}
Answer 0 (score: 1)
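The first line of the documentation for cardinality aggregation is:
A single-value metrics aggregation that calculates an approximate count of distinct values.
(emphasis mine, on "approximate")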
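The gap here is 32 out of 3000 (3000 versus 2675 + 244 + 41 + 6 + 2 = 2968), which is roughly 1%, so this is only to be expected.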
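As a quick check of the arithmetic, the following Python sketch recomputes the gap from the values in the result above. The variable names are purely illustrative. Note that with exact counts the overall distinct count could never exceed the sum of the per-bucket distinct counts (a PARTY_ID that appears in two buckets is counted twice in the sum, never zero times), so a total that is larger than the sum can only come from approximation error.

```python
# Per-bucket cardinality values copied from the search result above.
bucket_counts = {
    "NPA": 2675,
    "RESTRUCTURED": 244,
    "SMA2": 41,
    "SMA1": 6,
    "SMA0": 2,
}
total_count = 3000  # value of the top-level "Total count" aggregation

bucket_sum = sum(bucket_counts.values())
gap = total_count - bucket_sum
relative_gap = gap / total_count

print(bucket_sum, gap, f"{relative_gap:.2%}")  # prints: 2968 32 1.07%
```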
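The cardinality aggregation uses an enhanced version of the HyperLogLog algorithm (HyperLogLog++), which has interesting properties such as constant memory complexity and O(N) time complexity.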
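To get a feel for why such a count is approximate, here is a minimal sketch of the basic HyperLogLog estimator in Python. It is not Elasticsearch's implementation (which uses HyperLogLog++ with further corrections); the function name `hll_estimate`, the parameter `p`, the use of SHA-1 as the hash, and the sample IDs are all assumptions made for illustration.

```python
import hashlib
import math


def hll_estimate(values, p=12):
    """Estimate the number of distinct values with basic HyperLogLog.

    Uses 2**p registers, so memory is fixed regardless of input size.
    The bias constant below is the usual one for 2**p >= 128 (p >= 7).
    """
    m = 1 << p
    registers = [0] * m
    for v in values:
        # Derive a 64-bit hash from the first 8 bytes of a SHA-1 digest.
        digest = hashlib.sha1(str(v).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - p)                # top p bits select a register
        rest = h & ((1 << (64 - p)) - 1)   # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)

    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    # Small-range correction: fall back to linear counting while
    # some registers are still empty.
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)
    return raw


# 3000 distinct hypothetical IDs, each repeated three times.
ids = [f"party-{i}" for i in range(3000)] * 3
print(round(hll_estimate(ids)))
```

The estimate typically lands near 3000 but not exactly on it. Raising `p` uses more registers and tightens the estimate, which is the same memory-for-accuracy trade-off that a higher precision_threshold makes.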
If you need more precise results, try a higher precision_threshold setting (the default is 3000 and the maximum supported value is 40000).
GET /test/data/_search
{
  "size": 0,
  "aggs": {
    "ASSET_CLASS": {
      "terms": {
        "field": "ASSET_CLASS_WORST"
      },
      "aggs": {
        "ASSET_CLASS": {
          "cardinality": {
            "field": "PARTY_ID",
            "precision_threshold": 10000
          }
        }
      }
    },
    "Total count": {
      "cardinality": {
        "field": "PARTY_ID",
        "precision_threshold": 10000
      }
    }
  }
}