In my Elasticsearch (7.13) index, I have the following dataset:
maid site_id date hour
m1 1300 2021-06-03 1
m1 1300 2021-06-03 2
m1 1300 2021-06-03 1
m2 1300 2021-06-03 1
I am trying to get the count of unique records per date and site ID from the table above. The desired result is:
maid site_id date count
m1 1300 2021-06-03 1
m2 1300 2021-06-03 1
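To make the desired output concrete, here is a minimal Python sketch of the same deduplication done in memory on the four sample rows. The function name `unique_records` is hypothetical and only illustrates the intent (one row per maid, site_id and date, ignoring hour); it is not how Elasticsearch computes anything.

```python
# Sample rows copied from the question: (maid, site_id, date, hour).
rows = [
    ("m1", 1300, "2021-06-03", 1),
    ("m1", 1300, "2021-06-03", 2),
    ("m1", 1300, "2021-06-03", 1),
    ("m2", 1300, "2021-06-03", 1),
]


def unique_records(rows):
    """Return one (maid, site_id, date, count) row per distinct combination.

    The hour column is ignored, so repeated hits by the same maid on the
    same site and date collapse into a single row with count 1.
    """
    # dict.fromkeys keeps the first occurrence of each key, in input order.
    distinct = dict.fromkeys((maid, site_id, date) for maid, site_id, date, _hour in rows)
    return [(maid, site_id, date, 1) for maid, site_id, date in distinct]


for record in unique_records(rows):
    print(*record)
```

On the sample data this collapses the three m1 rows into one and leaves the single m2 row untouched, which matches the table above.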
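For each site_id I have millions of maids, and the dates span two years. I am using the following code with a cardinality aggregation on maid, assuming it would return the unique maids.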
GET /r_2332/_search
{
  "size": 0,
  "aggs": {
    "site_id": {
      "terms": {
        "field": "site_id",
        "size": 100,
        "include": [
          1171, 1048
        ]
      },
      "aggs": {
        "bydate": {
          "range": {
            "field": "date",
            "ranges": [
              {
                "from": "2021-04-08",
                "to": "2021-04-22"
              }
            ]
          },
          "aggs": {
            "rdate": {
              "terms": {
                "field": "date"
              },
              "aggs": {
                "maids": {
                  "cardinality": {
                    "field": "maid"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
This still returns data with all the duplicate values. How can I include the maid field in my query so that I get data filtered down to unique maid values?
Answer 0 (score: 1)
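If you want to get unique documents based on site_id and maid, you can use the multi terms aggregation together with the cardinality aggregation. Each bucket then represents one unique (site_id, maid) pair.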
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "site_id": [
              "1300",
              "1301"
            ]
          }
        },
        {
          "range": {
            "date": {
              "gte": "2021-06-02",
              "lte": "2021-06-03"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "group_by": {
      "multi_terms": {
        "terms": [
          {
            "field": "site_id"
          },
          {
            "field": "maid.keyword"
          }
        ]
      },
      "aggs": {
        "type_count": {
          "cardinality": {
            "field": "site_id"
          }
        }
      }
    }
  }
}
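If the site IDs and date window change between runs, the same request body can be built programmatically. The sketch below is only an illustration: `build_query` and its parameters are hypothetical names, and the optional `bucket_size` argument is my own addition that maps to the `size` option of `multi_terms` (which defaults to 10 buckets), not something the query above sets.

```python
import json


def build_query(site_ids, date_from, date_to, bucket_size=None):
    """Build a request body mirroring the multi_terms query above.

    bucket_size is an assumption added for illustration: when given, it is
    set as the "size" of the multi_terms aggregation; when None, the
    aggregation is left at its default.
    """
    multi_terms = {
        "terms": [
            {"field": "site_id"},
            {"field": "maid.keyword"},
        ]
    }
    if bucket_size is not None:
        multi_terms["size"] = bucket_size

    return {
        "size": 0,  # return aggregation results only, no hits
        "query": {
            "bool": {
                "filter": [
                    {"terms": {"site_id": list(site_ids)}},
                    {"range": {"date": {"gte": date_from, "lte": date_to}}},
                ]
            }
        },
        "aggs": {
            "group_by": {
                "multi_terms": multi_terms,
                "aggs": {
                    "type_count": {"cardinality": {"field": "site_id"}}
                },
            }
        },
    }


body = build_query(["1300", "1301"], "2021-06-02", "2021-06-03")
print(json.dumps(body, indent=2))
```

With millions of maids per site, the number of buckets returned matters, so a larger `bucket_size` (or a paginating aggregation) would likely be needed in practice.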
The search result will be:
"aggregations": {
  "group_by": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": [
          1300,
          "m1"
        ],
        "key_as_string": "1300|m1",
        "doc_count": 3,
        "type_count": {
          "value": 1 // note this
        }
      },
      {
        "key": [
          1300,
          "m2"
        ],
        "key_as_string": "1300|m2",
        "doc_count": 1,
        "type_count": {
          "value": 1 // note this
        }
      }
    ]
  }
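Since every bucket stands for one distinct (site_id, maid) pair, the per-site unique count can be read off by counting buckets. A minimal Python sketch, assuming the response has been parsed into a dict shaped like the fragment above; `bucket_rows` and `unique_maids_per_site` are hypothetical helper names, and the counts only cover the buckets that were actually returned.

```python
from collections import defaultdict

# Response fragment copied from the answer, trimmed to the fields used here.
response = {
    "aggregations": {
        "group_by": {
            "buckets": [
                {"key": [1300, "m1"], "key_as_string": "1300|m1", "doc_count": 3},
                {"key": [1300, "m2"], "key_as_string": "1300|m2", "doc_count": 1},
            ]
        }
    }
}


def bucket_rows(response):
    """Flatten the multi_terms buckets into (site_id, maid, doc_count) rows."""
    buckets = response["aggregations"]["group_by"]["buckets"]
    return [(b["key"][0], b["key"][1], b["doc_count"]) for b in buckets]


def unique_maids_per_site(response):
    """Count distinct maids per site_id by counting one per bucket."""
    counts = defaultdict(int)
    for site_id, _maid, _doc_count in bucket_rows(response):
        counts[site_id] += 1
    return dict(counts)


print(bucket_rows(response))
print(unique_maids_per_site(response))
```

Here `doc_count` is the number of raw documents behind each pair (3 for m1, 1 for m2), while the number of buckets per site is the deduplicated maid count.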