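I am working on a particular query in Elasticsearch. The goal of the query is to return all unique results with their latest timestamp. As background: in the Elasticsearch index, each unique value of the field "x" can have multiple entries with different timestamps. I want the ES query to return the latest timestamp for each unique value of x. The data in ES looks like this: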
{"x" : "1", "time": 1536574915}
{"x" : "2", "time": 1536574919}
{"x" : "1", "time": 1536574815}
{"x" : "2", "time": 1536574819}
{"x" : "3", "time": 1536574915}
{"x" : "4", "time": 1536574915}
The expected output is:
{"x" : "1", "time": 1536574915}
{"x" : "2", "time": 1536574919}
{"x" : "3", "time": 1536574915}
{"x" : "4", "time": 1536574915}
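To make the intended result concrete, here is a minimal Python sketch of the same "latest entry per x" logic applied to the sample data in memory. It only illustrates the semantics I am after, not how Elasticsearch computes it; the helper name `latest_per_key` is made up for this example.

```python
def latest_per_key(docs):
    """Return one document per distinct "x": the one with the largest "time"."""
    latest = {}
    for doc in docs:
        current = latest.get(doc["x"])
        # Keep the document if this "x" is new or its timestamp is more recent.
        if current is None or doc["time"] > current["time"]:
            latest[doc["x"]] = doc
    # Sort by "x" so the output order is stable.
    return sorted(latest.values(), key=lambda d: d["x"])


# The sample documents from above.
docs = [
    {"x": "1", "time": 1536574915},
    {"x": "2", "time": 1536574919},
    {"x": "1", "time": 1536574815},
    {"x": "2", "time": 1536574819},
    {"x": "3", "time": 1536574915},
    {"x": "4", "time": 1536574915},
]

for doc in latest_per_key(docs):
    print(doc)
```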
The query I am currently using is (the "lte" value could also be given as an epoch value):
{
  "size": 0,
  "query": {
    "bool": {
      "must": [],
      "filter": {
        "range": {
          "time": {
            "lte": "2019-11-16",
            "format": "date_optional_time"
          }
        }
      }
    }
  },
  "aggs": {
    "group_by": {
      "terms": {
        "field": "x"
      },
      "aggs": {
        "resource": {
          "terms": {
            "field": "time",
            "size": 1,
            "order": {
              "_key": "desc"
            }
          },
          "aggs": {
            "include_source": {
              "top_hits": {
                "from": 0,
                "size": 1,
                "_source": {}
              }
            }
          }
        }
      }
    }
  }
}
The above query returns:
[
  {
    "_scroll_id": "DnF1ZXJ5VGhlbkZldGNoAgAAAAAAAAECFmtnNUY4dHFKUXVldXdQMkNSaE1femcAAAAAAAABAxZrZzVGOHRxSlF1ZXV3UDJDUmhNX3pn",
    "took": 227,
    "timed_out": false,
    "_shards": {
      "total": 2,
      "successful": 2,
      "skipped": 0,
      "failed": 0
    },
    "hits": {
      "total": 343533,
      "max_score": 0.0,
      "hits": [
        {
        }
      ]
    },
    "aggregations": {
      "group_by": {
        "doc_count_error_upper_bound": 4,
        "sum_other_doc_count": 343513,
        "buckets": [
          { # here is the actual data.
          }
        ]
      }
    }
  },
  {
    # another scroll_id. Removed the data as it's huge.
  }
]
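My question is: where are the unique results in the response above? Are they inside [hits][hits] or inside "aggregations"? If they are in the aggregations, then for a million records the aggregation returns only 10 results. If I rely on [hits][hits] from each scroll page, the results contain duplicates. I am trying to understand which part of this response holds the correct unique entries given the query constraints above, or whether the query is malformed or missing some parameter. Any help is appreciated. Thanks.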
Answer 0 (score: 0)
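Your aggregation is not correct because you are retrieving the top hit for each combination of x and time, whereas your goal is to retrieve the latest hit for each x. You need to modify your query as follows: aggregate only on x, and in the top_hits sub-aggregation sort the documents by decreasing time and keep only the first one.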
{
  "size": 0,
  "aggs": {
    "group_by": {
      "terms": {
        "field": "x"
      },
      "aggs": {
        "resource": {
          "top_hits": {
            "from": 0,
            "size": 1,
            "sort": {
              "time": "desc"
            },
            "_source": {}
          }
        }
      }
    }
  }
}
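One caveat, since the question mentions getting only 10 results: a terms aggregation returns 10 buckets by default unless its "size" is set. The sketch below builds the same request body in Python with an explicit "size". The function name and the `max_buckets` parameter are hypothetical, and 1000 is an arbitrary default rather than a recommendation. For fields with a very large number of distinct values, paging through buckets with a composite aggregation may be a better fit than raising "size", if your Elasticsearch version supports it.

```python
import json


def build_latest_per_x_query(max_buckets=1000):
    """Build a request body returning the most recent document per value of "x".

    max_buckets is a hypothetical parameter name; it is passed as the "size"
    of the terms aggregation, which otherwise defaults to 10 buckets.
    """
    return {
        "size": 0,  # no regular hits, only aggregation results
        "aggs": {
            "group_by": {
                "terms": {"field": "x", "size": max_buckets},
                "aggs": {
                    "resource": {
                        "top_hits": {
                            "size": 1,  # keep only the first hit per bucket
                            "sort": [{"time": {"order": "desc"}}],  # newest first
                        }
                    }
                },
            }
        },
    }


# Print the body as JSON, ready to send to the _search endpoint.
print(json.dumps(build_latest_per_x_query(), indent=2))
```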
The documents you are looking for are in the resource.hits.hits section of each bucket:
"aggregations" : {
  "group_by" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      {
        "key" : "1",
        "doc_count" : 2,
        "resource" : {
          "hits" : {
            "total" : 2,
            "max_score" : null,
            "hits" : [
              {
                "_index" : "times",
                "_type" : "doc",
                "_id" : "PZt7G2cBJos57mIu0oy-",
                "_score" : null,
                "_source" : {
                  "x" : "1",
                  "time" : 1536574915
                },
                "sort" : [
                  1536574915
                ]
              }
            ]
          }
        }
      },
      {
        "key" : "2",
        "doc_count" : 2,
        "resource" : {
          "hits" : {
            "total" : 2,
            "max_score" : null,
            "hits" : [
              {
                "_index" : "times",
                "_type" : "doc",
                "_id" : "Ppt7G2cBJos57mIu0oy-",
                "_score" : null,
                "_source" : {
                  "x" : "2",
                  "time" : 1536574919
                },
                "sort" : [
                  1536574919
                ]
              }
            ]
          }
        }
      },
      {
        "key" : "3",
        "doc_count" : 1,
        "resource" : {
          "hits" : {
            "total" : 1,
            "max_score" : null,
            "hits" : [
              {
                "_index" : "times",
                "_type" : "doc",
                "_id" : "QZt7G2cBJos57mIu0oy-",
                "_score" : null,
                "_source" : {
                  "x" : "3",
                  "time" : 1536574915
                },
                "sort" : [
                  1536574915
                ]
              }
            ]
          }
        }
      },
      {
        "key" : "4",
        "doc_count" : 1,
        "resource" : {
          "hits" : {
            "total" : 1,
            "max_score" : null,
            "hits" : [
              {
                "_index" : "times",
                "_type" : "doc",
                "_id" : "Qpt7G2cBJos57mIu0oy-",
                "_score" : null,
                "_source" : {
                  "x" : "4",
                  "time" : 1536574915
                },
                "sort" : [
                  1536574915
                ]
              }
            ]
          }
        }
      }
    ]
  }
}
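To pull those documents out in client code, something like the following could work. It is a minimal sketch that assumes the response has already been parsed into a Python dict and that the aggregation names match the query above ("group_by" and "resource"). The helper name `latest_docs` is made up for illustration, and the sample response is a trimmed version of the one shown above.

```python
def latest_docs(response, agg_name="group_by", hits_name="resource"):
    """Collect the _source of the single top hit from every terms bucket."""
    buckets = response["aggregations"][agg_name]["buckets"]
    result = []
    for bucket in buckets:
        hits = bucket[hits_name]["hits"]["hits"]
        if hits:  # a bucket normally holds exactly one hit because size is 1
            result.append(hits[0]["_source"])
    return result


# Trimmed sample response containing only the fields the helper reads.
sample_response = {
    "aggregations": {
        "group_by": {
            "buckets": [
                {
                    "key": "1",
                    "doc_count": 2,
                    "resource": {"hits": {"hits": [
                        {"_source": {"x": "1", "time": 1536574915}}
                    ]}},
                },
                {
                    "key": "2",
                    "doc_count": 2,
                    "resource": {"hits": {"hits": [
                        {"_source": {"x": "2", "time": 1536574919}}
                    ]}},
                },
            ]
        }
    }
}

for doc in latest_docs(sample_response):
    print(doc)
```

The regular hits.hits section of the response is not used at all here, which is why the query sets "size" to 0.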