My index contains documents like the following:
[
{
"name": "Marco",
"city_id": 45,
"city": "Rome"
},
{
"name": "John",
"city_id": 46,
"city": "London"
},
{
"name": "Ann",
"city_id": 47,
"city": "New York"
},
...
]
and this aggregation:
"aggs": {
"city": {
"terms": {
"field": "city"
}
}
}
which gives me a response like this:
{
"aggregations": {
"city": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 694,
"buckets": [
{
"key": "Rome",
"doc_count": 15126
},
{
"key": "London",
"doc_count": 11395
},
{
"key": "New York",
"doc_count": 14836
},
...
]
},
...
}
}
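My problem is that I need to include city_id in my aggregation results. I've been reading here that I can't have a multi-field terms aggregation, but I don't need to aggregate on two fields; I only need to return one more field whose value is always the same for a given term (basically a city / city_id pair). What is the best way to achieve this without losing performance?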
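I could create a field named city_with_id with values such as "Rome;45", "London;46", and so on, and aggregate on that field instead. That would work for me, because I can simply split the results in my backend to get the IDs I need, but is this the best approach?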
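For the city_with_id approach, the backend splitting could look roughly like the sketch below. It assumes a terms aggregation named city_with_id over a field whose values have the form "<city>;<city_id>"; the aggregation name and the helper function split_city_buckets are hypothetical and only for illustration.

```python
# Sketch: split composite "city;city_id" bucket keys returned by a terms
# aggregation on the hypothetical city_with_id field.
def split_city_buckets(response):
    """Return a list of (city, city_id, doc_count) tuples."""
    buckets = response["aggregations"]["city_with_id"]["buckets"]
    result = []
    for bucket in buckets:
        # Split on the LAST ";" so a city name that itself contains ";" survives.
        city, _, raw_id = bucket["key"].rpartition(";")
        result.append((city, int(raw_id), bucket["doc_count"]))
    return result


example = {
    "aggregations": {
        "city_with_id": {
            "buckets": [
                {"key": "Rome;45", "doc_count": 15126},
                {"key": "London;46", "doc_count": 11395},
            ]
        }
    }
}

print(split_city_buckets(example))
# [('Rome', 45, 15126), ('London', 46, 11395)]
```

Using rpartition rather than split keeps the parsing correct even if the separator ever appears in a city name, as long as it never appears in the ID.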
Answer:
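One way is to add a top_hits sub-aggregation and use source filtering so that only city_id is returned, as in the example below.
I don't think this will be much less performant, but you could test the impact on your index before trying the city_with_id field approach described in the OP.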
Example:
POST <index>/_search
{
"size" : 0,
"aggs": {
"city": {
"terms": {
"field": "city"
},
"aggs" : {
"id" : {
"top_hits" : {
"_source": {
"includes": [
"city_id"
]
},
"size" : 1
}
}
}
}
}
}
Result (buckets only):
{
"key": "London",
"doc_count": 2,
"id": {
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "country",
"_type": "city",
"_id": "2",
"_score": 1,
"_source": {
"city_id": 46
}
}
]
}
}
},
{
"key": "New York",
"doc_count": 1,
"id": {
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "country",
"_type": "city",
"_id": "3",
"_score": 1,
"_source": {
"city_id": 47
}
}
]
}
}
},
{
"key": "Rome",
"doc_count": 1,
"id": {
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "country",
"_type": "city",
"_id": "1",
"_score": 1,
"_source": {
"city_id": 45
}
}
]
}
}
}
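On the backend, the city to city_id mapping can then be read from each bucket's top_hits result. The buckets shown above sit under aggregations.city.buckets in the full response, and the sub-aggregation is named id as in the request; the helper city_ids_from_top_hits is a hypothetical name used only for this sketch.

```python
# Sketch: read city_id out of the top_hits sub-aggregation ("id") nested
# under each bucket of the "city" terms aggregation.
def city_ids_from_top_hits(response):
    """Map each city bucket key to the city_id of its first top_hits document."""
    mapping = {}
    for bucket in response["aggregations"]["city"]["buckets"]:
        hits = bucket["id"]["hits"]["hits"]
        # The request uses "size": 1, so there is at most one hit per bucket.
        if hits:
            mapping[bucket["key"]] = hits[0]["_source"]["city_id"]
    return mapping


example = {
    "aggregations": {
        "city": {
            "buckets": [
                {"key": "London", "doc_count": 2,
                 "id": {"hits": {"hits": [{"_source": {"city_id": 46}}]}}},
                {"key": "New York", "doc_count": 1,
                 "id": {"hits": {"hits": [{"_source": {"city_id": 47}}]}}},
            ]
        }
    }
}

print(city_ids_from_top_hits(example))
# {'London': 46, 'New York': 47}
```

Because city_id is assumed to be identical for every document of a given city, reading only the first hit is enough.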