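I'm trying to avoid having to query Elasticsearch repeatedly to get the information I need.
Say I have a dataset made up of events in cities. Documents in the dataset might look like this: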
{
city: 'Berlin',
event: 'Dance party',
date: '2017-04-15'
},
{
city: 'Seattle',
event: 'Wine tasting',
date: '2017-04-18'
},
{
city: 'Berlin',
event: 'Dance party',
date: '2017-04-21'
},
{
city: 'Hong Kong',
event: 'Theater',
date: '2017-04-25'
}...
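Now say the list of all tracked cities is known, and I need to get the most recent event from each city. So I need to be able to supply a list of city names in the query, something like ['Berlin', 'Hong Kong', 'Seattle'], and get back just three events: the latest one for each city.
My current query can only do this by being run repeatedly, with a size of 1 and an exact match on the city name, like this: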
{
size: 1,
body: {
sort: [
{'date': {'order': 'desc'}}
],
query: {
'match_phrase': {'city': 'Berlin'}
}
}
}
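Is there a way to write the query so that I can pass the whole list of cities in a single request and reliably get back only the most recent entry for each city?
Edit
My new query looks like this: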
{
'query': {
'match_all': {}
},
'_source': ['city', 'event', 'date'],
'aggs': {
'cities': {
'terms': {
'field': 'city',
'size': 100
},
'aggs': {
'top_cities': {
'top_hits': {
'size': 1,
'_source': 'event',
'sort': {
'date': 'desc'
}
}
}
}
}
}
}
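This really looks like it should work. But I'm still missing cities that I know are there, and one of them shows up multiple times.
I'm running it in Node with the elasticsearch-js package. The client executes it like this: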
let client = new elasticSearch.Client(
{
"host": [
"host1:9200",
"host2:9200",
"host3:9200"
]
}
);
client.search(SEARCH_PARAMS)
.then(function (resp) {
console.log(JSON.stringify(resp));
});
Here is a (sanitized) version of the resulting JSON:
{
"took": 77,
"timed_out": false,
"_shards": {
"total": 42,
"successful": 42,
"failed": 0
},
"hits": {
"total": 5685608,
"max_score": 1,
"hits": [{
"_index": "sanitized",
"_type": "sanitized",
"_id": "AVu489lVgqYk_9QxQb-U",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-15",
"city": "Berlin"
}
}, {
"_index": "sanitized",
"_type": "sanitized",
"_id": "AVu489lVgqYk_9QxQb-X",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-15",
"city": "Berlin"
}
}, {
"_index": "sanitized",
"_type": "sanitized_variant_1",
"_id": "AVu489lVgqYk_9QxQb-a",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-29",
"city": "Berlin"
}
}, {
"_index": "sanitized",
"_type": "sanitized_variant_2",
"_id": "AVu489lVgqYk_9QxQb-b",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-29",
"city": "Berlin"
}
}, {
"_index": "sanitized",
"_type": "sanitized_variant_2",
"_id": "AVu489lVgqYk_9QxQb-d",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-29",
"city": "Hong Kong"
}
}, {
"_index": "sanitized",
"_type": "sanitized_variant_2",
"_id": "AVu489lVgqYk_9QxQb-f",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-29",
"city": "Hong Kong"
}
}, {
"_index": "sanitized",
"_type": "sanitized_variant_2",
"_id": "AVu49AkKCe9swQD44WnN",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-29",
"city": "Seattle"
}
}, {
"_index": "sanitized",
"_type": "sanitized_variant_2",
"_id": "AVu49AkKCe9swQD44WnP",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-29",
"city": "New York"
}
}, {
"_index": "sanitized",
"_type": "sanitized_variant_1",
"_id": "AVu49AkKCe9swQD44WnY",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-29",
"city": "Berlin"
}
}, {
"_index": "sanitized",
"_type": "sanitized_variant_2",
"_id": "AVu49AkKCe9swQD44Wnb",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-29",
"city": "Berlin"
}
}]
}
}
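On closer inspection, for some reason the aggregations are not being added to the resp object.
Answer 0 (score: 1)
In addition to filtering on the cities in the query, I'd suggest using a terms aggregation on the city field, with a top_hits sub-aggregation to retrieve the most recent event for each city: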
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"cities": {
"terms": {
"field": "city",
"size": 100
},
"aggs": {
"top_events": {
"top_hits": {
"size": 1,
"_source": "event",
"sort": {
"date": "desc"
}
}
}
}
}
}
}
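As a minimal sketch of how this could be wired up from Node: the helper names buildLatestPerCityParams and latestEventPerCity are hypothetical, and the sketch assumes that city is indexed as an exact-value (keyword-style) field and that the legacy elasticsearch-js client is in use. One plausible reason the aggregations were missing from resp in the question is that the query and aggs were passed at the top level of the search params rather than under body, as the first query did; treat that as an assumption to verify, not a confirmed diagnosis.

```javascript
// Hypothetical helper: builds the params object for client.search().
// Assumes `city` is an exact-value field, so terms match whole city names.
function buildLatestPerCityParams(index, cities) {
  return {
    index: index,
    // The query DSL is placed under `body`, like the first query in the question.
    body: {
      size: 0, // skip top-level hits; only the aggregation buckets are needed
      query: { terms: { city: cities } }, // restrict to the tracked cities
      aggs: {
        cities: {
          // One bucket per requested city (size must be at least 1).
          terms: { field: 'city', size: Math.max(cities.length, 1) },
          aggs: {
            top_events: {
              top_hits: {
                size: 1, // only the most recent document in each bucket
                _source: ['city', 'event', 'date'],
                sort: [{ date: { order: 'desc' } }]
              }
            }
          }
        }
      }
    }
  };
}

// Hypothetical helper: flattens the aggregation response into one
// document per city. Returns an empty array if no aggregations came back.
function latestEventPerCity(resp) {
  const buckets = (resp.aggregations && resp.aggregations.cities.buckets) || [];
  return buckets
    .filter(function (bucket) { return bucket.top_events.hits.hits.length > 0; })
    .map(function (bucket) { return bucket.top_events.hits.hits[0]._source; });
}
```

With the client from the question, this would be used roughly as client.search(buildLatestPerCityParams('events', ['Berlin', 'Hong Kong', 'Seattle'])).then(latestEventPerCity), where 'events' is a placeholder index name.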
Answer 1 (score: 0)
You can use a Terms Query and pass in all of the cities, for example:
{
"query": {
"terms": {
"city": [
"BERLIN",
"RIO DE JANEIRO"
]
}
},
"size": 3,
"_source": "city",
"sort": [
{
"date": {
"order": "desc"
}
}
]
}
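Note that size: 3 returns the three most recent documents overall, which may all belong to the same city. The terms query also matches exact indexed terms, so the values have to be written the way the city is stored in the index. If this approach is used, the hits still have to be reduced to one per city on the client. A minimal sketch of that step, assuming the hits arrive sorted by date in descending order as the sort clause requests (firstHitPerCity is a hypothetical helper name):

```javascript
// Keeps the first hit seen for each city. Because the input is assumed to be
// sorted by date descending, the first hit per city is its most recent event.
function firstHitPerCity(hits) {
  const seen = new Set();
  const latest = [];
  hits.forEach(function (hit) {
    const city = hit._source.city;
    if (!seen.has(city)) {
      seen.add(city);
      latest.push(hit._source);
    }
  });
  return latest;
}
```

This only finds a city if at least one of its documents is among the returned hits, so size would need to be large enough to reach the least recently active city. The aggregation approach avoids that guesswork.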