Is there a way to have Elasticsearch return only one match per matching field value?

Asked: 2017-05-11 23:05:36

Tags: node.js elasticsearch

Note: updated to include NodeJS client details. See the edit below.

I'm trying to avoid having to query ElasticSearch repeatedly to get the information I need.

Suppose I have a dataset made up of events in cities. Documents in the dataset might look like this:

{
    city: 'Berlin',
    event: 'Dance party',
    date: '2017-04-15'
},
{
    city: 'Seattle',
    event: 'Wine tasting',
    date: '2017-04-18'
},
{
    city: 'Berlin',
    event: 'Dance party',
    date: '2017-04-21'
},
{
    city: 'Hong Kong',
    event: 'Theater',
    date: '2017-04-25'
}...

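Now suppose the list of all tracked cities is known, and I need to fetch the most recent event for each city. I therefore need to be able to supply a list of city names in the query, something like ['Berlin', 'Hong Kong', 'Seattle'], and get back only three events: the latest one for each city.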

My current query can only do this by being run repeatedly with a size of 1 and an exact match on the city name, like this:

{
    size: 1,
    body: {
        sort: [
            {'date': {'order': 'desc'}}
        ],
        query: {
            'match_phrase': {'city': 'Berlin'}
        }
    }
}
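
For reference, the repeated-query approach described above could be sketched roughly as follows. The helper names `buildCityParams` and `latestEventPerCity` are hypothetical, and `client` is assumed to be any object whose `search(params)` method returns a promise, as the elasticsearch-js client used later in the question does.

```javascript
// Build the search parameters for one city (same shape as the query above).
function buildCityParams(city) {
    return {
        size: 1,
        body: {
            sort: [{ date: { order: 'desc' } }],
            query: { match_phrase: { city: city } }
        }
    };
}

// Run one search per city and collect the newest document of each.
// This issues cities.length separate requests, which is the cost
// the question is trying to avoid.
function latestEventPerCity(client, cities) {
    return Promise.all(cities.map(function (city) {
        return client.search(buildCityParams(city)).then(function (resp) {
            var hit = resp.hits.hits[0];
            return hit ? hit._source : null; // null when a city has no events
        });
    }));
}
```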

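Is there a way to write the query so that I can pass the whole list of cities in a single request and reliably get only the latest entry for each city?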

Edit

My new query looks like this:

{
    'query': {
        'match_all': {}
    },
    '_source': ['city', 'event', 'date'],
    'aggs': {
        'cities': {
            'terms': {
                'field': 'city',
                'size': 100
            },
            'aggs': {
                'top_cities': {
                    'top_hits': {
                        'size': 1,
                        '_source': 'event',
                        'sort': {
                            'date': 'desc'
                        }
                    }
                }
            }
        }
    }
}

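This looks like it really should work. But the results are still missing cities that I know are in the data, and one city shows up multiple times.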

I'm running it in Node using the elasticsearch-js package. The client is invoked like this:

let client = new elasticSearch.Client(
    {
        "host": [
            "host1:9200",
            "host2:9200",
            "host3:9200"
        ]
    }
);
client.search(SEARCH_PARAMS)
    .then(function (resp) {
        console.log(JSON.stringify(resp));
    });

Here is a (sanitized) version of the resulting JSON:

{
    "took": 77,
    "timed_out": false,
    "_shards": {
        "total": 42,
        "successful": 42,
        "failed": 0
    },
    "hits": {
        "total": 5685608,
        "max_score": 1,
        "hits": [{
            "_index": "sanitized",
            "_type": "sanitized",
            "_id": "AVu489lVgqYk_9QxQb-U",
            "_score": 1,
            "_source": {
                "event": "Dance party",
                "date": "2017-04-15",
                "city": "Berlin"
            }
        }, {
            "_index": "sanitized",
            "_type": "sanitized",
            "_id": "AVu489lVgqYk_9QxQb-X",
            "_score": 1,
            "_source": {
                "event": "Dance party",
                "date": "2017-04-15",
                "city": "Berlin"
            }
        }, {
            "_index": "sanitized",
            "_type": "sanitized_variant_1",
            "_id": "AVu489lVgqYk_9QxQb-a",
            "_score": 1,
            "_source": {
                "event": "Dance party",
                "date": "2017-04-29",
                "city": "Berlin"
            }
        }, {
            "_index": "sanitized",
            "_type": "sanitized_variant_2",
            "_id": "AVu489lVgqYk_9QxQb-b",
            "_score": 1,
            "_source": {
                "event": "Dance party",
                "date": "2017-04-29",
                "city": "Berlin"
            }
        }, {
            "_index": "sanitized",
            "_type": "sanitized_variant_2",
            "_id": "AVu489lVgqYk_9QxQb-d",
            "_score": 1,
            "_source": {
                "event": "Dance party",
                "date": "2017-04-29",
                "city": "Hong Kong"
            }
        }, {
            "_index": "sanitized",
            "_type": "sanitized_variant_2",
            "_id": "AVu489lVgqYk_9QxQb-f",
            "_score": 1,
            "_source": {
                "event": "Dance party",
                "date": "2017-04-29",
                "city": "Hong Kong"
            }
        }, {
            "_index": "sanitized",
            "_type": "sanitized_variant_2",
            "_id": "AVu49AkKCe9swQD44WnN",
            "_score": 1,
            "_source": {
                "event": "Dance party",
                "date": "2017-04-29",
                "city": "Seattle"
            }
        }, {
            "_index": "sanitized",
            "_type": "sanitized_variant_2",
            "_id": "AVu49AkKCe9swQD44WnP",
            "_score": 1,
            "_source": {
                "event": "Dance party",
                "date": "2017-04-29",
                "city": "New York"
            }
        }, {
            "_index": "sanitized",
            "_type": "sanitized_variant_1",
            "_id": "AVu49AkKCe9swQD44WnY",
            "_score": 1,
            "_source": {
                "event": "Dance party",
                "date": "2017-04-29",
                "city": "Berlin"
            }
        }, {
            "_index": "sanitized",
            "_type": "sanitized_variant_2",
            "_id": "AVu49AkKCe9swQD44Wnb",
            "_score": 1,
            "_source": {
                "event": "Dance party",
                "date": "2017-04-29",
                "city": "Berlin"
            }
        }]
    }
}

On closer inspection, for some reason the aggregations are not being added to the resp object.
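
One possible explanation, which the thread does not confirm: the first query in the question passes the query DSL under a `body` key, whereas the newer one has `query` and `aggs` at the top level. With elasticsearch-js the request DSL is sent from the `body` parameter, so if SEARCH_PARAMS is the object exactly as shown, the aggregation may never reach the server. The sketch below wraps the DSL in `body` and reads one entry per bucket from the response. `toSearchParams` and `latestFromBuckets` are hypothetical helper names. The aggregation names `cities` and `top_cities` come from the query above.

```javascript
// Wrap a query DSL object the way the first query in the question does:
// the DSL goes under `body` rather than at the top level.
function toSearchParams(dsl) {
    return { body: dsl };
}

// Pull one { city, latest } entry per bucket out of a terms + top_hits
// response. Returns an empty array when no aggregations came back.
function latestFromBuckets(resp) {
    var aggs = resp.aggregations;
    if (!aggs || !aggs.cities) {
        return [];
    }
    return aggs.cities.buckets.map(function (bucket) {
        var hits = bucket.top_cities.hits.hits;
        return {
            city: bucket.key,
            latest: hits.length ? hits[0]._source : null
        };
    });
}
```

With these helpers the call would look like `client.search(toSearchParams(dsl)).then(latestFromBuckets)`, where `dsl` is the aggregation query shown earlier.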

2 answers:

Answer 0 (score: 1)

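Besides filtering on the cities in the query, I suggest using a terms aggregation on the city field, with a top_hits sub-aggregation to retrieve the latest event for each city: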

{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "cities": {
      "terms": {
        "field": "city",
        "size": 100
      },
      "aggs": {
        "top_events": {
          "top_hits": {
            "size": 1,
            "_source": "event",
            "sort": {
              "date": "desc"
            }
          }
        }
      }
    }
  }
}
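
To restrict the buckets to a known list of cities, the aggregation above could be combined with a terms filter on the same field. The following is a minimal sketch, not part of the original answer. `buildLatestPerCityBody` is a hypothetical helper, and it assumes `city` is mapped as an exact-value field (keyword or not_analyzed), so that both the terms query and the terms aggregation see whole city names rather than analyzed tokens.

```javascript
// Build a request body that filters to the given cities and returns
// the single newest event per city via terms + top_hits.
function buildLatestPerCityBody(cities) {
    return {
        size: 0, // skip the top-level hits; only the buckets are needed
        query: {
            terms: { city: cities } // exact match against the listed values
        },
        aggs: {
            cities: {
                // one bucket per listed city at most
                terms: { field: 'city', size: cities.length },
                aggs: {
                    top_events: {
                        top_hits: {
                            size: 1,
                            _source: 'event',
                            sort: [{ date: { order: 'desc' } }]
                        }
                    }
                }
            }
        }
    };
}
```

With elasticsearch-js the result would be passed under `body`, for example `client.search({ body: buildLatestPerCityBody(['Berlin', 'Hong Kong', 'Seattle']) })`.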

Answer 1 (score: 0)

You can use a Terms Query, passing in all of those cities, for example:

{
  "query": {
    "terms": {
      "city": [
        "BERLIN",
        "RIO DE JANEIRO"
      ]
    }
  },
  "size": 3,
  "_source": "city",
  "sort": [
    {
      "date": {
        "order": "desc"
      }
    }
  ]
}
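
Note that this query returns the three newest events across the listed cities, which could all belong to the same city. If this approach is used with a larger size, the date-sorted hits could be reduced on the client to the first hit per city. The sketch below illustrates that idea. `firstHitPerCity` is a hypothetical helper, and it assumes each hit's `_source` includes the `city` field and that the hits are already sorted by date descending.

```javascript
// Keep only the first (newest) hit for each city from a list of hits
// that is already sorted by date in descending order.
function firstHitPerCity(hits) {
    var seen = {};
    var result = [];
    hits.forEach(function (hit) {
        var city = hit._source.city;
        if (!Object.prototype.hasOwnProperty.call(seen, city)) {
            seen[city] = true;
            result.push(hit._source);
        }
    });
    return result;
}
```

A city can still be missing from the result if none of its events fall within the requested size, so this is not a guaranteed one-per-city answer.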