Question

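I understand that search tasks can benefit from the inverted index in Elasticsearch.

But I don't understand how aggregation tasks can benefit from the inverted index. For example, suppose we have the following documents: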

id, name,    gender, age, weight
1,  Tom,     m,      29,  100
2,  James,   m,      28,  120
3,  Lucy,    f,      27,  80
4,  Kevin,   m,      28,  150
5,  Jessica, f,      22,  100
....

If I want to get the average weight for age = 28, then the steps Elasticsearch takes with the inverted index should be:

1. Get the list of docs with age = 28, which may look like [id=2, id=4 ...]
2. Read each doc to get its weight
3. Sum the weights and divide by the number of records
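
To make the three steps concrete, here is a minimal Python sketch of the procedure as I imagine it. It is only a toy model of my assumption, not how Elasticsearch or Lucene is actually implemented; the names build_inverted_index and average_weight_for_age are made up for illustration, and the data is the sample table above.

```python
from collections import defaultdict

# Sample documents from the table above, keyed by doc id.
docs = {
    1: {"name": "Tom", "gender": "m", "age": 29, "weight": 100},
    2: {"name": "James", "gender": "m", "age": 28, "weight": 120},
    3: {"name": "Lucy", "gender": "f", "age": 27, "weight": 80},
    4: {"name": "Kevin", "gender": "m", "age": 28, "weight": 150},
    5: {"name": "Jessica", "gender": "f", "age": 22, "weight": 100},
}


def build_inverted_index(docs, field):
    """Map each value of `field` to the sorted list of doc ids containing it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        index[docs[doc_id][field]].append(doc_id)
    return index


def average_weight_for_age(docs, age_index, age):
    # Step 1: look up the doc list for the term in the inverted index.
    doc_ids = age_index.get(age, [])
    # Step 2: read each matching doc to get its weight (one lookup per doc).
    weights = [docs[doc_id]["weight"] for doc_id in doc_ids]
    if not weights:
        return None
    # Step 3: sum the weights and divide by the number of records.
    return sum(weights) / len(weights)


age_index = build_inverted_index(docs, "age")
result = average_weight_for_age(docs, age_index, 28)
print(age_index[28], result)
```

Step 2 is the part that worries me: each docs[doc_id] lookup stands in for a separate read of a document stored somewhere on disk.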

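Step 2 does not seem efficient to me. Because the docs are not stored contiguously on disk, Elasticsearch cannot load the data in a single read and needs multiple reads.

So why does Elasticsearch perform so well on aggregations? Does it use other data structures besides the inverted index for aggregation? Is my understanding of the aggregation steps wrong?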

Answer 1

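The key here is to understand that the inverted index and other relevant Lucene files are not accessed by Lucene on disk, but are mapped into memory (not on the heap!).

So, without going into too much detail, that is basically how ES achieves great performance for both searching and aggregating.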
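
As a rough illustration of what "mapped into memory" means in general, here is a small Python sketch using the standard mmap module. It is not Lucene's file format or code: the fixed-width weights.bin file and the helper names write_column and read_weights_mmap are assumptions made up for this example. The point is only that once a file is memory-mapped, picking out values at scattered offsets looks like indexing into a byte buffer, with the operating system serving the pages from its cache outside the application heap.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical data: one 4-byte little-endian integer per doc, for doc ids 1..5.
weights = [100, 120, 80, 150, 100]


def write_column(path, values):
    """Write the values as fixed-width 4-byte integers, in doc id order."""
    with open(path, "wb") as f:
        for value in values:
            f.write(struct.pack("<i", value))


def read_weights_mmap(path, doc_ids):
    """Read the values for the given doc ids through a read-only memory map."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Doc id d lives at byte offset (d - 1) * 4; no explicit read() calls.
            return [struct.unpack_from("<i", mm, (d - 1) * 4)[0] for d in doc_ids]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "weights.bin")
    write_column(path, weights)
    picked = read_weights_mmap(path, [2, 4])  # docs matching age = 28

average = sum(picked) / len(picked)
print(picked, average)
```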
