Question

I'm running into a performance problem on Elasticsearch (6.3). My index holds 1B documents, and I need to aggregate over a small subset of the data.

My index looks like this:

{
    "s-data": {
        "mappings": {
            "s-type": {
                "properties": {
                    "c": {
                        "type": "integer"
                    },
                    "r": {
                        "type": "keyword"
                    },
                    "s": {
                        "type": "integer"
                    },
                    "t": {
                        "type": "keyword"
                    }
                }
            }
        }
    }
}

My query looks like this:

{
    "query":{
        "bool":{
            "filter":[
                {"term":{"t": "foo"}},
                {"term":{"c": 1}},
                {"terms":{"r": ["foobar", "foobaz"]}},
                {"term":{"s": 3}}
            ]
        }
    },
    "aggs":{
        "recips":{
          "terms": {"field": "r"}
        }
    }
}

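The query on its own runs in 15 ms, but as soon as I add the aggregation the whole request times out. I assume the aggregation runs against the entire 1B-document dataset. How can I make the aggregation run only against the query results?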

Answer 1

You could try a filter aggregation:

{
    "aggs":{
        "recips_subset": {
            "filter": {
                "bool": {
                    "filter": [
                        {"term":{"t": "foo"}},
                        {"term":{"c": 1}},
                        {"terms":{"r": ["foobar", "foobaz"]}},
                        {"term":{"s": 3}}
                    ]
                }
            },
            "aggs": {
                "recips":{
                    "terms": {"field": "r"}
                }
            }
        }
    }
}

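However, this should have the same effect as your original query, because aggregations are executed in the context of the search request's query/filters. So to find the real bottleneck, more information is needed: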

What is the cardinality of the r field?
Did you change the size parameter, or are you using a script instead of a field?
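
One way to answer the first question is a cardinality aggregation. The request below is only a sketch: it assumes it is sent to the search endpoint of the s-data index from the question, and the aggregation name r_cardinality is made up for illustration. "size": 0 suppresses the hits so that only the aggregation result comes back. The count is approximate, and on an index of this size the request may itself take a while.

```json
{
    "size": 0,
    "aggs": {
        "r_cardinality": {
            "cardinality": {"field": "r"}
        }
    }
}
```

If the reported value is very high, that would suggest the cost comes from the r field itself rather than from the number of documents matched by the filters.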
