Filtering aggregated data in Elasticsearch

Date: 2018-04-17 06:47:05

Tags: elasticsearch

For the past two days, my team has been working on querying data from our Elasticsearch (ES) database. Our goal is to aggregate documents by one field and pick up two accumulated values per group. Translated into SQL, we need something like:

SELECT MAX(FIELD1) AS F1, MAX(FIELD2) AS F2 FROM ES GROUP BY FIELD3 HAVING F1 = 'SOME_TEXT'

Note that F1 is a text field.

The only solution we have found so far is:

{
    "size": 0,
    "aggs": {
        "flowId": {
            "terms": {
                "field": "flowId.keyword"
            },
            "aggs" :{
                "scenario" : { "terms" : { "field" : "scnName.keyword" } },
                "max_time" : { "max" : { "field" : "inFlowTimeNsec" } },
                "sales_bucket_filter": {
                    "bucket_selector": {
                        "buckets_path": {
                            "totalSales": "scenario"
                        },
                        "script": "params.totalSales != null && params.totalSales == 'Test' "
                    }
                }
            }
        }
    }
}

The error we run into is:

{
    "error": {
        "root_cause": [],
        "type": "search_phase_execution_exception",
        "reason": "",
        "phase": "fetch",
        "grouped": true,
        "failed_shards": [],
        "caused_by": {
            "type": "aggregation_execution_exception",
            "reason": "buckets_path must reference either a number value or a single value numeric metric aggregation, got: org.elasticsearch.search.aggregations.bucket.terms.StringTerms"
        }
    },
    "status": 503
}

As far as I can tell, this limitation has already been reported: https://github.com/elastic/elasticsearch/issues/23874
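For reference, one variant we are considering (a sketch only, not yet validated against our index) moves the string condition out of the pipeline aggregation entirely: a `filter` aggregation on `scnName.keyword` restricts the documents before they are bucketed, so no `bucket_selector` over a string terms aggregation is needed:

```json
{
    "size": 0,
    "aggs": {
        "only_test": {
            "filter": { "term": { "scnName.keyword": "Test" } },
            "aggs": {
                "flowId": {
                    "terms": { "field": "flowId.keyword" },
                    "aggs": {
                        "max_time": { "max": { "field": "inFlowTimeNsec" } }
                    }
                }
            }
        }
    }
}
```

This sidesteps the error because `buckets_path` is never asked to reference a string-valued aggregation; the trade-off is that the filter is applied per document rather than per group, which only matches our `HAVING` semantics if every document in a group shares the same `scnName`.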

Without the bucket_selector part, the output of the query above looks like this:

{
    "took": 52,
    "timed_out": false,
    "_shards": {
        "total": 480,
        "successful": 480,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 15657901,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "flowId": {
            "doc_count_error_upper_bound": 4104,
            "sum_other_doc_count": 9829317,
            "buckets": [
                {
                    "key": "0_66718_31120bfd_39ae_4258_81e8_08abd89a81bf",
                    "doc_count": 107816,
                    "scenario": {
                        "doc_count_error_upper_bound": 0,
                        "sum_other_doc_count": 0,
                        "buckets": [
                            {
                                "key": "GetPop",
                                "doc_count": 12
                            }
                        ]
                    },
                    "max_time": {
                        "value": 121244876800
                    }
                },
                {
                    "key": "0_67116_31120bfd_39ae_4258_81e8_08abd89a81bf",
                    "doc_count": 107752,
                    "scenario": {
                        "doc_count_error_upper_bound": 0,
                        "sum_other_doc_count": 0,
                        "buckets": [
                            {
                                "key": "GetPop",
                                "doc_count": 12
                            }
                        ]
                    },
                    "max_time": {
                        "value": 120955101184
                    }
                },
…
}

The question is: is there another way to achieve what we need? In other words, we need to filter the results of the aggregation by a string value...

Many thanks, EG

0 answers:

There are no answers yet