SQL-like GROUP BY and HAVING

Date: 2017-10-24 10:39:32

Tags: elasticsearch elasticsearch-5

I want to get the number of groups that satisfy a certain condition. In SQL terms, I want to do the following in Elasticsearch.

SELECT COUNT(*) FROM
(
   SELECT
    senderResellerId,
    SUM(requestAmountValue) AS t_amount
   FROM
    transactions
   GROUP BY
    senderResellerId
   HAVING
    t_amount > 10000 ) AS dum;
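For reference, the intended semantics of this SQL can be checked locally with Python's built-in sqlite3 module. This is a minimal sketch: the rows are hypothetical sample data (they mirror the three documents used in the answer below), not the real transactions table.

```python
import sqlite3

# Hypothetical sample rows: (senderResellerId, requestAmountValue).
ROWS = [("ID_1", 7000), ("ID_1", 5000), ("ID_2", 1000)]

QUERY = """
SELECT COUNT(*) FROM (
    SELECT senderResellerId, SUM(requestAmountValue) AS t_amount
    FROM transactions
    GROUP BY senderResellerId
    HAVING t_amount > 10000
) AS dum
"""


def count_big_resellers(rows):
    """Run the question's SQL against an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE transactions (senderResellerId TEXT, requestAmountValue REAL)"
        )
        conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)
        (count,) = conn.execute(QUERY).fetchone()
        return count
    finally:
        conn.close()


if __name__ == "__main__":
    print(count_big_resellers(ROWS))  # ID_1 sums to 12000, ID_2 to 1000 -> 1
```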

So far I am able to group by senderResellerId with a terms aggregation. But when I apply a filter, it does not work as expected.

Elastic request

{
  "aggregations": {
    "reseller_sale_sum": {
      "aggs": {
        "sales": {
          "aggregations": {
            "reseller_sale": {
              "sum": {
                "field": "requestAmountValue"
              }
            }
          }, 
          "filter": {
            "range": {
              "reseller_sale": { 
                "gte": 10000
              }
            }
          }
        }
      }, 
      "terms": {
        "field": "senderResellerId", 
        "order": {
          "sales>reseller_sale": "desc"
        }, 
        "size": 5
      }
    }
  }, 
  "ext": {}, 
  "query": {  "match_all": {} }, 
  "size": 0
}

Actual response

{
  "took" : 21,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 150824,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "reseller_sale_sum" : {
      "doc_count_error_upper_bound" : -1,
      "sum_other_doc_count" : 149609,
      "buckets" : [
        {
          "key" : "RES0000000004",
          "doc_count" : 8,
          "sales" : {
            "doc_count" : 0,
            "reseller_sale" : {
              "value" : 0.0
            }
          }
        },
        {
          "key" : "RES0000000005",
          "doc_count" : 39,
          "sales" : {
            "doc_count" : 0,
            "reseller_sale" : {
              "value" : 0.0
            }
          }
        },
        {
          "key" : "RES0000000006",
          "doc_count" : 57,
          "sales" : {
            "doc_count" : 0,
            "reseller_sale" : {
              "value" : 0.0
            }
          }
        },
        {
          "key" : "RES0000000007",
          "doc_count" : 134,
          "sales" : {
            "doc_count" : 0,
            "reseller_sale" : {
              "value" : 0.0
            }
          }
        }
      ]
    }
  }
}

As you can see from the response above, it returns the resellers, but the reseller_sale aggregation is zero in every bucket.
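A likely explanation, inferred from the response (every sales.doc_count is 0): a filter aggregation evaluates its query against documents, not against the output of other aggregations. No document has a field named reseller_sale, so the range filter matches nothing and the nested sum sees an empty set. The Python sketch below mimics that evaluation order on hypothetical documents; the helper names are illustrative only and are not Elasticsearch APIs.

```python
from collections import defaultdict

# Hypothetical transaction documents (field names taken from the question).
DOCS = [
    {"senderResellerId": "RES0000000004", "requestAmountValue": 9000},
    {"senderResellerId": "RES0000000004", "requestAmountValue": 4000},
    {"senderResellerId": "RES0000000005", "requestAmountValue": 500},
]


def range_gte(doc, field, bound):
    """Mimic a range query: a document without the field never matches."""
    return field in doc and doc[field] >= bound


def asker_aggregation(docs):
    """Model of terms(senderResellerId) > filter(range on reseller_sale) > sum."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc["senderResellerId"]].append(doc)
    result = {}
    for key, bucket_docs in buckets.items():
        # The filter looks for a *document* field called "reseller_sale",
        # which does not exist, so no document in the bucket matches.
        matched = [d for d in bucket_docs if range_gte(d, "reseller_sale", 10000)]
        result[key] = {
            "doc_count": len(bucket_docs),
            "sales": {
                "doc_count": len(matched),
                "reseller_sale": sum(d["requestAmountValue"] for d in matched),
            },
        }
    return result


if __name__ == "__main__":
    for key, bucket in asker_aggregation(DOCS).items():
        print(key, bucket)  # every sales.doc_count and reseller_sale is 0
```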

More details here.

1 Answer:

Answer 0 (score: 11)

Implementing HAVING-like behavior

You can use one of the pipeline aggregations, namely the bucket selector aggregation. The query would look like this:

POST my_index/tdrs/_search
{
   "aggregations": {
      "reseller_sale_sum": {
         "aggregations": {
            "sales": {
               "sum": {
                  "field": "requestAmountValue"
               }
            },
            "max_sales": {
               "bucket_selector": {
                  "buckets_path": {
                     "var1": "sales"
                  },
                  "script": "params.var1 > 10000"
               }
            }
         },
         "terms": {
            "field": "senderResellerId",
            "order": {
               "sales": "desc"
            },
            "size": 5
         }
      }
   },
   "size": 0
}

After putting the following documents into the index:

  "hits": [
     {
        "_index": "my_index",
        "_type": "tdrs",
        "_id": "AV9Yh5F-dSw48Z0DWDys",
        "_score": 1,
        "_source": {
           "requestAmountValue": 7000,
           "senderResellerId": "ID_1"
        }
     },
     {
        "_index": "my_index",
        "_type": "tdrs",
        "_id": "AV9Yh684dSw48Z0DWDyt",
        "_score": 1,
        "_source": {
           "requestAmountValue": 5000,
           "senderResellerId": "ID_1"
        }
     },
     {
        "_index": "my_index",
        "_type": "tdrs",
        "_id": "AV9Yh8TBdSw48Z0DWDyu",
        "_score": 1,
        "_source": {
           "requestAmountValue": 1000,
           "senderResellerId": "ID_2"
        }
     }
  ]

The result of the query is:

"aggregations": {
      "reseller_sale_sum": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "ID_1",
               "doc_count": 2,
               "sales": {
                  "value": 12000
               }
            }
         ]
      }
   }

I.e. only those senderResellerId whose cumulative sales are >10000.
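To make the evaluation order concrete, here is a small Python model of the query above applied to the three sample documents: the terms aggregation builds buckets and keeps the top size ones ordered by sales, each bucket's sum is computed, and only then does bucket_selector drop the buckets whose sales are not greater than 10000. It is a simplified single-shard sketch (no doc_count error bounds), and the function name is hypothetical.

```python
from collections import defaultdict

# The three documents indexed in the answer.
DOCS = [
    {"requestAmountValue": 7000, "senderResellerId": "ID_1"},
    {"requestAmountValue": 5000, "senderResellerId": "ID_1"},
    {"requestAmountValue": 1000, "senderResellerId": "ID_2"},
]


def terms_sum_bucket_selector(docs, size=5, threshold=10000):
    # terms + sum: one bucket per senderResellerId with doc_count and sales.
    sales = defaultdict(float)
    counts = defaultdict(int)
    for doc in docs:
        key = doc["senderResellerId"]
        sales[key] += doc["requestAmountValue"]
        counts[key] += 1
    # "order": {"sales": "desc"}, then keep only the first `size` buckets.
    ordered = sorted(sales, key=lambda k: sales[k], reverse=True)[:size]
    buckets = [{"key": k, "doc_count": counts[k], "sales": sales[k]} for k in ordered]
    # bucket_selector runs afterwards, only on the buckets that survived `size`.
    return [b for b in buckets if b["sales"] > threshold]


if __name__ == "__main__":
    print(terms_sum_bucket_selector(DOCS))
    # [{'key': 'ID_1', 'doc_count': 2, 'sales': 12000.0}]
```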

Counting the buckets

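To implement an equivalent of SELECT COUNT(*) FROM (... HAVING), one may use a combination of the bucket script aggregation and the sum bucket aggregation. Although there seems to be no direct way to count how many buckets bucket_selector actually selected, we can define a bucket_script that produces 0 or 1 depending on a condition, and a sum_bucket that produces their sum:

POST my_index/tdrs/_search
{
   "aggregations": {
      "reseller_sale_sum": {
         "aggregations": {
            "sales": {
               "sum": {
                  "field": "requestAmountValue"
               }
            },
            "max_sales": {
               "bucket_script": {
                  "buckets_path": {
                     "var1": "sales"
                  },
                  "script": "if (params.var1 > 10000) { 1 } else { 0 }"
               }
            }
         },
         "terms": {
            "field": "senderResellerId",
            "order": {
               "sales": "desc"
            }
         }
      },
      "max_sales_stats": {
         "sum_bucket": {
            "buckets_path": "reseller_sale_sum>max_sales"
         }
      }
   },
   "size": 0
}

The output will be:

"aggregations": {
      "reseller_sale_sum": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            ...
         ]
      },
      "max_sales_stats": {
         "value": 1
      }
   }

The desired bucket count is located in max_sales_stats.value.

Important considerations

I have to point out two things:

  1. The feature is experimental (as of ES 5.6 it is still experimental, although it was added in 2.0.0-beta1).
  2. Pipeline aggregations are applied to the results of previous aggregations:

    Pipeline aggregations work on the outputs produced from other aggregations rather than from document sets, adding information to the output tree.

    This means that the bucket_selector aggregation is applied after, and on the result of, the terms aggregation on senderResellerId. For example, if there are more senderResellerId values than the size of the terms aggregation allows, you will not get all the IDs in the collection with sum(sales) > 10000, but only those that appear in the output of the terms aggregation. Consider using sorting and/or setting a sufficiently large size parameter.

    This also applies to the second case, COUNT() (... HAVING), which will only count those buckets that are actually present in the aggregation output.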
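The counting trick and the size caveat can be modelled the same way: bucket_script turns each returned bucket into 0 or 1, and sum_bucket adds up those flags, so any reseller cut off by size is silently missing from the count. This is a simplified single-shard sketch with hypothetical data and a hypothetical function name. Note that the counting query sets no size, so the terms aggregation's default of 10 buckets applies.

```python
from collections import defaultdict

# Hypothetical data: three resellers above 10000, one below.
DOCS = [
    {"senderResellerId": "A", "requestAmountValue": 30000},
    {"senderResellerId": "B", "requestAmountValue": 20000},
    {"senderResellerId": "C", "requestAmountValue": 15000},
    {"senderResellerId": "D", "requestAmountValue": 500},
]


def count_having(docs, size=10, threshold=10000):
    """Model of terms(size) > sum, then bucket_script (0/1), then sum_bucket."""
    sales = defaultdict(float)
    for doc in docs:
        sales[doc["senderResellerId"]] += doc["requestAmountValue"]
    # terms keeps only the top `size` buckets, ordered by sales descending.
    kept = sorted(sales.values(), reverse=True)[:size]
    # bucket_script: "if (params.var1 > 10000) { 1 } else { 0 }"
    flags = [1 if s > threshold else 0 for s in kept]
    # sum_bucket over "reseller_sale_sum>max_sales"
    return sum(flags)


if __name__ == "__main__":
    print(count_having(DOCS))          # 3: all buckets fit within size=10
    print(count_having(DOCS, size=2))  # 2: reseller C was cut off by size
```

Ordering by sales descending is what keeps the qualifying buckets at the front, so with that ordering the count is only wrong when more than size resellers exceed the threshold.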

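If this query is too heavy or the number of buckets is too large, consider denormalizing your data or storing this sum directly in a document, so that you can use a plain range query to achieve your goal.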
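One way to follow the denormalization advice is to keep one summary document per reseller holding its total, and then count with a plain range query. The sketch below only builds the request payloads: the index name reseller_totals, the type totals and the field totalAmount are hypothetical, the bulk body uses the newline-delimited format of the _bulk API (with _type, as ES 5 requires), and the count body is meant for POST reseller_totals/_count. How the totals are kept up to date as new transactions arrive is left open.

```python
import json
from collections import defaultdict

# Hypothetical names; adjust to your own index and mapping.
INDEX, DOC_TYPE, FIELD = "reseller_totals", "totals", "totalAmount"


def bulk_body(transactions):
    """Build an NDJSON _bulk body with one summary document per reseller."""
    totals = defaultdict(float)
    for tx in transactions:
        totals[tx["senderResellerId"]] += tx["requestAmountValue"]
    lines = []
    for reseller, total in sorted(totals.items()):
        action = {"index": {"_index": INDEX, "_type": DOC_TYPE, "_id": reseller}}
        lines.append(json.dumps(action))
        lines.append(json.dumps({"senderResellerId": reseller, FIELD: total}))
    return "\n".join(lines) + "\n"  # a _bulk body must end with a newline


def count_body(threshold=10000):
    """Body for POST reseller_totals/_count: resellers whose total > threshold."""
    return {"query": {"range": {FIELD: {"gt": threshold}}}}


if __name__ == "__main__":
    txs = [
        {"senderResellerId": "ID_1", "requestAmountValue": 7000},
        {"senderResellerId": "ID_1", "requestAmountValue": 5000},
        {"senderResellerId": "ID_2", "requestAmountValue": 1000},
    ]
    print(bulk_body(txs), end="")
    print(json.dumps(count_body()))
```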

Hope that helps!