Elasticsearch: How can I group by a field and average the totals?

Asked: 2016-07-23 08:20:34

Tags: elasticsearch

If I have a mapping defined:

"mappings": {
    "sales": {
        "properties": {
            "gender": { "type": "byte" },
            "age":    { "type": "byte" },
            "amount": { "type": "integer" },
            "dow":    { "type": "byte" },
            "day_of": { "type": "date" }
        }
    }
}

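and I add 1000 sales documents to ES, where gender 0 means male and 1 means female, and dow 1 means Monday, 2 means Tuesday, and so on.

How can I get results like the following: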

gender 0: average amount of sales
gender 1: average amount of sales

dow monday: average amount of sales
dow tues: average amount of sales
dow wed: average amount of sales
dow thurs: average amount of sales
dow friday: average amount of sales

dow monday AND age 18-24: average amount of sales
dow tues AND age 18-24 AND female: average amount of sales
dow wed AND age 18-24: average amount of sales
dow thurs AND age 18-24: average amount of sales
dow friday AND age 18-24: average amount of sales

2 Answers:

Answer 0: (score: 1)

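Each of these is straightforward, but you are really asking several different questions.

There is no need to explicitly call out each value the way you did (although there is nothing technically wrong with it). Instead, you can ask "simpler" questions and let the scope of the query control what you see at all.

gender 0: average amount of sales
gender 1: average amount of sales

This can become a simpler question:

gender N: average amount of sales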

{
  "size": 0,
  "aggs": {
    "group_by_gender": {
      "terms": {
        "field": "gender"
      },
      "aggs": {
        "avg_sales": {
          "avg" :{
            "field": "amount"
          }
        }
      }
    }
  }
}
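
To make concrete what this terms-plus-avg aggregation computes, here is a minimal local sketch in plain Python. It does not talk to Elasticsearch at all; the sample documents and the helper name `avg_by` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample documents shaped like the "sales" mapping in the question.
SALES = [
    {"gender": 0, "age": 22, "amount": 100, "dow": 1},
    {"gender": 0, "age": 35, "amount": 50, "dow": 2},
    {"gender": 1, "age": 19, "amount": 80, "dow": 1},
    {"gender": 1, "age": 41, "amount": 120, "dow": 3},
]


def avg_by(docs, group_field, value_field="amount"):
    """Group docs by group_field and average value_field within each group."""
    totals = defaultdict(lambda: [0, 0])  # key -> [running sum, count]
    for doc in docs:
        bucket = totals[doc[group_field]]
        bucket[0] += doc[value_field]
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


if __name__ == "__main__":
    print(avg_by(SALES, "gender"))  # one average per distinct gender value
```

Passing `"dow"` instead of `"gender"` gives the per-day averages the same way, which is why one query shape covers both questions.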
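
dow monday: average amount of sales
dow tues: average amount of sales
dow wed: average amount of sales
dow thurs: average amount of sales
dow friday: average amount of sales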

This can become a simpler question:

dow N, excluding Saturday and Sunday: average amount of sales

Assuming dow == 0 is Sunday and dow == 6 is Saturday:

{
  "size": 0,
  "query": {
    "bool" : {
      "must_not": [
        {
          "terms": {
            "dow": [0, 6]
          }
        }
      ]
    }
  },
  "aggs": {
    "group_by_dow": {
      "terms": {
        "field": "dow",
        "size": 5
      },
      "aggs": {
        "avg_sales": {
          "avg": {
            "field": "amount"
          }
        }
      }
    }
  }
}
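
The averages come back as one bucket per `dow` value under `aggregations.group_by_dow.buckets`. Below is a rough sketch of reading them in Python. The response dict is a hand-written example with invented values, not real output, and `DAY_NAMES` and `averages_by_day` are hypothetical names.

```python
# Assumed numbering from the question: 1 = Monday ... 5 = Friday.
DAY_NAMES = {1: "monday", 2: "tues", 3: "wed", 4: "thurs", 5: "friday"}

# Hand-written example in the shape of a terms + avg response; values are invented.
EXAMPLE_RESPONSE = {
    "aggregations": {
        "group_by_dow": {
            "buckets": [
                {"key": 1, "doc_count": 2, "avg_sales": {"value": 90.0}},
                {"key": 2, "doc_count": 1, "avg_sales": {"value": 50.0}},
            ]
        }
    }
}


def averages_by_day(response, agg_name="group_by_dow", metric="avg_sales"):
    """Return {day name: average amount} from a terms + avg response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {
        DAY_NAMES.get(bucket["key"], str(bucket["key"])): bucket[metric]["value"]
        for bucket in buckets
    }


if __name__ == "__main__":
    for day, average in averages_by_day(EXAMPLE_RESPONSE).items():
        print(f"dow {day}: {average}")
```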

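Finally, the last one just adds another filter to that question:

AND age 18-24 AND female

I am assuming that AND female was meant to be copied to all of the lines, because that is what this query answers: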

{
  "size": 0,
  "query": {
    "bool" : {
      "must_not": [
        {
          "terms": {
            "dow": [0, 6]
          }
        }
      ],
      "filter": [
        {
          "term": {
            "gender": 1
          }
        },
        {
          "range": {
            "age": {
              "gte": 18,
              "lte": 24
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "group_by_dow": {
      "terms": {
        "field": "dow",
        "size": 5
      },
      "aggs": {
        "avg_sales": {
          "avg": {
            "field": "amount"
          }
        }
      }
    }
  }
}
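
Since the last two requests differ only in their query scope, the body can be assembled from one function. This is a sketch under that assumption, not part of the original answer; `build_weekday_avg_query` and its parameters are hypothetical names, and it only builds the request body without sending it anywhere.

```python
import json


def build_weekday_avg_query(gender=None, age_range=None):
    """Build the weekday-average request body, optionally narrowed by filters."""
    filters = []
    if gender is not None:
        filters.append({"term": {"gender": gender}})
    if age_range is not None:
        low, high = age_range
        filters.append({"range": {"age": {"gte": low, "lte": high}}})

    # Weekends are always excluded (assumes dow 0 = Sunday, 6 = Saturday).
    bool_query = {"must_not": [{"terms": {"dow": [0, 6]}}]}
    if filters:
        bool_query["filter"] = filters

    return {
        "size": 0,
        "query": {"bool": bool_query},
        "aggs": {
            "group_by_dow": {
                "terms": {"field": "dow", "size": 5},
                "aggs": {"avg_sales": {"avg": {"field": "amount"}}},
            }
        },
    }


if __name__ == "__main__":
    body = build_weekday_avg_query(gender=1, age_range=(18, 24))
    print(json.dumps(body, indent=2))
```

Calling it with no arguments yields the plain weekday query; adding `gender` or `age_range` only changes the `filter` clause while the aggregation stays the same.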

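You have already discovered the stats aggregation, but since you are only asking for the average, the more specific avg aggregation avoids spending time on calculations you do not care about.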
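
As a rough local illustration of the difference: a stats aggregation reports count, min, max, avg, and sum, while avg reports only the average. The Python below mimics both on a made-up list of amounts; `stats_summary` and `avg_only` are hypothetical helper names, and the sketch assumes a non-empty list.

```python
# Invented sample amounts, for illustration only.
AMOUNTS = [100, 50, 80, 120]


def stats_summary(values):
    """Mimic the fields a stats aggregation reports (assumes values is non-empty)."""
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


def avg_only(values):
    """Mimic an avg aggregation: just the mean (assumes values is non-empty)."""
    return sum(values) / len(values)


if __name__ == "__main__":
    print(stats_summary(AMOUNTS))
    print(avg_only(AMOUNTS))
```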

You should also understand the difference between query context and filter context to see why I used filter rather than must (basically, filters can be cached).

Answer 1: (score: 0)

I think this works:

{
  "query": {
    "bool": {
      "must": [
        { "match": { "gender": 1 } },
        { "range": { "age": { "gte": 18, "lte": 24 } } }
      ]
    }
  },
  "size": 0,
  "aggs": {
    "monday":  { "filter": { "term": { "dow": 1 } }, "aggs": { "s": { "stats": { "field": "amount" } } } },
    "tuesday": { "filter": { "term": { "dow": 2 } }, "aggs": { "s": { "stats": { "field": "amount" } } } }
  }
}