How to calculate the difference between metrics in different aggregations in Elasticsearch

Date: 2014-09-02 17:09:09

Tags: elasticsearch aggregation

I want to calculate the difference of a nested aggregation between two dates.

More specifically: given the request/response below, is it possible to calculate date_1.buckets.field_1.buckets.field_2.buckets.field_3.value - date_2.buckets.field_1.buckets.field_2.buckets.field_3.value? Can this be done in Elasticsearch v1.0.1?

The aggregation query request looks like this:

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [
            {
              "terms": {
                "date": [
                  "2014-08-18 00:00:00.0",
                  "2014-08-15 00:00:00.0"
                ]
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "date_1": {
      "filter": {
        "terms": {
          "date": [
            "2014-08-18 00:00:00.0"
          ]
        }
      },
      "aggs": {
        "my_agg_1": {
          "terms": {
            "field": "field_1",
            "size": 2147483647,
            "order": {
              "_term": "desc"
            }
          },
          "aggs": {
            "my_agg_2": {
              "terms": {
                "field": "field_2",
                "size": 2147483647,
                "order": {
                  "_term": "desc"
                }
              },
              "aggs": {
                "my_agg_3": {
                  "sum": {
                    "field": "field_3"
                  }
                }
              }
            }
          }
        }
      }
    },
    "date_2": {
      "filter": {
        "terms": {
          "date": [
            "2014-08-15 00:00:00.0"
          ]
        }
      },
      "aggs": {
        "my_agg_1": {
          "terms": {
            "field": "field_1",
            "size": 2147483647,
            "order": {
              "_term": "desc"
            }
          },
          "aggs": {
            "my_agg_2": {
              "terms": {
                "field": "field_2",
                "size": 2147483647,
                "order": {
                  "_term": "desc"
                }
              },
              "aggs": {
                "my_agg_3": {
                  "sum": {
                    "field": "field_3"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
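Since the date_1 and date_2 branches have to be identical apart from the date filter, generating them instead of copying them by hand keeps the two sub-trees from drifting apart (for example a sub-aggregation ending up with a different name in one branch). The following Python sketch only builds the request body shown above as a dict; the helper names build_request and nested_aggs are hypothetical, and sending the request to the cluster is left out.

```python
import json

# Values taken from the request above.
DATE_FILTER_VALUES = ["2014-08-18 00:00:00.0", "2014-08-15 00:00:00.0"]
MAX_SIZE = 2147483647


def nested_aggs() -> dict:
    """Return a fresh field_1 -> field_2 -> sum(field_3) aggregation tree."""
    return {
        "my_agg_1": {
            "terms": {"field": "field_1", "size": MAX_SIZE, "order": {"_term": "desc"}},
            "aggs": {
                "my_agg_2": {
                    "terms": {"field": "field_2", "size": MAX_SIZE, "order": {"_term": "desc"}},
                    "aggs": {"my_agg_3": {"sum": {"field": "field_3"}}},
                }
            },
        }
    }


def build_request(dates: list) -> dict:
    """One filter aggregation per date (date_1, date_2, ...) sharing the same sub-aggregations."""
    aggs = {
        f"date_{i}": {"filter": {"terms": {"date": [d]}}, "aggs": nested_aggs()}
        for i, d in enumerate(dates, start=1)
    }
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"bool": {"must": [{"terms": {"date": dates}}]}},
            }
        },
        "aggs": aggs,
    }


if __name__ == "__main__":
    print(json.dumps(build_request(DATE_FILTER_VALUES), indent=2))
```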

The response looks like this:

{
  "took": 236,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1646,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "date_1": {
      "doc_count": 823,
      "field_1": {
        "buckets": [
          {
            "key": "field_1_key_1",
            "doc_count": 719,
            "field_2": {
              "buckets": [
                {
                  "key": "key_1",
                  "doc_count": 275,
                  "field_3": {
                    "value": 100
                  }
                }
              ]
            }
          }
        ]
      }
    },
    "date_2": {
      "doc_count": 823,
      "field_1": {
        "buckets": [
          {
            "key": "field_1_key_1",
            "doc_count": 719,
            "field_2": {
              "buckets": [
                {
                  "key": "key_1",
                  "doc_count": 275,
                  "field_3": {
                    "value": 80
                  }
                }
              ]
            }
          }
        ]
      }
    }
  }
}

Thanks.

3 answers:

Answer 0 (score: 1)

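The Elasticsearch DSL does not allow arithmetic operations between the results of two aggregations, not even with scripts (at least as of version 1.1.1, as far as I know).

These operations have to be done on the client side, after processing the aggs results.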
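A minimal client-side sketch in Python, assuming the response has already been parsed into a dict (for example with json.loads). It flattens each date's field_1/field_2 buckets into a map keyed by the pair of bucket keys and subtracts date_2 from date_1. The function names are hypothetical; the aggregation names default to the ones in the response above (field_1, field_2, field_3), so adjust them if your response uses the request's my_agg_* names.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (field_1 bucket key, field_2 bucket key)


def sums_by_key(date_agg: dict, agg1: str = "field_1", agg2: str = "field_2",
                metric: str = "field_3") -> Dict[Key, float]:
    """Flatten one date_N filter aggregation into {(field_1 key, field_2 key): sum}."""
    result: Dict[Key, float] = {}
    for b1 in date_agg[agg1]["buckets"]:
        for b2 in b1[agg2]["buckets"]:
            result[(b1["key"], b2["key"])] = b2[metric]["value"]
    return result


def diff_dates(aggregations: dict, first: str = "date_1",
               second: str = "date_2") -> Dict[Key, float]:
    """Return first - second per key pair; a pair missing on one side counts as 0."""
    a = sums_by_key(aggregations[first])
    b = sums_by_key(aggregations[second])
    return {k: a.get(k, 0) - b.get(k, 0) for k in sorted(a.keys() | b.keys())}


# The response from the question, trimmed to the "aggregations" part.
response = {
    "aggregations": {
        "date_1": {"doc_count": 823, "field_1": {"buckets": [
            {"key": "field_1_key_1", "doc_count": 719, "field_2": {"buckets": [
                {"key": "key_1", "doc_count": 275, "field_3": {"value": 100}}]}}]}},
        "date_2": {"doc_count": 823, "field_1": {"buckets": [
            {"key": "field_1_key_1", "doc_count": 719, "field_2": {"buckets": [
                {"key": "key_1", "doc_count": 275, "field_3": {"value": 80}}]}}]}},
    }
}

print(diff_dates(response["aggregations"]))  # {('field_1_key_1', 'key_1'): 20}
```

Treating a missing pair as 0 is a design choice; drop that default if pairs present on only one date should be reported separately.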

Reference

elasticsearch aggregation to sort by ratio of aggregations

Answer 1 (score: 0)

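I couldn't find anything in 1.0.1, but in 1.4.2 you could try the scripted_metric aggregation (still experimental).

See the scripted_metric documentation page.

I'm not great with Elasticsearch syntax, but I think your metric inputs would be:

init_script - just initialize one accumulator per date: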

"init_script": "_agg.d1Val = 0; _agg.d2Val = 0;"

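map_script - test the document's date and add to the right accumulator (firstDate has to be supplied through the aggregation's params, as epoch milliseconds, since that is what a date field's doc value holds):

"map_script": "if (doc['date'].value == firstDate) { _agg.d1Val += doc['field_3'].value; } else { _agg.d2Val += doc['field_3'].value; };",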

reduce_script - accumulate the intermediate data from the individual shards and return the final result:

"reduce_script": "totalD1 = 0; totalD2 = 0; for (agg in _aggs) {  totalD1 += agg.d1Val ; totalD2 += agg.d2Val ;}; return totalD1 - totalD2"

I don't think you need a combine_script in this case.

Of course, if you can't use 1.4.2, this won't help :-)
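To make the phases easier to follow, here is a plain Python simulation of what the three scripts above do; it is not Elasticsearch code. Each shard runs init and map over its own documents, the shard's accumulator is passed on unchanged because there is no combine_script, and reduce subtracts the totals. The documents, the two-shard split and the firstDate value are hypothetical.

```python
from typing import Dict, List

# Hypothetical dates as epoch milliseconds (the form a date field's doc value takes).
FIRST_DATE = 1408320000000   # 2014-08-18T00:00:00Z
SECOND_DATE = 1408060800000  # 2014-08-15T00:00:00Z

# Hypothetical documents, split across two shards.
SHARDS: List[List[dict]] = [
    [{"date": FIRST_DATE, "field_3": 60}, {"date": SECOND_DATE, "field_3": 50}],
    [{"date": FIRST_DATE, "field_3": 40}, {"date": SECOND_DATE, "field_3": 30}],
]


def init_script() -> Dict[str, float]:
    # One accumulator per date, as in the init_script above.
    return {"d1Val": 0, "d2Val": 0}


def map_script(agg: Dict[str, float], doc: dict, first_date: int) -> None:
    # Add field_3 to the accumulator of the document's date.
    if doc["date"] == first_date:
        agg["d1Val"] += doc["field_3"]
    else:
        agg["d2Val"] += doc["field_3"]


def reduce_script(aggs: List[Dict[str, float]]) -> float:
    # Sum the per-shard accumulators and return the difference.
    total_d1 = sum(a["d1Val"] for a in aggs)
    total_d2 = sum(a["d2Val"] for a in aggs)
    return total_d1 - total_d2


def run(shards: List[List[dict]], first_date: int) -> float:
    shard_results = []
    for docs in shards:
        agg = init_script()
        for doc in docs:
            map_script(agg, doc, first_date)
        # No combine_script: each shard hands back its accumulator as-is.
        shard_results.append(agg)
    return reduce_script(shard_results)


print(run(SHARDS, FIRST_DATE))  # 20
```

This yields a single number for all matching documents; to get one difference per field_1/field_2 pair, the scripted_metric would presumably have to sit under those two terms aggregations.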

Answer 2 (score: 0)

With newer versions of Elasticsearch (for example 5.6.9), this is possible:

{
  "size": 0,
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "filter": [
            {
              "range": {
                "date_created": {
                  "gte": "2018-06-16T00:00:00+02:00",
                  "lte": "2018-06-16T23:59:59+02:00"
                }
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "by_millisec": {
      "range" : {
        "script" : {
          "lang": "painless",
          "source": "doc['date_delivered'][0] - doc['date_created'][0]"
        },
        "ranges" : [
          { "key": "<1sec", "to": 1000.0 },
          { "key": "1-5sec", "from": 1000.0, "to": 5000.0 },
          { "key": "5-30sec", "from": 5000.0, "to": 30000.0 },
          { "key": "30-60sec", "from": 30000.0, "to": 60000.0 },
          { "key": "1-2min", "from": 60000.0, "to": 120000.0 },
          { "key": "2-5min", "from": 120000.0, "to": 300000.0 },
          { "key": "5-10min", "from": 300000.0, "to": 600000.0 },
          { "key": ">10min", "from": 600000.0 }
        ]
      }
    }
  }
}
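The script in this answer computes a per-document value (date_delivered minus date_created, in milliseconds), and the range aggregation counts documents per interval, with "from" inclusive and "to" exclusive. The following Python sketch reproduces that bucketing on hypothetical documents whose dates are already epoch milliseconds; it only illustrates the semantics and is not a client for the query.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (key, from_ms, to_ms) copied from the "ranges" above; None means unbounded.
RANGES: List[Tuple[str, Optional[float], Optional[float]]] = [
    ("<1sec", None, 1000.0),
    ("1-5sec", 1000.0, 5000.0),
    ("5-30sec", 5000.0, 30000.0),
    ("30-60sec", 30000.0, 60000.0),
    ("1-2min", 60000.0, 120000.0),
    ("2-5min", 120000.0, 300000.0),
    ("5-10min", 300000.0, 600000.0),
    (">10min", 600000.0, None),
]


def bucket_for(latency_ms: float) -> Optional[str]:
    """Return the key of the range containing latency_ms (from <= x < to)."""
    for key, lo, hi in RANGES:
        if (lo is None or latency_ms >= lo) and (hi is None or latency_ms < hi):
            return key
    return None


def count_buckets(docs: Iterable[dict]) -> Dict[str, int]:
    """Count documents per range, like the range aggregation's doc_count."""
    counts = {key: 0 for key, _, _ in RANGES}
    for doc in docs:
        key = bucket_for(doc["date_delivered"] - doc["date_created"])
        if key is not None:
            counts[key] += 1
    return counts


docs = [
    {"date_created": 0, "date_delivered": 500},     # 0.5 s
    {"date_created": 0, "date_delivered": 1000},    # exactly 1 s
    {"date_created": 0, "date_delivered": 700000},  # about 11.7 min
]
print(count_buckets(docs))
```

Because these ranges do not overlap, assigning each document to the first matching range gives the same counts as the aggregation; with overlapping ranges, Elasticsearch would count a document in every range it falls into.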