Filtering on the latest document of each group in Elasticsearch

Date: 2016-10-27 07:11:54

Tags: elasticsearch

Suppose the following documents are indexed in Elasticsearch:

{student: "Chloe", date: "2016-10-27", grade: "A"}
{student: "Oliver", date: "2016-10-27", grade: "F"}
{student: "Chloe", date: "2016-10-26", grade: "B"}
{student: "Chloe", date: "2016-10-25", grade: "F"}
{student: "Oliver", date: "2016-10-25", grade: "A"}

I can use a top hits aggregation to get the list of students with their latest grade:

{student: "Chloe", date: "2016-10-27", grade: "A"}
{student: "Oliver", date: "2016-10-27", grade: "F"}
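
For reference, a request of roughly the following shape would produce that list. The question does not show the actual query, so this is only a sketch: the aggregation names `by_student` and `latest` are placeholders, and the body is written as a Python dict so it can be serialized and sent with any client.

```python
import json

# Hypothetical request body; "by_student" and "latest" are placeholder names
# that do not appear in the original question.
top_hits_request = {
    "size": 0,  # no search hits, only aggregation results
    "aggs": {
        "by_student": {
            # one bucket per distinct student
            "terms": {"field": "student"},
            "aggs": {
                "latest": {
                    # keep the single most recent document of each bucket
                    "top_hits": {
                        "size": 1,
                        "sort": [{"date": {"order": "desc"}}],
                    }
                }
            },
        }
    },
}

print(json.dumps(top_hits_request, indent=2))
```

The `sort` on `date` is what makes the single hit per bucket the latest one.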

But how can I get only the students whose latest grade is "F" (just the student named "Oliver" in this particular example)? That is, the expected result is:

{student: "Oliver", date: "2016-10-27", grade: "F"}

Any ideas?

1 Answer:

Answer 0: (score: 1)

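You can do this with a bucket selector aggregation (ES 2.x only). I basically compare each student's latest date with the latest date on which they got an F grade (computed by filtering), and keep only the buckets where the two dates are the same. You can remove the top hits aggregation if you like; it is only there to fetch the specific record where the student failed.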

{
  "size": 0,
  "aggs": {
    "group_by_students": {
      "terms": {
        "field": "student"
      },
      "aggs": {
        "only_f_grade_bucket": {
          "filter": {
            "term": {
              "grade": "F"
            }
          },
          "aggs": {
            "latest_date": {
              "max": {
                "field": "date"
              }
            },
            "top_hit": {
              "top_hits": {
                "size": 1,
                "sort": [
                  { "date": { "order": "desc" } }
                ]
              }
            }
          }
        },
        "max_date": {
          "max": {
            "field": "date"
          }
        },
        "latest_failure": {
          "bucket_selector": {
            "buckets_path": {
              "failed_date": "only_f_grade_bucket.latest_date",
              "max_date": "max_date"
            },
            "script": "failed_date == max_date"
          }
        }
      }
    }
  }
}
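
To see why the comparison works, the same logic can be replayed in plain Python on the five sample documents from the question. This is only an illustrative sketch and does not talk to Elasticsearch; the function name `latest_failures` is made up for the example, and the comments map each step to the aggregation it imitates.

```python
from collections import defaultdict

# The five sample documents from the question.
docs = [
    {"student": "Chloe", "date": "2016-10-27", "grade": "A"},
    {"student": "Oliver", "date": "2016-10-27", "grade": "F"},
    {"student": "Chloe", "date": "2016-10-26", "grade": "B"},
    {"student": "Chloe", "date": "2016-10-25", "grade": "F"},
    {"student": "Oliver", "date": "2016-10-25", "grade": "A"},
]


def latest_failures(documents):
    """Return the latest record of every student whose latest grade is F."""
    # group_by_students: one bucket per student
    buckets = defaultdict(list)
    for doc in documents:
        buckets[doc["student"]].append(doc)

    kept = []
    for student, group in sorted(buckets.items()):
        # max_date: latest date over all of the student's documents
        # (ISO dates compare correctly as strings)
        max_date = max(d["date"] for d in group)
        # only_f_grade_bucket: keep only the documents with grade F
        failed = [d for d in group if d["grade"] == "F"]
        if not failed:
            # No F record, so there is no failed_date to compare;
            # this sketch simply drops the student.
            continue
        # latest_date: latest date among the F documents
        failed_date = max(d["date"] for d in failed)
        # latest_failure: keep the bucket only if both dates coincide
        if failed_date == max_date:
            # top_hit: the most recent F record itself
            kept.append(max(failed, key=lambda d: d["date"]))
    return kept


result = latest_failures(docs)
print(result)
```

Chloe's latest F is dated 2016-10-25 while her latest document is dated 2016-10-27, so her bucket is discarded; for Oliver both dates are 2016-10-27, so only his record remains.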