Question

I'm trying to search and generate statistics over a two-level data structure. Say the data is posts and comments. It looks like this:

[
  {"title": "Post 1", "comments": [
    {"name": "Comment 1", "date": "2019-01-10", "character_count": 1000},
    {"name": "Comment 2", "date": "2019-01-11", "character_count": 2000},
    {"name": "Comment 3", "date": "2019-01-12", "character_count": 1500},
    {"name": "Comment 4", "date": "2019-01-13", "character_count": 2500},
    {"name": "Comment 5", "date": "2019-01-15", "character_count": 3000}
  ]},
  {"title": "Post 2", "comments": [
    {"name": "Comment 1", "date": "2019-01-10", "character_count": 400},
    {"name": "Comment 2", "date": "2019-01-13", "character_count": 500},
    {"name": "Comment 3", "date": "2019-01-15", "character_count": 4000}
  ]}
]

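A few notes about the data:

Each "comment" has many more attributes than character_count (possibly 30).
Each "post" has few "comments": rarely more than 200, usually fewer than 30.
There are many thousands of "posts".
In the mapping, "comments" is a nested object.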

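When querying, I'm only interested in one "comment" per "post", chosen by date. For a given date, I need the latest comment before that date. I can get this data with a script field: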

{
  "script_fields": {
    "relevant_comment": {
      "script": "sorted = _source.comments.findAll { it.date < report_date } . sort { a, b -> b.date <=> a.date }; return sorted ? sorted.first() : null;",
      "params": {
        "report_date": "2019-01-12"
      }
    }
  }
}

So for "2019-01-12", "Post 1" will yield {"name": "Comment 2", "date": "2019-01-11", "character_count": 2000}, and "Post 2" will yield {"name": "Comment 1", "date": "2019-01-10", "character_count": 400}.
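To make the selection rule unambiguous, here is a minimal Python sketch of the same logic applied client-side to the sample data above. It is not Elasticsearch code; the function name `latest_comment_before` is made up for illustration, and it assumes dates are always ISO `YYYY-MM-DD` strings, which compare correctly as plain strings.

```python
from typing import Dict, List, Optional

# Sample data copied from the question.
posts = [
    {"title": "Post 1", "comments": [
        {"name": "Comment 1", "date": "2019-01-10", "character_count": 1000},
        {"name": "Comment 2", "date": "2019-01-11", "character_count": 2000},
        {"name": "Comment 3", "date": "2019-01-12", "character_count": 1500},
        {"name": "Comment 4", "date": "2019-01-13", "character_count": 2500},
        {"name": "Comment 5", "date": "2019-01-15", "character_count": 3000},
    ]},
    {"title": "Post 2", "comments": [
        {"name": "Comment 1", "date": "2019-01-10", "character_count": 400},
        {"name": "Comment 2", "date": "2019-01-13", "character_count": 500},
        {"name": "Comment 3", "date": "2019-01-15", "character_count": 4000},
    ]},
]


def latest_comment_before(comments: List[Dict], report_date: str) -> Optional[Dict]:
    """Return the comment with the greatest date strictly before report_date.

    Hypothetical helper: mirrors the filter-then-sort script field, but uses
    max() instead of a full sort. Returns None when no comment qualifies.
    """
    earlier = [c for c in comments if c["date"] < report_date]
    return max(earlier, key=lambda c: c["date"]) if earlier else None


# One relevant comment (or None) per post title.
relevant = {
    p["title"]: latest_comment_before(p["comments"], "2019-01-12")
    for p in posts
}
```

Because each post has at most a couple of hundred comments, a single linear pass per post is cheap; the cost that worries me is repeating this selection once per aggregated attribute.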

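Now I need to aggregate over these, but how do I do that? For example, I need the average character count, or the number of "posts" whose relevant comment has a character count below a certain value.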
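For reference, this is the result I'm after, written as a small Python sketch that computes the two statistics outside Elasticsearch. It only pins down the expected numbers and is not a proposed solution; the names `pick_character_count` and `THRESHOLD` and the threshold value of 1000 are assumptions chosen for illustration.

```python
from statistics import mean
from typing import Dict, List, Optional

# Reduced copy of the sample data: only the fields needed here.
sample_posts = {
    "Post 1": [
        {"date": "2019-01-10", "character_count": 1000},
        {"date": "2019-01-11", "character_count": 2000},
        {"date": "2019-01-12", "character_count": 1500},
        {"date": "2019-01-13", "character_count": 2500},
        {"date": "2019-01-15", "character_count": 3000},
    ],
    "Post 2": [
        {"date": "2019-01-10", "character_count": 400},
        {"date": "2019-01-13", "character_count": 500},
        {"date": "2019-01-15", "character_count": 4000},
    ],
}

REPORT_DATE = "2019-01-12"
THRESHOLD = 1000  # hypothetical cut-off, not from the real data


def pick_character_count(comments: List[Dict], report_date: str) -> Optional[int]:
    """character_count of the latest comment strictly before report_date."""
    earlier = [c for c in comments if c["date"] < report_date]
    if not earlier:
        return None  # this post has no comment before the report date
    return max(earlier, key=lambda c: c["date"])["character_count"]


# Select once per post, then reuse the selection for every statistic.
picked = [pick_character_count(c, REPORT_DATE) for c in sample_posts.values()]
counts = [n for n in picked if n is not None]

average_character_count = mean(counts)
posts_below_threshold = sum(1 for n in counts if n < THRESHOLD)
```

With the sample data and report date "2019-01-12", the selected counts are 2000 and 400, so the average is 1200 and one post falls below the assumed threshold. The point of the sketch is that the selection runs once per post and every statistic reuses it, which is what I'd like the Elasticsearch query to do as well.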

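There is an answer here that suggests putting the script into the aggregation itself. In that case, sorting and filtering the list every time I need a single attribute seems wasteful.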

Or maybe I don't even need a script field and there is a different solution?

Efficiently aggregating on a script field (an object with many attributes) in ElasticSearch

0 answers: