My index contains documents in the following format:
{
  "title": "Being John Malkovich",
  "credits": {
    "cast": [
      {
        "name": "Starring",
        "value": "John Malkovich"
      },
      {
        "name": "Supporting",
        "value": "John Cusack"
      }
    ],
    "crew": [
      {
        "name": "Directed",
        "value": "Spike Jonze"
      },
      {
        "name": "Written",
        "value": "Charlie Kauffman"
      }
    ]
  }
}
(This is a contrived example, but it follows the structure my company uses.)
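The nested aggregation shown further down only works if `credits.cast` and `credits.crew` are mapped as `nested`. The question does not show the real mapping, so here is a minimal sketch of one that matches the document above, written as a Python dict. It assumes Elasticsearch 7+ (typeless mappings) and `keyword` sub-fields for `name` and `value`, so that terms aggregations bucket on exact names.

```python
import json

# Assumed mapping: each cast/crew entry is "nested", so every
# {name, value} pair is indexed as its own hidden document.
# "keyword" (not "text") is assumed so terms aggregations see whole names.
CREDIT_ENTRY = {
    "type": "nested",
    "properties": {
        "name": {"type": "keyword"},
        "value": {"type": "keyword"},
    },
}

MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            # "credits" itself is a plain object holding the two nested lists
            "credits": {
                "properties": {
                    "cast": CREDIT_ENTRY,
                    "crew": CREDIT_ENTRY,
                }
            },
        }
    }
}

if __name__ == "__main__":
    # Body for an index-creation request (index name left to the reader)
    print(json.dumps(MAPPING, indent=2))
```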
I need aggregations that return a response of the following form (assuming 30 movie documents in total):
{
  "number_of_documents_where_credits_field_exists": 30,
  "top_3_occurances_in_all_the_documents_of_any_cast_or_crew_member": [
    {
      "value": "Tom Cruise",
      "count": 10
    },
    {
      "value": "Spike Jonze",
      "count": 3
    },
    {
      "value": "John Malkovich",
      "count": 1
    }
  ]
}
The results are sorted by count in descending order.
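To pin down the intended semantics, here is a minimal client-side sketch in Python that computes the same summary from raw documents. It is a reference for checking whatever aggregation is eventually used, not a replacement for it. It assumes "credits field exists" means the `credits` key is present, and that every cast or crew entry counts as one occurrence regardless of its `name`. The sample movies are hypothetical.

```python
from collections import Counter

# Key names copied from the desired response above.
COUNT_KEY = "number_of_documents_where_credits_field_exists"
TOP_KEY = "top_3_occurances_in_all_the_documents_of_any_cast_or_crew_member"


def expected_summary(docs, top_n=3):
    """Compute locally what the aggregation should return for `docs`.

    Every cast or crew entry counts as one occurrence, whatever its
    "name" (Starring, Directed, ...). Counter.most_common keeps
    first-seen order for equal counts.
    """
    with_credits = [d for d in docs if "credits" in d]
    counts = Counter()
    for doc in with_credits:
        for section in ("cast", "crew"):
            for entry in doc["credits"].get(section, []):
                counts[entry["value"]] += 1
    return {
        COUNT_KEY: len(with_credits),
        TOP_KEY: [
            {"value": value, "count": count}
            for value, count in counts.most_common(top_n)
        ],
    }


# Hypothetical sample: two movies with credits, one without.
SAMPLE = [
    {"title": "Movie A", "credits": {
        "cast": [{"name": "Starring", "value": "John Malkovich"}],
        "crew": [{"name": "Directed", "value": "Spike Jonze"}]}},
    {"title": "Movie B", "credits": {
        "crew": [{"name": "Directed", "value": "Spike Jonze"},
                 {"name": "Written", "value": "Charlie Kauffman"}]}},
    {"title": "Movie C"},
]

if __name__ == "__main__":
    print(expected_summary(SAMPLE))
```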
I'm new to ES and have been looking at the terms aggregation, but I'm not sure how to write the equivalent of:
{
  "aggs": {
    "top_3_occurances_in_all_the_documents_of_any_cast_or_crew_member": {
      <NESTED FIELD> : <CAST OR CREW MEMBER>
    }
  }
}
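Here NESTED FIELD has to be a nested field rather than a top-level field like "title", and the result should list the most frequently occurring cast or crew member names.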
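One possible shape for that request, sketched as a Python dict and assuming the nested mapping above. A single terms aggregation cannot span two different nested paths, so this sketch gives `credits.cast` and `credits.crew` their own `nested` + `terms` pair (to be merged client-side), and approximates "credits exists" as "has at least one cast or crew entry" with a `filter` aggregation over two `nested` queries. The aggregation names (`docs_with_credits`, `cast`, `crew`, `top_values`) and the per-path size of 50 are hypothetical choices.

```python
import json


def nested_terms(path, size):
    """Nested aggregation over one credit list, bucketing on its value field."""
    return {
        "nested": {"path": path},
        "aggs": {
            "top_values": {
                "terms": {"field": f"{path}.value", "size": size}
            }
        },
    }


def has_entries(path):
    """Query matching movies with at least one entry under `path`."""
    return {"nested": {"path": path, "query": {"match_all": {}}}}


def build_summary_request(per_path_size=50):
    """Request body for the "no credit type/subtype given" case (a sketch).

    per_path_size is an assumed value, kept well above 3 so that the
    client-side merge of cast and crew buckets has enough candidates.
    """
    return {
        "size": 0,  # only aggregations are needed, not hits
        "aggs": {
            # Movies with at least one cast or crew entry
            "docs_with_credits": {
                "filter": {
                    "bool": {
                        "should": [has_entries("credits.cast"),
                                   has_entries("credits.crew")],
                        "minimum_should_match": 1,
                    }
                }
            },
            "cast": nested_terms("credits.cast", per_path_size),
            "crew": nested_terms("credits.crew", per_path_size),
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_summary_request(), indent=2))
```

Inside a `nested` aggregation, `doc_count` counts nested entries rather than movies, which matches "occurrences" here.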
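So far, I can create the buckets and apply the aggregation as long as the request (sent to our REST endpoint) specifies both the credit type (e.g. "cast") and the credit subtype (e.g. "Starring"). For example, to get the most frequent starring cast members, I have the following: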
{
  "aggregations" : {
    "nestedagg" : {
      "nested" : {
        "path" : "credits.cast"
      },
      "aggregations" : {
        "filteredagg" : {
          "filter" : {
            "term" : {
              "credits.cast.name" : {
                "value" : "Starring",
                "boost" : 1.0
              }
            }
          },
          "aggregations" : {
            "termsagg" : {
              "terms" : {
                "field" : "credits.cast.value",
                "size" : 3,
                "shard_size" : -1,
                "min_doc_count" : 1,
                "shard_min_doc_count" : 0,
                "show_term_doc_count_error" : false,
                "order" : [
                  {
                    "_count" : "desc"
                  }
                ]
              }
            }
          }
        }
      }
    }
  }
}
This gives me the following (assuming only 20 movies have a starring cast role and the rest are ensemble films with no lead):
{
  "number_of_documents_where_credits.cast.Starring_exists": 20,
  "top_3_occurances_in_the_above_20_documents_of_a_starring_cast_member": [
    {
      "value": "Tom Cruise",
      "count": 10
    },
    {
      "value": "Seth Rogen",
      "count": 3
    },
    {
      "value": "Adam Sandler",
      "count": 1
    }
  ]
}
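For reference, the working request above can be parameterized by credit type and subtype. This is a minimal sketch in Python. The `terms` settings in the original that are not repeated here are assumed to be left at their defaults. Two additions are not in the original: `"size": 0`, which skips returning hits, and a hypothetical `movies` sub-aggregation using `reverse_nested`, which counts parent movies (the "20" above) rather than nested entries.

```python
import json


def build_filtered_request(credit_type, credit_subtype, size=3):
    """Rebuild the request above for any credit type and subtype.

    credit_type: "cast" or "crew"; credit_subtype: e.g. "Starring".
    """
    path = f"credits.{credit_type}"
    return {
        "size": 0,  # addition: aggregations only, no hits
        "aggregations": {
            "nestedagg": {
                "nested": {"path": path},
                "aggregations": {
                    "filteredagg": {
                        # keep only entries whose "name" is the requested subtype
                        "filter": {"term": {f"{path}.name": credit_subtype}},
                        "aggregations": {
                            "termsagg": {
                                "terms": {
                                    "field": f"{path}.value",
                                    "size": size,
                                    "order": {"_count": "desc"},
                                }
                            },
                            # addition: number of movies with such an entry
                            "movies": {"reverse_nested": {}},
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_filtered_request("cast", "Starring"), indent=2))
```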
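Basically, I need to provide an at-a-glance view of the most common cast or crew member names. This should be the default behavior when the client does not supply a credit type and subtype.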
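For that default case, assuming separate `cast` and `crew` nested aggregations (each holding a `terms` aggregation named `top_values`) plus a `docs_with_credits` filter aggregation, a minimal sketch of the client-side merge into the desired response shape follows. All aggregation names are hypothetical. The merged counts are exact only if each per-path bucket list contains every candidate name. A name that falls outside one list's `size` would be undercounted, so the per-path size should be generous.

```python
from collections import Counter

COUNT_KEY = "number_of_documents_where_credits_field_exists"
TOP_KEY = "top_3_occurances_in_all_the_documents_of_any_cast_or_crew_member"


def merge_top_values(response, top_n=3):
    """Combine cast and crew terms buckets into one top-N list.

    `response` is a parsed search response whose "aggregations" hold
    "docs_with_credits" (filter agg) and "cast"/"crew" (nested aggs,
    each with a terms sub-aggregation named "top_values").
    """
    aggs = response["aggregations"]
    counts = Counter()
    for path_agg in ("cast", "crew"):
        for bucket in aggs[path_agg]["top_values"]["buckets"]:
            # terms buckets expose the term as "key" and its count as "doc_count"
            counts[bucket["key"]] += bucket["doc_count"]
    return {
        COUNT_KEY: aggs["docs_with_credits"]["doc_count"],
        TOP_KEY: [
            {"value": value, "count": count}
            for value, count in counts.most_common(top_n)
        ],
    }
```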