Elasticsearch: sum up based on sibling nested fields

Time: 2020-01-23 04:25:35

Tags: elasticsearch

Elasticsearch v7.5

Hello and good day!

We have two indices, named socialmedia and influencers.

Sample contents:

socialmedia

{
    "_id" : 1001,
    "title" : "Title 1",
    "smp_id" : 1,
    "latest" : [
        {
          "soc_mm_score" : "5"
        }
    ]
},
{
    "_id" : 1002,
    "title" : "Title 2",
    "smp_id" : 2,
    "latest" : [
        {
          "soc_mm_score" : "10"
        }
    ]
},
{
    "_id" : 1003,
    "title" : "Title 3",
    "smp_id" : 3,
    "latest" : [
        {
          "soc_mm_score" : "35"
        }
    ]
},
{
    "_id" : 1004,
    "title" : "Title 4",
    "smp_id" : 2,
    "latest" : [
        {
          "soc_mm_score" : "30"
        }
    ]
}

/// Some other fields have been omitted

influencers

{
    "_id" : 1,
    "name" : "John",
    "smp_id" : 1
},
{
    "_id" : 2,
    "name" : "Peter",
    "smp_id" : 2
},
{
    "_id" : 3,
    "name" : "Mark",
    "smp_id" : 3
}

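Now, I have a simple query that determines which documents in the socialmedia index have the highest latest.soc_mm_score values and also displays the corresponding influencer, identified by smp_id:

GET socialmedia/_search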
{
  "size": 0,
  "_source": "latest", 
  "query": {
    "match_all": {}
  }, 
  "aggs": {
    "LATEST": {
      "nested": {
        "path": "latest"
      },
      "aggs": {
        "MM_SCORE": {
          "terms": {
            "field": "latest.soc_mm_score",
            "order": {
              "_key": "desc"
            },
            "size": 3
          },
          "aggs": {
            "REVERSE": {
              "reverse_nested": {},
              "aggs": {
                "SMP_ID": {
                  "top_hits": {
                    "_source": ["smp_id"], 
                    "size": 1
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Sample output

"aggregations" : {
    "LATEST" : {
      "doc_count" : //omitted,
      "MM_SCORE" : {
        "doc_count_error_upper_bound" : //omitted,
        "sum_other_doc_count" : //omitted,
        "buckets" : [
          {
            "key" : 35,
            "doc_count" : 1,
            "REVERSE" : {
              "doc_count" : 1,
              "SMP_ID" : {
                "hits" : {
                  "total" : {
                    "value" : 1,
                    "relation" : "eq"
                  },
                  "max_score" : 1.0,
                  "hits" : [
                    {
                      "_index" : "socialmedia",
                      "_type" : "index",
                      "_id" : "1003",
                      "_score" : 1.0,
                      "_source" : {
                        "smp_id" : "3"
                      }
                    }
                  ]
                }
              }
            }
          },
          {
            "key" : 30,
            "doc_count" : 1,
            "REVERSE" : {
              "doc_count" : 1,
              "SMP_ID" : {
                "hits" : {
                  "total" : {
                    "value" : 1,
                    "relation" : "eq"
                  },
                  "max_score" : 1.0,
                  "hits" : [
                    {
                      "_index" : "socialmedia",
                      "_type" : "index",
                      "_id" : "1004",
                      "_score" : 1.0,
                      "_source" : {
                        "smp_id" : "2"
                      }
                    }
                  ]
                }
              }
            }
          },
          {
            "key" : 10,
            "doc_count" : 1,
            "REVERSE" : {
              "doc_count" : 1,
              "SMP_ID" : {
                "hits" : {
                  "total" : {
                    "value" : 1,
                    "relation" : "eq"
                  },
                  "max_score" : 1.0,
                  "hits" : [
                    {
                      "_index" : "socialmedia",
                      "_type" : "index",
                      "_id" : "1002",
                      "_score" : 1.0,
                      "_source" : {
                        "smp_id" : "2"
                      }
                    }
                  ]
                }
              }
            }
          }
        ]
      }
    }
  }

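With the query above, I was able to display which documents have the highest latest.soc_mm_score.

However, the sample output above only ranks the DOCUMENTS. It merely implies that the influencers (a.k.a. smp_id) related to those documents are the TOP INFLUENCERS according to latest.soc_mm_score.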

Ideally, simply running this aggs query

"terms" : {
    "field" : "smp_id"
}

conveys which influencers are at the top according to doc_count.
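To make the doc_count ranking concrete, here is a small Python sketch that mimics what a terms aggregation on smp_id would count for the four sample socialmedia documents. It is a local simulation rather than an Elasticsearch call, and the `docs` list is simply the sample data retyped with only the relevant fields.

```python
from collections import Counter

# The four sample socialmedia documents, reduced to the fields needed here.
docs = [
    {"_id": 1001, "smp_id": 1},
    {"_id": 1002, "smp_id": 2},
    {"_id": 1003, "smp_id": 3},
    {"_id": 1004, "smp_id": 2},
]

# A terms aggregation groups documents by field value and, by default,
# orders the buckets by the number of documents in each one (doc_count).
doc_counts = Counter(doc["smp_id"] for doc in docs)

for smp_id, doc_count in doc_counts.most_common():
    print(f"smp_id={smp_id} doc_count={doc_count}")
```

This ranking reflects only how many posts each influencer has, not how high the scores of those posts are.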

Now, running the terms aggregation on latest.soc_mm_score displays the TOP DOCUMENTS:

"terms" : {
    "field" : "latest.soc_mm_score"
}

TARGET OBJECTIVE

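I want to display the TOP INFLUENCERS according to latest.soc_mm_score in the socialmedia index. Since Elasticsearch can count all the documents for each unique smp_id, is there a way for ES to sum all latest.soc_mm_score values per smp_id and use that sum to order the terms?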

My objective above should output the following:

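  • smp_id 2 as the top influencer, because he has 2 posts (with soc_mm_score of 30 and 10), which add up to a soc_mm_score of 40
  • smp_id 3 as the second top influencer, with 1 post and a soc_mm_score of 35
  • smp_id 1 as the third top influencer, with 1 post and a soc_mm_score of 5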
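The expected ranking can be checked by hand with a short Python sketch that sums the nested scores per smp_id over the sample documents. This is only a local calculation on the sample data (with the string scores converted to integers), not a query against Elasticsearch.

```python
from collections import defaultdict

# Sample socialmedia documents from the question (scores are stored as strings).
docs = [
    {"_id": 1001, "smp_id": 1, "latest": [{"soc_mm_score": "5"}]},
    {"_id": 1002, "smp_id": 2, "latest": [{"soc_mm_score": "10"}]},
    {"_id": 1003, "smp_id": 3, "latest": [{"soc_mm_score": "35"}]},
    {"_id": 1004, "smp_id": 2, "latest": [{"soc_mm_score": "30"}]},
]

# Sum every nested latest.soc_mm_score under the smp_id of its parent document.
totals = defaultdict(int)
for doc in docs:
    for entry in doc["latest"]:
        totals[doc["smp_id"]] += int(entry["soc_mm_score"])

# Order influencers by their summed score, highest first.
ranking = sorted(totals.items(), key=lambda item: item[1], reverse=True)
print(ranking)  # [(2, 40), (3, 35), (1, 5)]
```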

Is there a proper query to achieve this objective?

1 answer:

Answer 0 (score: 0)

Finally! Found an answer!

"aggs": {
    "INFS": {
      "terms": {
        "field": "smp_id.keyword",
        "order": {
          "LATEST > SUM_SVALUE": "desc"
        }
      },
      "aggs": {
        "LATEST": {
          "nested": {
            "path": "latest"
          },
          "aggs": {
            "SUM_SVALUE": {
              "sum" : {
                "field": "latest.soc_mm_score"
              }
            }
          }
        }
      }
    }
}
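In the aggregation above, the order path "LATEST > SUM_SVALUE" walks from each terms bucket through the single-bucket nested aggregation to the sum metric, so the buckets come back sorted by the summed score. The Python sketch below shows one way the ranking might be read out of such a response. The helper name `rank_influencers` is hypothetical, and `sample_response` is a hand-written illustration of the response shape built from the question's sample data, not real Elasticsearch output.

```python
def rank_influencers(response):
    """Return (smp_id, summed_score) pairs in the order the buckets were returned."""
    buckets = response["aggregations"]["INFS"]["buckets"]
    return [(b["key"], b["LATEST"]["SUM_SVALUE"]["value"]) for b in buckets]


# Hand-written illustration of the response shape (assumed values, not real output).
sample_response = {
    "aggregations": {
        "INFS": {
            "buckets": [
                {"key": "2", "doc_count": 2,
                 "LATEST": {"doc_count": 2, "SUM_SVALUE": {"value": 40.0}}},
                {"key": "3", "doc_count": 1,
                 "LATEST": {"doc_count": 1, "SUM_SVALUE": {"value": 35.0}}},
                {"key": "1", "doc_count": 1,
                 "LATEST": {"doc_count": 1, "SUM_SVALUE": {"value": 5.0}}},
            ]
        }
    }
}

ranking = rank_influencers(sample_response)
for smp_id, total in ranking:
    print(f"smp_id={smp_id} total_soc_mm_score={total}")
```

Note that the sum aggregation needs latest.soc_mm_score to be mapped as a numeric field, and that smp_id.keyword assumes smp_id is a text field with a keyword sub-field.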

This produces the following sample result:

[image: FINAL]