Question

Elasticsearch v7.5

Hello, good day!

We have two indices named socialmedia and influencers.

Sample contents:

socialmedia:

{
    '_id' : 1001,
    'title' : "Title 1",
    'smp_id' : 1,
},
{
    '_id' : 1002,
    'title' : "Title 2",
    'smp_id' : 2,
},
{
    '_id' : 1003,
    'title' : "Title 3",
    'smp_id' : 3,
}
//omitted other documents

influencers:

{
    '_id' : 1,
    'name' : "John",
    'smp_id' : 1,
    'smp_score' : 5
},
{
    '_id' : 2,
    'name' : "Peter",
    'smp_id' : 2,
    'smp_score' : 10
},
{
    '_id' : 3,
    'name' : "Mark",
    'smp_id' : 3,
    'smp_score' : 15
}
//omitted other documents

Now, I have a simple query that determines which influencer has the most documents in the socialmedia index:

GET socialmedia/_search
{
  "size": 0, 
  "query": {
    "match_all": {}
  }, 
  "aggs": {
    "INFLUENCERS": {
      "terms": {
        "field": "smp_id.keyword"
        //smp_id is a **text** based field, that's why we have `.keyword` here
      }
    }
  }
}

Sample output:

"aggregations" : {
    "INFLUENCERS" : {
      "doc_count_error_upper_bound" : //omitted,
      "sum_other_doc_count" : //omitted,
      "buckets" : [
        {
          "key" : "1",
          "doc_count" : 87258
        },
        {
          "key" : "2",
          "doc_count" : 36518
        },
        {
          "key" : "3",
          "doc_count" : 34838
        }
      ]
    }
}

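Goal:

My query can already order the influencers by their doc_count, that is, by how many posts they have in the socialmedia index. Is there a way to sort the INFLUENCERS aggregation, or otherwise order the influencers, by their smp_score instead?

With that in mind, smp_id 3 (Mark) should appear first, because his smp_score is 15.

Thanks in advance for your help!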

Answer 1

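What you are looking for is a JOIN operation. Note that Elasticsearch does not support JOIN operations unless you model your data as described in this link.

Instead, a very simple approach is to denormalize your data and add smp_score to the socialmedia index, as shown below:

Mapping: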

PUT socialmedia
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "keyword":{
            "type":"keyword"
          }
        }
      },
      "smp_id": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "smp_score": {
        "type": "float"
      }
    }
  }
}
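The denormalization step can be sketched in plain Python, assuming both indices have been read into lists of dicts. The helper name `denormalize` and the in-memory approach are illustrative assumptions, not part of the answer; in practice the score would be copied in at index time or by an update job.

```python
def denormalize(posts, influencers):
    """Return copies of the socialmedia docs with smp_score copied in.

    Hypothetical helper: the join on smp_id happens in application code,
    because Elasticsearch will not perform it at query time.
    """
    # Lookup from smp_id to score. Ids are compared as strings because
    # smp_id is mapped as a text/keyword field.
    score_by_id = {
        str(inf["smp_id"]): float(inf["smp_score"]) for inf in influencers
    }
    enriched = []
    for post in posts:
        doc = dict(post)  # shallow copy so the input is not mutated
        doc["smp_id"] = str(post["smp_id"])
        # Posts without a matching influencer simply get no smp_score.
        if doc["smp_id"] in score_by_id:
            doc["smp_score"] = score_by_id[doc["smp_id"]]
        enriched.append(doc)
    return enriched


influencers = [
    {"_id": 1, "name": "John", "smp_id": 1, "smp_score": 5},
    {"_id": 2, "name": "Peter", "smp_id": 2, "smp_score": 10},
    {"_id": 3, "name": "Mark", "smp_id": 3, "smp_score": 15},
]
posts = [
    {"_id": 1001, "title": "Title 1", "smp_id": 1},
    {"_id": 1002, "title": "Title 2", "smp_id": 2},
    {"_id": 1003, "title": "Title 3", "smp_id": 3},
]
docs = denormalize(posts, influencers)
```

The trade-off is that whenever an influencer's smp_score changes, every socialmedia document for that smp_id has to be updated as well.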

Your ES query would then have two terms aggregations, as shown below:

Request query:

POST socialmedia/_search
{
  "size": 0,
  "aggs": {
    "my_influencers_score": {
      "terms": {
        "field": "smp_score",
        "order": { "_key": "desc" }
      },
      "aggs": {
        "influencers": {
          "terms": {
            "field": "smp_id.keyword"
          }
        }
      }
    }
  }
}

Basically, we first aggregate on smp_score, and then introduce a sub-aggregation to show the smp_id.
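To make the two-level grouping concrete, here is a small in-memory imitation in Python. It is only an illustration of the bucket structure, not how Elasticsearch executes the query; the function name `score_then_id_buckets` and the `sub_buckets` key are made up for this sketch.

```python
from collections import Counter, defaultdict


def score_then_id_buckets(docs):
    """Group docs by smp_score (keys descending), then by smp_id."""
    by_score = defaultdict(Counter)
    for doc in docs:
        by_score[doc["smp_score"]][doc["smp_id"]] += 1

    buckets = []
    for score in sorted(by_score, reverse=True):  # mirrors "_key": "desc"
        ids = by_score[score]
        buckets.append({
            "key": score,
            "doc_count": sum(ids.values()),
            # most_common() orders by count descending, which matches the
            # default ordering of a terms aggregation.
            "sub_buckets": [
                {"key": k, "doc_count": n} for k, n in ids.most_common()
            ],
        })
    return buckets


docs = [
    {"smp_id": "1", "smp_score": 5.0},
    {"smp_id": "2", "smp_score": 10.0},
    {"smp_id": "3", "smp_score": 15.0},
]
buckets = score_then_id_buckets(docs)
```

Two things the sketch glosses over: influencers that share the same smp_score land in the same outer bucket, and a real terms aggregation returns only the top 10 buckets unless `size` is raised.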

Response:

{
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "my_influencers_score" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 15.0,
          "doc_count" : 1,
          "influencers" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "3",
                "doc_count" : 1
              }
            ]
          }
        },
        {
          "key" : 10.0,
          "doc_count" : 1,
          "influencers" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "2",
                "doc_count" : 1
              }
            ]
          }
        },
        {
          "key" : 5.0,
          "doc_count" : 1,
          "influencers" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "1",
                "doc_count" : 1
              }
            ]
          }
        }
      ]
    }
  }
}

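Do spend some time reading the link above; however, that approach requires you to model your indices differently, depending on which of the options mentioned there you choose. From what I understand, the solution I have provided should suffice.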
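Once smp_score is stored on every socialmedia document, another option (not part of the answer above) is to keep the original terms aggregation on smp_id.keyword and order its buckets by a single-value metric sub-aggregation. The sketch below only builds and prints the request body; the sub-aggregation name `max_smp_score` is made up for illustration.

```python
import json

# Request body for POST socialmedia/_search. Assumes smp_score has
# already been copied onto every socialmedia document.
body = {
    "size": 0,
    "aggs": {
        "INFLUENCERS": {
            "terms": {
                "field": "smp_id.keyword",
                # Order the buckets by the metric sub-aggregation below.
                "order": {"max_smp_score": "desc"},
            },
            "aggs": {
                # All documents of one influencer carry the same score,
                # so max simply recovers that score for the bucket.
                "max_smp_score": {"max": {"field": "smp_score"}}
            },
        }
    },
}

print(json.dumps(body, indent=2))
```

With this shape each bucket still reports its doc_count, so both the number of posts and the score are available per influencer.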
