Elasticsearch: Which nested mapping should I use for aggregated results?

Time: 2016-07-20 17:11:59

Tags: java elasticsearch database-design indexing lucene

Suppose I have many Elasticsearch documents like the following:

{
        "_index": "f2016-07-17",
        "_type": "trkvjadsreqpxl.gif",
        "_id": "AVX2N3dl5siG6SyfyIjb",
        "_score": 1,
        "_source": {
          "time": "1468714676424",
          "meta": {
            "cb_id": 25681,
            "mt_id": 649,
            "c_id": 1592,
            "revenue": 2.5,
            "mt_name": "GMS-INAPP-EN-2.5",
            "c_description": "COULL-INAPP-EN-2.5",
            "domain": "wv.inner-active.mobi",
            "master_domain": "649###wv.inner-active.mobi",
            "child_domain": "1592###wv.inner-active.mobi",
            "combo_domain": "25681###wv.inner-active.mobi",
            "ip": "52.42.87.73"
          }
        }....
      }

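My goal is to run a simple date histogram aggregation with a terms sub-aggregation, and to insert the aggregated results back into a new index/structure.

The aggregation is: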

{
  "aggs": {
    "hour":{
      "date_histogram": {
        "field": "time",
        "interval": "hour"
      },
      "aggs": {
        "hourly_m_tag": {
          "terms": {
            "field": "meta.mt_id"
          }
        }
      }
    }
  }
} 

The result is as expected:

"aggregations": {
    "hour": {
      "buckets": [
        {
          "key_as_string": "2016-07-17T00:00:00.000Z",
          "key": 1468713600000,
          "doc_count": 94411750,
          "hourly_m_tag": {
            "doc_count_error_upper_bound": 1485,
            "sum_other_doc_count": 30731646,
            "buckets": [
              {
                "key": 10,
                "doc_count": 10175501
              },
              {
                "key": 649,
                "doc_count": 200000
              }....
            ]
          }
        },
        {
          "key_as_string": "2016-07-17T01:00:00.000Z",
          "key": 1468717200000,
          "doc_count": 68738743,
          "hourly_m_tag": {
            "doc_count_error_upper_bound": 2115,
            "sum_other_doc_count": 22478590,
            "buckets": [
              {
                "key": 559,
                "doc_count": 8307018
              },
              {
                "key": 649,
                "doc_count": 100000
              }...

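My question

I want to parse the result (that part is no problem) and store it back into a new index.

Which nested mapping should I use on the new index so that I can retrieve the aggregated data later?

Expected data structure: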

{
  "hour": [
    {
      "time": "00:00",
      "child_tag": {
        "300": 100,
        "310": 200
      },
      "master_tag": {
        "1000": 300,
        "1001": 400,
        "1010": 400
      }
    },
    {
      "time": "01:00",
      "child_tag": {
        "300": 500,
        "310": 600
      },
      "master_tag": {
        "1000": 700,
        "1010": 800
      }
    }

  ]...
}

P.S.

Later aggregations should sum the values per master_tag / child_tag key across a range of hours.

For example, a query covering 00:00-01:00 should return:

{

      "child_tag": {
        "300": 600,//100+500
        "310": 800 //200+600
      },
      "master_tag": {
        "1000": 1000, //300+700
        "1001": 400,
        "1010": 1200 //400+800
      }
    }

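Thanks a lot!

1 Answer:

Answer 0 (score: 0)

Based on your comments and edits, I suggest storing one document per hour in the new index, which makes it easier to query documents for a specific time range.

The mapping I suggest is as follows: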

PUT /agg_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "time": {
          "type": "date",
          "format": "HH:mm"
        },
        "child_tag": {
          "type": "nested"
        },
        "master_tag": {
          "type": "nested"
        }
      }
    }
  }
}

You can then index your new documents like this:

PUT /agg_index/my_type/1
{
  "time": "00:00",
  "child_tag": {
    "300": 100,
    "310": 200
  },
  "master_tag": {
    "1000": 300,
    "1001": 400,
    "1010": 400
  }
}

PUT /agg_index/my_type/2
{
  "time": "01:00",
  "child_tag": {
    "300": 500,
    "310": 600
  },
  "master_tag": {
    "1000": 700,
    "1010": 800
  }
}
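
To produce those hourly documents from the aggregation response, something along these lines might work. This is only a minimal sketch in Python (the question does not show any client code, so the language, the function name and the `sub_agg_to_field` parameter are all my assumptions). It also assumes that the `hourly_m_tag` sub-aggregation on `meta.mt_id` is what should end up under `master_tag`; a similar terms sub-aggregation for the child tag would be mapped to `child_tag` the same way.

```python
from datetime import datetime, timezone


def buckets_to_hourly_docs(aggregations, sub_agg_to_field):
    """Turn date_histogram buckets into one flat document per hour.

    sub_agg_to_field maps a sub-aggregation name in the response
    (e.g. "hourly_m_tag") to the target field (e.g. "master_tag").
    Both names are illustrative assumptions, not part of any API.
    """
    docs = []
    for hour_bucket in aggregations["hour"]["buckets"]:
        # "key" is the start of the bucket in epoch milliseconds (UTC).
        start = datetime.fromtimestamp(hour_bucket["key"] / 1000, tz=timezone.utc)
        doc = {"time": start.strftime("%H:%M")}
        for agg_name, field in sub_agg_to_field.items():
            terms = hour_bucket.get(agg_name, {}).get("buckets", [])
            # Term keys become field names, so they are converted to strings.
            doc[field] = {str(b["key"]): b["doc_count"] for b in terms}
        docs.append(doc)
    return docs


# Trimmed copy of the response shown in the question.
sample = {
    "hour": {
        "buckets": [
            {"key": 1468713600000, "hourly_m_tag": {"buckets": [
                {"key": 10, "doc_count": 10175501},
                {"key": 649, "doc_count": 200000}]}},
            {"key": 1468717200000, "hourly_m_tag": {"buckets": [
                {"key": 559, "doc_count": 8307018},
                {"key": 649, "doc_count": 100000}]}},
        ]
    }
}

hourly_docs = buckets_to_hourly_docs(sample, {"hourly_m_tag": "master_tag"})
```

Each element of `hourly_docs` can then be sent as the body of one of the PUT requests above. Note that the response in the question has a non-zero `sum_other_doc_count`, which means the terms aggregation returned only the top terms; its `size` parameter controls how many terms come back.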

You will then be able to query the documents and run aggregations on the nested child_tag and master_tag elements.
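
For the "sum between hours" case from the P.S., a request body could be built roughly like this. Again this is a hedged sketch in Python that I have not run against a cluster: `build_sum_request` and its parameters are hypothetical names, and it assumes the mapping above, where every tag id is its own numeric field inside the nested object. That means the tag ids to sum have to be known in advance.

```python
def build_sum_request(start, end, tags_by_field):
    """Build a search body that sums per-tag counters over a time range.

    tags_by_field maps a nested field ("child_tag", "master_tag") to the
    tag ids to sum. All names here are illustrative assumptions.
    """
    aggs = {}
    for field, tag_ids in tags_by_field.items():
        aggs[field] = {
            # Step into the nested objects before aggregating on them.
            "nested": {"path": field},
            "aggs": {
                # One sum aggregation per tag id, e.g. on "child_tag.300".
                str(tag_id): {"sum": {"field": "%s.%s" % (field, tag_id)}}
                for tag_id in tag_ids
            },
        }
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {"range": {"time": {"gte": start, "lte": end}}},
        "aggs": aggs,
    }


body = build_sum_request(
    "00:00",
    "01:00",
    {"child_tag": [300, 310], "master_tag": [1000, 1001, 1010]},
)
```

The resulting `body` would be sent to `/agg_index/_search`. With the two example documents above, the sums should come out as in the P.S. of the question (for instance 100 + 500 = 600 for child tag 300).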