How to aggregate this data efficiently

Time: 2017-03-10 00:49:41

Tags: mongodb mongoose aggregation-framework

I want to aggregate this data:

Full JSON: https://paste.ubuntu.com/24147839/

Sample:

{
"_id": "58c1b957f6d187a57dd4a458",
"stats": {
  "2017": {
    "3": {
      "9": {
        "6": {
          "49": {
            "sum": {
              "clicks": 1,
              "cost": 0.01
            }
          },
          "sum": {
            "clicks": 1,
            "cost": 0.01
          }
        },
        "sum": {
          "clicks": 1,
          "cost": 0.01
        }
      }
    }
  }
}
}

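The result I want is (this is just one sample; there should be one per hour):

"6": { clicks: 686, cost: 3.05399999999995 }

The key 6 comes from $stats.2017.3.9.6, clicks is the sum of $stats.2017.3.9.6.sum.clicks, and cost is the sum of $stats.2017.3.9.6.sum.cost.

What aggregation query should I use to get this result?

Thanks in advance for your time.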

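1 answer:

Answer 0: (score: 2)

It is not clear whether you want the hourly aggregation for one specific date (your sample data is all from 2017-03-09), so let's assume you do want to aggregate by hour for this particular date, taken from a large collection that covers many days. Here is an aggregation pipeline that works on any recent version of MongoDB (3.2 or later, because it builds an array literal inside $project):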

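db.coll.aggregate([
  // Keep only documents that have data for 2017-03-09.
  {$match:{"stats.2017.3.9":{$exists:true}}},
  // Build a 24-element array with one {hour, sum} entry per hour of that day.
  {$project:{hours:[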
      {hour:"00", sum:"$stats.2017.3.9.0.sum"}, 
      {hour:"01", sum:"$stats.2017.3.9.1.sum"},
      {hour:"02", sum:"$stats.2017.3.9.2.sum"}, 
      {hour:"03", sum:"$stats.2017.3.9.3.sum"}, 
      {hour:"04", sum:"$stats.2017.3.9.4.sum"}, 
      {hour:"05", sum:"$stats.2017.3.9.5.sum"}, 
      {hour:"06", sum:"$stats.2017.3.9.6.sum"}, 
      {hour:"07", sum:"$stats.2017.3.9.7.sum"}, 
      {hour:"08", sum:"$stats.2017.3.9.8.sum"}, 
      {hour:"09", sum:"$stats.2017.3.9.9.sum"}, 
      {hour:"10", sum:"$stats.2017.3.9.10.sum"}, 
      {hour:"11", sum:"$stats.2017.3.9.11.sum"}, 
      {hour:"12", sum:"$stats.2017.3.9.12.sum"},   
      {hour:"13", sum:"$stats.2017.3.9.13.sum"},   
      {hour:"14", sum:"$stats.2017.3.9.14.sum"},   
      {hour:"15", sum:"$stats.2017.3.9.15.sum"},   
      {hour:"16", sum:"$stats.2017.3.9.16.sum"},   
      {hour:"17", sum:"$stats.2017.3.9.17.sum"},   
      {hour:"18", sum:"$stats.2017.3.9.18.sum"},   
      {hour:"19", sum:"$stats.2017.3.9.19.sum"},   
      {hour:"20", sum:"$stats.2017.3.9.20.sum"},   
      {hour:"21", sum:"$stats.2017.3.9.21.sum"},   
      {hour:"22", sum:"$stats.2017.3.9.22.sum"},   
      {hour:"23", sum:"$stats.2017.3.9.23.sum"}
  ]}}, 
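  // Emit one document per hour entry.
  {$unwind:"$hours"},
  // Total clicks and cost per hour across all matched documents.
  {$group: {
      _id    : "$hours.hour",
      clicks : {$sum:"$hours.sum.clicks"},
      cost   : {$sum:"$hours.sum.cost"}
  }},
  // Order the results from hour "00" to "23".
  {$sort:{_id:1}}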
])

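This first filters down to the documents that contain data under "stats.2017.3.9". It then creates a new document with an hours array, where each element projects the metrics collected for one hour. If a document has no data for a given hour, that element's sum is null and is ignored during grouping. After unwinding the hours array, we total the clicks and cost by hour.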
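The 24 projection entries follow a fixed pattern, so they can also be generated rather than typed out. The following is a minimal sketch, assuming plain JavaScript in Node.js or a shell that supports ES2017; the helper name buildHourlyPipeline is hypothetical and not part of the original answer. It builds the same five stages for any date.

```javascript
// Hypothetical helper (not part of the original answer): builds the same
// pipeline for an arbitrary date instead of hard-coding 2017-03-09.
function buildHourlyPipeline(year, month, day) {
  // Path prefix such as "stats.2017.3.9"; keys are unpadded, as in the sample document.
  const dayPath = `stats.${year}.${month}.${day}`;

  // One {hour, sum} entry per hour of the day, 0 through 23.
  const hours = Array.from({ length: 24 }, (_, h) => ({
    hour: String(h).padStart(2, "0"), // "00".."23", so string sort matches numeric order
    sum: `$${dayPath}.${h}.sum`       // e.g. "$stats.2017.3.9.6.sum"
  }));

  return [
    { $match: { [dayPath]: { $exists: true } } },
    { $project: { hours: hours } },
    { $unwind: "$hours" },
    { $group: {
        _id: "$hours.hour",
        clicks: { $sum: "$hours.sum.clicks" },
        cost: { $sum: "$hours.sum.cost" }
    } },
    { $sort: { _id: 1 } }
  ];
}

const pipeline = buildHourlyPipeline(2017, 3, 9);
// In the mongo shell this would be passed as: db.coll.aggregate(pipeline)
```

Generating the array keeps the zero-padded hour labels and the unpadded field paths in step, which is easy to get wrong when editing 24 lines by hand.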

With your sample data, the result is:

{ "_id" : "00", "clicks" : 93, "cost" : 0.419 }
{ "_id" : "01", "clicks" : 95, "cost" : 0.43 }
{ "_id" : "02", "clicks" : 86, "cost" : 0.427 }
{ "_id" : "03", "clicks" : 81, "cost" : 0.301 }
{ "_id" : "04", "clicks" : 92, "cost" : 0.423 }
{ "_id" : "05", "clicks" : 76, "cost" : 0.352 }
{ "_id" : "06", "clicks" : 91, "cost" : 0.397 }
{ "_id" : "07", "clicks" : 84, "cost" : 0.396 }
{ "_id" : "08", "clicks" : 95, "cost" : 0.353 }
{ "_id" : "09", "clicks" : 78, "cost" : 0.325 }
{ "_id" : "10", "clicks" : 100, "cost" : 0.40900000000000003 }
{ "_id" : "11", "clicks" : 96, "cost" : 0.405 }
{ "_id" : "12", "clicks" : 65, "cost" : 0.319 }
{ "_id" : "13", "clicks" : 90, "cost" : 0.395 }
{ "_id" : "14", "clicks" : 82, "cost" : 0.331 }
{ "_id" : "15", "clicks" : 85, "cost" : 0.38 }
{ "_id" : "16", "clicks" : 97, "cost" : 0.424 }
{ "_id" : "17", "clicks" : 27, "cost" : 0.125 }
{ "_id" : "18", "clicks" : 0, "cost" : 0 }
{ "_id" : "19", "clicks" : 0, "cost" : 0 }
{ "_id" : "20", "clicks" : 0, "cost" : 0 }
{ "_id" : "21", "clicks" : 0, "cost" : 0 }
{ "_id" : "22", "clicks" : 0, "cost" : 0 }
{ "_id" : "23", "clicks" : 0, "cost" : 0 }
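Hours 18 through 23 appear with zeros because every matched document contributes an entry for all 24 hours, and $sum yields 0 when there is nothing numeric to add. The question asked for an object keyed by hour, such as "6": { clicks, cost }, so the rows can be reshaped on the client. This is a minimal sketch in plain JavaScript; the helper name toHourMap is hypothetical, and dropping the zero-click hours is an assumption about what is wanted.

```javascript
// Hypothetical post-processing helper, not part of the original answer.
// rows: documents returned by the aggregation, e.g. { _id: "06", clicks: 91, cost: 0.397 }
function toHourMap(rows) {
  const byHour = {};
  for (const row of rows) {
    // Assumption: hours with no recorded clicks are not wanted in the output.
    if (row.clicks === 0) continue;
    // "06" -> "6", matching the key style used in the stored documents.
    const key = String(Number(row._id));
    byHour[key] = { clicks: row.clicks, cost: row.cost };
  }
  return byHour;
}

// Three rows copied from the output above.
const sampleRows = [
  { _id: "06", clicks: 91, cost: 0.397 },
  { _id: "17", clicks: 27, cost: 0.125 },
  { _id: "18", clicks: 0, cost: 0 }
];
const hourMap = toHourMap(sampleRows);
console.log(JSON.stringify(hourMap));
// {"6":{"clicks":91,"cost":0.397},"17":{"clicks":27,"cost":0.125}}
```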