How do I group data using MongoDB?

Date: 2015-03-18 09:48:47

Tags: mongodb mongodb-query aggregation-framework

I have data recorded at 15-minute intervals.

[{
    "_id" : ObjectId("5500a5e6f37a84d0509526ba"),
    "runtimeMilliSeconds" : NumberLong("1426105802063"),
    "cpuMemoryStats" : {
        "currentCpuUtilization" : 71.72000122070312,
        "currentMemoryUtilization" : 77.4000015258789
    }
}
{
    "_id" : ObjectId("5500a96af37a84d0509526f8"),
    "runtimeMilliSeconds" : NumberLong("1426106701622"),
    "cpuMemoryStats" : {
        "currentCpuUtilization" : 70.30000305175781,
        "currentMemoryUtilization" : 77.4000015258789
    }
}
{
    "_id" : ObjectId("5500aceef37a84d050952739"),
    "runtimeMilliSeconds" : NumberLong("1426107601441"),
    "cpuMemoryStats" : {
        "currentCpuUtilization" : 73.2300033569336,
        "currentMemoryUtilization" : 77.4000015258789
    }
}
{
    "_id" : ObjectId("5500b07ff37a84d050952776"),
    "runtimeMilliSeconds" : NumberLong("1426108501342"),
    "cpuMemoryStats" : {
        "currentCpuUtilization" : 60.61000061035156,
        "currentMemoryUtilization" : 77.4000015258789
    }
}


{
    "_id" : ObjectId("5500b404f37a84d0509527b7"),
    "runtimeMilliSeconds" : NumberLong("1426109402199"),
    "cpuMemoryStats" : {
        "currentCpuUtilization" : 60.060001373291016,
        "currentMemoryUtilization" : 77.41000366210938
    }
}
{
    "_id" : ObjectId("5500b788f25a6f9765950f65"),
    "runtimeMilliSeconds" : NumberLong("1426110301345"),
    "cpuMemoryStats" : {
        "currentCpuUtilization" : 58.689998626708984,
        "currentMemoryUtilization" : 77.41000366210938
    }
}
{
    "_id" : ObjectId("5500bb0cf37a84d050952837"),
    "runtimeMilliSeconds" : NumberLong("1426111202063"),
    "cpuMemoryStats" : {
        "currentCpuUtilization" : 70.69999694824219,
        "currentMemoryUtilization" : 77.41000366210938
    }
}
{
    "_id" : ObjectId("5500be83f25a6f9765950fde"),
    "runtimeMilliSeconds" : NumberLong("1426112101980"),
    "cpuMemoryStats" : {
        "currentCpuUtilization" : 69.41000366210938,
        "currentMemoryUtilization" : 77.44000244140625
    }
}

{
    "_id" : ObjectId("5500c206f37a84d0509528ac"),
    "runtimeMilliSeconds" : NumberLong("1426113001781"),
    "cpuMemoryStats" : {
        "currentCpuUtilization" : 70.63999938964844,
        "currentMemoryUtilization" : 77.44000244140625
    }
}
{
    "_id" : ObjectId("5500c58cf37a84d0509528ea"),
    "runtimeMilliSeconds" : NumberLong("1426113901510"),
    "cpuMemoryStats" : {
        "currentCpuUtilization" : 68.38999938964844,
        "currentMemoryUtilization" : 77.44000244140625
    }
}
{
    "_id" : ObjectId("5500c911f25a6f97659510a0"),
    "runtimeMilliSeconds" : NumberLong("1426114801403"),
    "cpuMemoryStats" : {
        "currentCpuUtilization" : 77.7300033569336,
        "currentMemoryUtilization" : 77.44999694824219
    }
}
{
    "_id" : ObjectId("5500cca0f37a84d050952968"),
    "runtimeMilliSeconds" : NumberLong("1426115702206"),
    "cpuMemoryStats" : {
        "currentCpuUtilization" : 74.23999786376953,
        "currentMemoryUtilization" : 77.4800033569336
    }
}]

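I want to group this data into hourly intervals. That is, I want the 4 documents from each hour grouped into a single document, so that the values under the 'cpuMemoryStats' key are the averages of all four. runtimeMilliSeconds should also be the average over the 4 documents.

I.e., I want the 1st to 4th documents grouped together, then the 5th to 8th, and so on. Out of the 12 documents, I want each group of four combined into one document with averaged keys.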

Sample output:

[{
    "_id" : ObjectId("5500a5e6f37a84d0509526ba"),
    "runtimeMilliSeconds" : 1426107152000,
    "cpuMemoryStats" : {
        "currentCpuUtilization" : 68.96500206,
        "currentMemoryUtilization" : 77.400001526
    }
}
.
.
..
]
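As a quick check of the requested result, here is a minimal plain JavaScript sketch (not a MongoDB query) that averages the first four documents from the data above. The names `firstFour` and `mean` are illustrative only. The averages it computes appear to agree with the sample output, assuming the sample's runtimeMilliSeconds was rounded to the nearest second.

```javascript
// First four readings copied from the question's data.
const firstFour = [
  { runtime: 1426105802063, cpu: 71.72000122070312, mem: 77.4000015258789 },
  { runtime: 1426106701622, cpu: 70.30000305175781, mem: 77.4000015258789 },
  { runtime: 1426107601441, cpu: 73.2300033569336, mem: 77.4000015258789 },
  { runtime: 1426108501342, cpu: 60.61000061035156, mem: 77.4000015258789 },
];

// Arithmetic mean of an array of numbers (hypothetical helper).
const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;

const avgRuntime = mean(firstFour.map((d) => d.runtime));
const avgCpu = mean(firstFour.map((d) => d.cpu));
const avgMem = mean(firstFour.map((d) => d.mem));

// Rounding to the nearest second is an assumption about how the sample was produced.
const avgRuntimeRounded = Math.round(avgRuntime / 1000) * 1000;

console.log({ avgRuntime, avgRuntimeRounded, avgCpu, avgMem });
```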

I tried the following:

db.collection.aggregate({"$match": { "hostId" : "1.1.1.1" , "customerId"   : "customerId" ,
"runtimeMilliSeconds" : { "$gte" : 1426104902206}}},

{"$group" : {"_id" : { "$subtract" :[ {"$divide" : ["$runtimeMilliSeconds", 3600 ]},

{ "$mod" : [{"$divide" : ["$runtimeMilliSeconds", 3600 ]},1] } ] },

"memoryUtilization":{"$avg":"$cpuMemoryStats.currentMemoryUtilization"},
  "runtime":{"$avg":"$runtimeMilliSeconds"}}})

How can I do this with mongo?

Group data by hour

2 answers:

Answer 0 (score: 4)

Date math seems like the obvious approach for your storage format:

 db.collection.aggregate([
     { "$match": { 
         "hostId" : "1.1.1.1" , 
         "customerId" : "customerId" ,
         "runtimeMilliSeconds" : { "$gte" : 1426104902206 },
     }},
     { "$group" : {
         "_id" : { 
             "$subtract": [
                  "$runtimeMilliSeconds",
                  { "$mod": [
                      "$runtimeMilliSeconds",
                      1000 * 60 * 15 // 1000 ms x 60 sec * 15 mins     
                  ]}
             ]
         },
         "memoryUtilization": { "$avg": "$cpuMemoryStats.currentMemoryUtilization" },
         "runtime":{ "$avg": "$runtimeMilliSeconds" }
     }}
])

So for the record, apart from the general structure, what you were looking for is the correct "constant", which as shown is 900000:

 1000 milliseconds
 x 60 seconds
 x 15 minutes

To actually get one-hour intervals, you just change the numbers:

 1000 milliseconds
 x 60 seconds
 x 60 minutes

That is one hour. All intervals are done this way. But it is a modulo, not a division.
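The modulo arithmetic can be tried outside the database. The following is a minimal plain JavaScript sketch, not a MongoDB query, that applies the same "subtract the remainder" rule to the twelve runtimeMilliSeconds values from the question. The names `floorToInterval` and `buckets` are hypothetical.

```javascript
// One hour in milliseconds: 1000 ms x 60 sec x 60 mins.
const HOUR_MS = 1000 * 60 * 60;

// Same rule as the $subtract / $mod pair: drop the remainder within the interval.
const floorToInterval = (ms, intervalMs) => ms - (ms % intervalMs);

// The twelve runtimeMilliSeconds values from the question.
const timestamps = [
  1426105802063, 1426106701622, 1426107601441, 1426108501342,
  1426109402199, 1426110301345, 1426111202063, 1426112101980,
  1426113001781, 1426113901510, 1426114801403, 1426115702206,
];

// Group timestamps by the start of the hour they fall into.
const buckets = new Map();
for (const ts of timestamps) {
  const key = floorToInterval(ts, HOUR_MS);
  if (!buckets.has(key)) buckets.set(key, []);
  buckets.get(key).push(ts);
}

for (const [key, members] of buckets) {
  console.log(key, new Date(key).toISOString(), members.length);
}
```

Note that the keys are aligned to clock hours, so with this sample the buckets do not necessarily coincide with "1st to 4th, 5th to 8th" in document order; a bucket at either end of the range may hold fewer than four documents.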

Answer 1 (score: 0)

I was very close to the answer. I corrected my logic (the math). This is the correct query:

db.collection.aggregate({
    "$match": {
    "hostId": "1.1.1.1",
    "customerId": "customerId",
    "runtimeMilliSeconds": {
        "$gte": 1426104902206
    }
    }
},
{
    "$group": {
    "_id": {
        "$subtract": [
            {
                "$divide": [
                    "$runtimeMilliSeconds",
                    3600*1000
                ]
            },
            {
                "$mod": [
                    {
                        "$divide": [
                            "$runtimeMilliSeconds",
                            3600*1000
                        ]
                    },
                    1
                ]
            }
        ]
    },
    "memoryUtilization": {
        "$avg": "$cpuMemoryStats.currentMemoryUtilization"
    },
    "runtime": {
        "$last": "$runtimeMilliSeconds"
    }
    }
},
{
    $sort: {
    runtime: 1
    }
})

This query groups all data by hour, e.g. 8.00 to 9.00, 9.00 to 10.00, and so on.
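For comparison, here is a minimal plain JavaScript sketch (again not a MongoDB query) of the two grouping keys used in this thread: the divide-then-drop-the-fraction key from this answer and the subtract-the-remainder key from the other answer. The function names `hourIndex` and `hourStart` are hypothetical. Under these assumptions both keys identify the same hour; one is an hour count since the epoch, the other is the hour's starting timestamp in milliseconds.

```javascript
const HOUR_MS = 3600 * 1000;

// Key from this answer: (ms / hour) minus its fractional part, i.e. whole hours since epoch.
const hourIndex = (ms) => {
  const hours = ms / HOUR_MS;
  return hours - (hours % 1);
};

// Key from the other answer: ms minus the remainder, i.e. the hour's start in milliseconds.
const hourStart = (ms) => ms - (ms % HOUR_MS);

// A few runtimeMilliSeconds values taken from the question.
const samples = [1426105802063, 1426107601441, 1426111202063, 1426114801403];

const rows = samples.map((ms) => ({
  ms,
  index: hourIndex(ms),
  start: hourStart(ms),
}));

console.log(rows);
```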