Time series and the aggregation framework (MongoDB)

Time: 2014-07-18 11:17:37

Tags: javascript node.js mongodb aggregation-framework

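I'm trying to synchronize two functions that run in my app. The first one checks, for each time block (e.g. every 10 seconds), how many documents were saved to MongoDB in real time: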

var getVolume = function(timeBlock, cb) {
    var triggerTime = Date.now();
    var blockPeriod = triggerTime - timeBlock;

    Document.find({
        time: { $gt: blockPeriod }
    }).count(function(err, count) {
        log('getting volume since ', new Date(blockPeriod), 'result is', count)
        cb(triggerTime, count);
    });
};

Then I have a second function that I use whenever I want to fetch the data for the graph (front end):

var getHistory = function(timeBlock, end, cb) {

    Document.aggregate(
    {
        $match: {
            time: {
                $gte: new Date(end - 10 * timeBlock),
                $lt: new Date(end)
            }
        }
    },

    // count number of documents based on time block
    // timeBlock is divided by 1000 as we use it as seconds here
    // and the timeBlock parameter is in milliseconds
    {
        $group: {
            _id: {
                year: { $year: "$time" },
                month: { $month: "$time" },
                day: { $dayOfMonth: "$time" },
                hour: { $hour: "$time" },
                minute: { $minute: "$time" },
                second: { $subtract: [
                    { $second: "$time" },
                    { $mod: [
                        { $second: "$time" },
                        timeBlock / 1000
                    ]}
                ]}
            },
            count: { $sum: 1 }
        }
    },

    // changing the name _id to timeParts
    {
        $project: {
            timeParts: "$_id",
            count: 1,
            _id: 0
        }
    },

    // sorting by date, from earliest to latest
    {
        $sort: {
            "time": 1
        }
    }, function(err, result) {
        if (err) {
            cb(err)
        } else {
            log("start", new Date(end - 10 * timeBlock))
            log("end", new Date(end))
            log("timeBlock", timeBlock)
            log(">****", result)
            cb(result)
        }
    })
}

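The problem is that I can't get the same values on my chart and in the back-end code (the getVolume function).

I realized that the log from getHistory is not what I expected (log below):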

start Fri Jul 18 2014 11:56:56 GMT+0100 (BST)
end Fri Jul 18 2014 11:58:36 GMT+0100 (BST)
timeBlock 10000
>**** [ { count: 4,
    timeParts: { year: 2014, month: 7, day: 18, hour: 10, minute: 58, second: 30 } },
  { count: 6,
    timeParts: { year: 2014, month: 7, day: 18, hour: 10, minute: 58, second: 20 } },
  { count: 3,
    timeParts: { year: 2014, month: 7, day: 18, hour: 10, minute: 58, second: 10 } },
  { count: 3,
    timeParts: { year: 2014, month: 7, day: 18, hour: 10, minute: 58, second: 0 } },
  { count: 2,
    timeParts: { year: 2014, month: 7, day: 18, hour: 10, minute: 57, second: 50 } } ]

I expected getHistory to look up the data in Mongo in 10-second blocks starting from start Fri Jul 18 2014 11:56:56 GMT+0100 (BST), so the result would look roughly like this:

11:56:56 count: 3
11:57:06 count: 0
11:57:16 count: 14
... etc.

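TODO: 1. I know I should handle the case where the count is 0 in my aggregation; I guess those time blocks are currently skipped.

1 answer:

Answer 0: (score: 2)

Your mistake is in how you compute _id for the $group operator, specifically its second part: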

second: { $subtract: [
    { $second: "$time" },
    { $mod: [
        { $second: "$time" },
        timeBlock / 1000
    ]}
]}

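So, instead of splitting all the data into 10 chunks of timeBlock milliseconds each, starting from new Date(end - 10 * timeBlock), you are splitting it into 11 chunks starting from the nearest divisor of timeBlock.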

To fix this, first compute delta = end - $time, and then use it instead of the original $time to build the _id.
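To make the difference concrete, here is a small plain-JavaScript sketch of the two bucketing schemes, with no MongoDB involved. The helper names clockAlignedBucket and endAlignedBucket are hypothetical and exist only for illustration. The sample timestamps are taken from the log in the question (11:58:36 BST is 10:58:36 UTC).

```javascript
// Hypothetical helpers, for illustration only (not part of the original code).

// Bucketing as in the question: round the timestamp down to a multiple of
// timeBlock on the clock (:00, :10, :20, ... for a 10-second block).
function clockAlignedBucket(timeMs, timeBlock) {
    return timeMs - (timeMs % timeBlock);
}

// Bucketing as suggested in the answer: measure back from `end`, so bucket
// labels fall at end, end - timeBlock, end - 2 * timeBlock, ...
function endAlignedBucket(timeMs, end, timeBlock) {
    var delta = end - timeMs;
    return end - (delta - (delta % timeBlock));
}

var timeBlock = 10000;                               // 10 seconds, in ms
var end = Date.UTC(2014, 6, 18, 10, 58, 36);         // 10:58:36 UTC
var docTime = Date.UTC(2014, 6, 18, 10, 58, 24);     // a document at 10:58:24 UTC

var clockLabel = clockAlignedBucket(docTime, timeBlock);
var endLabel = endAlignedBucket(docTime, end, timeBlock);

console.log(new Date(clockLabel).toISOString());     // 2014-07-18T10:58:20.000Z
console.log(new Date(endLabel).toISOString());       // 2014-07-18T10:58:26.000Z
```

With the clock-aligned scheme the document lands in the :20 block, which has nothing to do with end. With the end-aligned scheme it lands in the block labelled 10:58:26, exactly one timeBlock before end. Note that in this scheme each block is labelled by its upper edge.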

Here is an example of what I mean:

Document.aggregate({
    $match: {
        time: {
            $gte: new Date(end - 10 * timeBlock),
            $lt: new Date(end)
        }
    }
}, {
    $project: {
        time: 1,
        delta: { $subtract: [
            new Date(end),
            "$time"
        ]}
    }
}, {
    $project: {
        time: 1,
        delta: { $subtract: [
            "$delta",
            { $mod: [
                "$delta",
                timeBlock
            ]}
        ]}
    }
}, {
    $group: {
        _id: { $subtract: [
            new Date(end),
            "$delta"
        ]},
        count: { $sum: 1 }
    }
}, {
    $project: {
        time: "$_id",
        count: 1,
        _id: 0
    }
}, {
    $sort: {
        time: 1
    }
}, function(err, result) {
    // ...
})

I also recommend using raw time values (in milliseconds), because it is much easier and it keeps you from making mistakes. You can cast time into timeParts after $group using the $project operator.
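The TODO in the question (blocks with a count of 0 are missing from the result) is not covered by the aggregation itself, because $group only emits groups that contain at least one document. One possible way to handle it is to pad the result in application code. The sketch below is an assumption, not part of the original answer: fillEmptyBlocks is a hypothetical helper, and it assumes the rows have the shape { time: Date, count: Number } produced by the answer's pipeline, with every time equal to end - k * timeBlock.

```javascript
// Hypothetical helper: pad the aggregation result with zero-count blocks.
// Assumes `rows` looks like [{ time: Date, count: Number }, ...] and that
// each `time` equals end - k * timeBlock for some integer k.
function fillEmptyBlocks(rows, end, timeBlock, blocks) {
    // Index the existing counts by their label in milliseconds.
    var counts = {};
    rows.forEach(function(row) {
        counts[row.time.getTime()] = row.count;
    });

    var filled = [];
    // Walk from the oldest label to the newest one.
    for (var k = blocks - 1; k >= 0; k--) {
        var label = end - k * timeBlock;
        filled.push({ time: new Date(label), count: counts[label] || 0 });
    }
    return filled;
}

// Sample data invented for this example.
var end = Date.UTC(2014, 6, 18, 10, 58, 36);
var rows = [
    { time: new Date(end - 20000), count: 6 },
    { time: new Date(end), count: 4 }
];

var filled = fillEmptyBlocks(rows, end, 10000, 4);
console.log(filled.map(function(row) { return row.count; }));  // [ 0, 6, 0, 4 ]
```

One edge case to be aware of: with $gte on the lower bound, a document stamped exactly at end - blocks * timeBlock gets a label of its own, which this sketch ignores.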