Query MongoDB for duplicates, but allow some duplicates based on timestamp

Time: 2014-07-28 20:41:58

Tags: node.js mongodb express mongoose

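I have a set of data in which every entry has a timestamp. I want Mongo to aggregate the entries that are duplicates of each other within a 3-minute timestamp window. Here is an example of what I mean: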

Original data:

[{"fruit" : "apple", "timestamp": "2014-07-17T06:45:18Z"},
 {"fruit" : "apple", "timestamp": "2014-07-17T06:47:18Z"},
 {"fruit" : "apple", "timestamp": "2014-07-17T06:55:18Z"}]

After the query, it would be:

[{"fruit" : "apple", "timestamp": "2014-07-17T06:45:18Z"},
 {"fruit" : "apple", "timestamp": "2014-07-17T06:55:18Z"}]

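because the second entry falls within the 3-minute bubble created by the first entry. I already have code that aggregates and removes dupes with the same fruit, but now I only want to combine the ones that fall inside the same timestamp bubble.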

1 Answer:

Answer 0 (score: 1)

We should be able to do this! First, let's split an hour into 3-minute "bubbles", identified by the minute at which each one starts:

[0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51, 54, 57]

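To group the documents, we need to modify their timestamps slightly. As far as I know, that is not currently possible with the aggregation framework, so I will use the group() method instead.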

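To group fruits that fall in the same time period, we need to round each timestamp down to the start of its 3-minute "bubble". We can do this with timestamp.minutes -= (timestamp.minutes % 3), after setting the seconds to zero.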
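The rounding step can be tried on its own in plain JavaScript before it goes into the query. This is a minimal sketch: the helper name bubbleStart and its bubbleMinutes parameter are made up for illustration, and it assumes the timestamps are ISO-8601 UTC strings like the ones in the question.

```javascript
// Hypothetical helper (not part of the original answer): round an ISO-8601
// timestamp down to the start of its fixed-size bubble within the hour.
function bubbleStart(isoString, bubbleMinutes = 3) {
    const ts = new Date(isoString);
    // zero out seconds and milliseconds so every timestamp in a bubble matches
    ts.setUTCSeconds(0, 0);
    // drop the minutes past the bubble boundary, e.g. 47 -> 45
    ts.setUTCMinutes(ts.getUTCMinutes() - (ts.getUTCMinutes() % bubbleMinutes));
    return ts;
}

const samples = [
    "2014-07-17T06:45:18Z",
    "2014-07-17T06:47:18Z",
    "2014-07-17T06:55:18Z"
];
const bubbles = samples.map((s) => bubbleStart(s).toISOString());
console.log(bubbles);
```

Note that these bubbles are fixed slots on the clock, not windows measured from the first entry: 06:47:18 and 06:48:18 are only a minute apart but land in different bubbles (06:45 and 06:48).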

Here is the resulting query:

db.collection.group({
    keyf: function (doc) {
        var timestamp = new ISODate(doc.timestamp);

        // seconds must be equal across a 'bubble'
        timestamp.setUTCSeconds(0);

        // round down to the nearest 3 minute 'bubble'
        var remainder = timestamp.getUTCMinutes() % 3;
        var bubbleMinute = timestamp.getUTCMinutes() - remainder;
        timestamp.setUTCMinutes(bubbleMinute);

        return { fruit: doc.fruit, 'timestamp': timestamp };
    },
    reduce: function (curr, result) {
        result.sum += 1;
    },
    initial: {
        sum : 0
    }
});

Sample results:

[
    {
        "fruit" : "apple",
        "timestamp" : ISODate("2014-07-17T06:45:00Z"),
        "sum" : 2
    },
    {
        "fruit" : "apple",
        "timestamp" : ISODate("2014-07-17T06:54:00Z"),
        "sum" : 1
    },
    {
        "fruit" : "banana",
        "timestamp" : ISODate("2014-07-17T09:03:00Z"),
        "sum" : 1
    },
    {
        "fruit" : "orange",
        "timestamp" : ISODate("2014-07-17T14:24:00Z"),
        "sum" : 2
    }
]

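To simplify this, you can precompute the "bubble" timestamp and insert it into the document as a separate field. The documents you create would then look like this: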

[
    {"fruit" : "apple", "timestamp": "2014-07-17T06:45:18Z", "bubble": "2014-07-17T06:45:00Z"},
    {"fruit" : "apple", "timestamp": "2014-07-17T06:47:18Z", "bubble": "2014-07-17T06:45:00Z"},
    {"fruit" : "apple", "timestamp": "2014-07-17T06:55:18Z", "bubble": "2014-07-17T06:54:00Z"}
]
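One way to fill in that field is to compute it in application code just before the document is written. The sketch below assumes timestamps are stored as ISO-8601 UTC strings, as in the example; withBubble is a hypothetical helper name, not something from MongoDB or Mongoose.

```javascript
// Hypothetical helper: return a copy of the document with a precomputed
// "bubble" field, stored as an ISO string like the "timestamp" field.
function withBubble(doc, bubbleMinutes = 3) {
    const ts = new Date(doc.timestamp);
    ts.setUTCSeconds(0, 0);
    ts.setUTCMinutes(ts.getUTCMinutes() - (ts.getUTCMinutes() % bubbleMinutes));
    // toISOString() yields "...:00.000Z"; trim the milliseconds to match the stored format
    const bubble = ts.toISOString().replace(".000Z", "Z");
    return Object.assign({}, doc, { bubble: bubble });
}

const doc = withBubble({ fruit: "apple", timestamp: "2014-07-17T06:47:18Z" });
console.log(doc);
// In the mongo shell the result could then be stored with db.collection.insert(doc).
```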

Of course, this takes up more storage. However, with this document structure you can use the aggregate function [0].

db.collection.aggregate(
  [
    { $group: { _id: { fruit: "$fruit", bubble: "$bubble"} , sum: { $sum: 1 } } },
  ]
)
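If the goal is the deduplicated list from the question rather than a count per bubble, the same grouping could also keep the earliest timestamp in each bubble. This is only a sketch under the assumption that "timestamp" and "bubble" are stored as same-format ISO-8601 UTC strings; firstSeen is a field name chosen for this example.

```javascript
// Sketch of a pipeline for db.collection.aggregate(); "firstSeen" and "sum"
// are output field names chosen for this example.
const pipeline = [
    {
        $group: {
            _id: { fruit: "$fruit", bubble: "$bubble" },
            // same-format ISO-8601 UTC strings sort chronologically, so $min picks the earliest
            firstSeen: { $min: "$timestamp" },
            sum: { $sum: 1 }
        }
    },
    // reshape each group into the { fruit, timestamp } form used in the question
    { $project: { _id: 0, fruit: "$_id.fruit", timestamp: "$firstSeen", sum: 1 } },
    { $sort: { fruit: 1, timestamp: 1 } }
];

console.log(JSON.stringify(pipeline, null, 2));
// In the mongo shell: db.collection.aggregate(pipeline)
```

For the three apple documents above, the two groups would carry the timestamps 06:45:18 and 06:55:18, which matches the output the question asks for.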

Hope that helps!

[0] MongoDB aggregation comparison: group(), $group and MapReduce