MongoDB: aggregate by day and remove duplicate values

Time: 2018-04-05 13:10:14

Tags: mongodb mongoose

I am trying to clean up a huge database.

Sample document:

{
  "_id" : ObjectId("59fc5249d5ab401d99f3de7f"),
  "addedAt" : ISODate("2017-11-03T11:26:01.744Z"),
  "__v" : 0,
  "check" : 17602,
  "lastCheck" : ISODate("2018-04-05T11:47:00.609Z"),
  "tracking" : [
  {
      "timeCheck" : ISODate("2017-11-06T13:17:20.861Z"),
      "_id" : ObjectId("5a0060e00f3c330012bafe39"),
      "rank" : 2395,
  }, 
  {
      "timeCheck" : ISODate("2017-11-06T13:22:31.254Z"),
      "_id" : ObjectId("5a0062170f3c330012bafe77"),
      "rank" : 2395,
  }, 
  {
      "timeCheck" : ISODate("2017-11-06T13:27:40.551Z"),
      "_id" : ObjectId("5a00634c0f3c330012bafebe"),
      "rank" : 2379,
  }, 
  {
      "timeCheck" : ISODate("2017-11-06T13:32:41.084Z"),
      "_id" : ObjectId("5a0064790f3c330012baff03"),
      "rank" : 2395,
  }, 
  {
      "timeCheck" : ISODate("2017-11-06T13:37:51.012Z"),
      "_id" : ObjectId("5a0065af0f3c330012baff32"),
      "rank" : 2379,
  }, 
  {
      "timeCheck" : ISODate("2017-11-07T13:37:51.012Z"),
      "_id" : ObjectId("5a0065af0f3c330012baff34"),
      "rank" : 2379,
  }]
}

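The "tracking" array contains many duplicate "rank" values, but I only want to remove duplicates within the same day. For the document above, the desired result is: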

{
  "_id" : ObjectId("59fc5249d5ab401d99f3de7f"),
  "addedAt" : ISODate("2017-11-03T11:26:01.744Z"),
  "__v" : 0,
  "check" : 17602,
  "lastCheck" : ISODate("2018-04-05T11:47:00.609Z"),
  "tracking" : [
  {
      "timeCheck" : ISODate("2017-11-06T13:17:20.861Z"),
      "_id" : ObjectId("5a0060e00f3c330012bafe39"),
      "rank" : 2395,
  }, 
  {
      "timeCheck" : ISODate("2017-11-06T13:27:40.551Z"),
      "_id" : ObjectId("5a00634c0f3c330012bafebe"),
      "rank" : 2379,
  }, 
  {
      "timeCheck" : ISODate("2017-11-07T13:37:51.012Z"),
      "_id" : ObjectId("5a0065af0f3c330012baff34"),
      "rank" : 2379,
  }]
}

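How can I group the entries by day and then drop the repeated values, keeping only the first occurrence of each? I need to keep each day's values even if they are identical to those of another day.

1 answer:

Answer 0 (score: 0)

The aggregation framework cannot update documents in place at this point. However, you can use the following aggregation pipeline to produce the desired output and then write it back, for example with a bulk replace of all affected documents: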

db.collection.aggregate({
    $unwind: "$tracking" // flatten the "tracking" array into separate documents
}, {
    $sort: {
        "tracking.timeCheck": 1 // sort by timeCheck to allow us to use the $first operator in the next stage reliably
    }
}, {
    $group: {
        _id: {  // group by
            "_id": "$_id", // "_id" and 
            "rank": "$tracking.rank", // "rank" and
            "date": { // the "date" part of the "timeCheck" field
                $dateFromParts : {
                    year: { $year: "$tracking.timeCheck" },
                    month:  { $month: "$tracking.timeCheck" },
                    day: { $dayOfMonth: "$tracking.timeCheck" }
                }
            }
        },
        "doc": { $first: "$$ROOT" } // only keep the first document per group
    }
}, {
    $sort: {
        "doc.tracking.timeCheck": 1 // restore ascending sort order, since $group does not guarantee the order of its output
    }
}, {
    $group: {
        _id: "$_id._id", // merge everything again per "_id"
        "addedAt": { $first: "$doc.addedAt" },
        "__v": { $first: "$doc.__v" },
        "check": { $first: "$doc.check" },
        "lastCheck": { $first: "$doc.lastCheck" },
        "tracking": { $push: "$doc.tracking" } // in order to join the tracking values into an array again
    }
})
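
The write-back step is not shown above, so here is a minimal sketch of how it might look. Both helper names (`dedupeTracking`, `buildReplaceOps`) are hypothetical and not part of MongoDB. The first restates the "first entry per day and rank" rule in plain JavaScript, which can be handy for checking the pipeline output on a few documents; it assumes `timeCheck` is a `Date` and that days are counted in UTC. The second turns already-cleaned documents into `replaceOne` operations in the shape that `bulkWrite` accepts.

```javascript
// Hypothetical helper: keep only the first entry per (UTC day, rank) pair.
// Assumes every entry has a Date in `timeCheck` and a number in `rank`.
function dedupeTracking(tracking) {
  const seen = new Set();
  return [...tracking]
    .sort((a, b) => a.timeCheck - b.timeCheck) // oldest first, like the first $sort stage
    .filter((entry) => {
      const day = entry.timeCheck.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
      const key = day + "|" + entry.rank;
      if (seen.has(key)) return false; // same rank already kept for this day
      seen.add(key);
      return true;
    });
}

// Hypothetical helper: build one replaceOne operation per cleaned document.
// The result is meant to be passed to db.collection.bulkWrite(...).
function buildReplaceOps(docs) {
  return docs.map((doc) => ({
    replaceOne: {
      filter: { _id: doc._id },
      replacement: doc,
    },
  }));
}
```

In the shell, the operations could then be applied with something like `db.collection.bulkWrite(buildReplaceOps(results))`, where `results` is the array returned by the pipeline (for example via `.toArray()`). Note that `replaceOne` overwrites the whole document, so any field the final `$group` stage does not rebuild would be lost; an `updateOne` with `$set` on "tracking" alone may be the safer choice if the documents carry other fields.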