Counting the number of duplicate documents

Date: 2018-04-09 18:14:08

Tags: mongodb mongodb-query aggregation-framework

A bug in my code created some duplicate users in my MongoDB database.

Sample document from the collection:

{
    "_id" : ObjectId("5abb9d72b884fb00389efeef"),
    "user" : ObjectId("5abb9d72b884fb00389efee5"),
    "displayName" : "test",
    "fullName" : "test test test",
    "email" : "test@mail.com",
    "phoneNumber" : "99999999999",
    "createdAt" : ISODate("2016-05-18T13:49:38.533Z")
}

I was able to find the duplicate users with this query:

db.users.aggregate([
    { $group: { _id: "$user", "Total": { $sum: 1 } } },
    { $match: { "Total": { $gt: 1 } } }
])
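To make the two stages concrete, here is a plain JavaScript sketch (no MongoDB involved) that mimics `$group` followed by `$match` on a small in-memory array. The sample data and the `groupByUser` helper are hypothetical illustrations, not part of the original query, and unlike MongoDB's `$group`, the output order here simply follows first appearance.

```javascript
// Hypothetical in-memory sample: only the "user" field matters for grouping.
const users = [
  { user: "a" }, { user: "a" }, { user: "a" },
  { user: "b" }, { user: "b" },
  { user: "c" },
];

// Mimics {$group: {_id: "$user", Total: {$sum: 1}}}: one entry per user with its document count.
function groupByUser(docs) {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc.user, (totals.get(doc.user) || 0) + 1);
  }
  return [...totals].map(([id, total]) => ({ _id: id, Total: total }));
}

// Mimics {$match: {Total: {$gt: 1}}}: keep only users that appear more than once.
const duplicated = groupByUser(users).filter((g) => g.Total > 1);
console.log(duplicated); // [ { _id: 'a', Total: 3 }, { _id: 'b', Total: 2 } ]
```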

And I count them with this one:

db.users.aggregate([
    { $group: { _id: "$user", "Total": { $sum: 1 } } },
    { $match: { "Total": { $gt: 1 } } },
    { $count: "Total" }
])

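I want to know how many user documents I need to delete, but the second query only returns the number of distinct users affected, because `$count` counts the groups that passed the `$match` stage, not the documents inside them.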
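The difference between the two numbers can be seen with a hypothetical sample in plain JavaScript (again no MongoDB, just arithmetic): `$count` after `$match` yields the number of duplicated users, while the number of documents to delete, assuming one document per user is kept, is the sum of `Total - 1` over those groups.

```javascript
// Hypothetical sample: user "a" has 3 documents, "b" has 2, "c" has 1.
const sampleDocs = ["a", "a", "a", "b", "b", "c"].map((user) => ({ user }));

// Per-user document counts, like the $group stage.
const perUser = new Map();
for (const { user } of sampleDocs) {
  perUser.set(user, (perUser.get(user) || 0) + 1);
}
// Counts of users that appear more than once, like the $match stage.
const dupGroups = [...perUser.values()].filter((total) => total > 1); // [3, 2]

// What {$count: "Total"} reports: how many users are duplicated.
const duplicatedUsers = dupGroups.length; // 2
// What the question needs: extra documents beyond the one kept per user.
const docsToDelete = dupGroups.reduce((sum, total) => sum + (total - 1), 0); // 3
```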

How can I get the total number of duplicate users, i.e. a sum over the "Total" values?

Expected result:

{ "Total" : **** }

2 answers:

Answer 0 (score: 1)

I don't have your dataset, so I haven't tested this locally. Try this query:

db.users.aggregate([
 {$group: {_id: "$user", Total: {$sum: 1}}},          // group by user and count the documents for each
 {$addFields: {Total: {$subtract:["$Total",1]}}},     // only duplicates matter, so leave out the first (kept) instance
 {$group:{_id:null, Total: {$sum:"$Total"}}},         // put everything in one group (_id: null) and sum the duplicate counts
 {$project:{_id:0, Total:1}}                          // the result is the total number of duplicate user documents
])
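The arithmetic of this pipeline can be traced with a plain JavaScript sketch over hypothetical grouped data (the output of the first `$group` stage is assumed, not computed from a real collection). Users with a single document contribute `1 - 1 = 0`, so the pipeline does not need a `$match` stage.

```javascript
// Hypothetical output of the first $group stage: one entry per user with its document count.
const grouped = [
  { _id: "a", Total: 3 },
  { _id: "b", Total: 2 },
  { _id: "c", Total: 1 },
];

// $addFields stage: keep the first document of each user, count only the rest.
const extras = grouped.map((g) => ({ _id: g._id, Total: g.Total - 1 }));

// $group on _id: null plus $project: sum everything into a single result.
const result = { Total: extras.reduce((sum, g) => sum + g.Total, 0) };
console.log(result); // { Total: 3 }
```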

Answer 1 (score: 1)

Well, you can do this with the following pipeline:

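db.users.aggregate([
    { $group: {
        _id: null,                              // one group for the whole collection
        uniqueValues: { $addToSet: "$user" },   // distinct user ids
        count: { $sum: 1 }                      // total number of documents
    }},
    { $project: {
        _id: 0,
        // documents beyond the first one per user
        total: { $subtract: [ "$count", { $size: "$uniqueValues" } ] }
    }}
])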
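The same idea in plain JavaScript, over a hypothetical list of user ids: the number of documents to delete equals the total document count minus the number of distinct users. A `Set` plays the role of `$addToSet` here; this is an in-memory illustration only, not how MongoDB evaluates the pipeline.

```javascript
// Hypothetical "user" values of every document in the collection.
const allUsers = ["a", "a", "a", "b", "b", "c"];

// Like $addToSet: the distinct user ids.
const uniqueValues = new Set(allUsers);
// Like {$sum: 1}: the total number of documents.
const count = allUsers.length;
// Like $subtract with $size: documents beyond the first one per user.
const total = count - uniqueValues.size;
console.log({ total }); // { total: 3 }
```

This agrees with the per-user approach of the first answer: 6 documents minus 3 distinct users gives the same 3 extra documents as summing `Total - 1` per user.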