Question

How can I find duplicate fields in a MongoDB collection?

I'd like to check whether any of the "name" fields are duplicates.

{
    "name" : "ksqn291",
    "__v" : 0,
    "_id" : ObjectId("540f346c3e7fc1054ffa7086"),
    "channel" : "Sales"
}

Thanks a lot!

Answer 1

Use an aggregation that groups on name and keeps the names with count > 1:

db.collection.aggregate([
    {"$group" : { "_id": "$name", "count": { "$sum": 1 } } },
    {"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } },
    {"$project": {"name" : "$_id", "_id" : 0} }
])

To sort the results so that the most-duplicated names come first:

db.collection.aggregate([
    {"$group" : { "_id": "$name", "count": { "$sum": 1 } } },
    {"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } },
    {"$sort": {"count" : -1} },
    {"$project": {"name" : "$_id", "_id" : 0} }
])

To use a field other than "name", change "$name" to "$column_name".
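
The field substitution described above can be wrapped in a small helper. This is only a sketch and not part of the original answer: buildDuplicatePipeline is a hypothetical name, and the function just builds the pipeline array, which you would then pass to aggregate() in the shell.

```javascript
// Hypothetical helper (not a MongoDB API): builds the duplicate-finding
// pipeline from this answer for an arbitrary field name.
function buildDuplicatePipeline(fieldName) {
  return [
    // Group by the field's value and count the documents in each group.
    { $group: { _id: "$" + fieldName, count: { $sum: 1 } } },
    // Keep non-null values that occur more than once.
    { $match: { _id: { $ne: null }, count: { $gt: 1 } } },
    // Most-duplicated values first.
    { $sort: { count: -1 } },
    // Expose the value under the field's own name and hide _id.
    { $project: { [fieldName]: "$_id", _id: 0 } }
  ];
}

// Usage in the shell (the collection name is an assumption):
// db.collection.aggregate(buildDuplicatePipeline("channel"))
```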

Answer 2

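You can find the list of duplicate names using the following aggregate pipeline:

Group all the records that have the same name.
Match those groups that contain more than one record.
Then group again to collect all the duplicate names into an array.

The code: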

db.collection.aggregate([
{$group:{"_id":"$name","name":{$first:"$name"},"count":{$sum:1}}},
{$match:{"count":{$gt:1}}},
{$project:{"name":1,"_id":0}},
{$group:{"_id":null,"duplicateNames":{$push:"$name"}}},
{$project:{"_id":0,"duplicateNames":1}}
])

Output:

{ "duplicateNames" : [ "ksqn291", "ksqn29123213Test" ] }
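
To see what the stages do, the same logic can be traced on a plain array without a database. This is an illustrative sketch in plain JavaScript: findDuplicateNames and the sample documents are made up for the example. MongoDB does not guarantee the order of $group output, so the order of names in a real result may differ.

```javascript
// Plain-JavaScript trace of the pipeline's logic (no MongoDB needed).
function findDuplicateNames(docs) {
  // Step 1 ($group): count the documents per name.
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc.name, (counts.get(doc.name) || 0) + 1);
  }
  // Step 2 ($match): keep the names seen more than once.
  // Step 3 ($group with $push): collect them into a single array.
  const duplicateNames = [];
  for (const [name, count] of counts) {
    if (count > 1) duplicateNames.push(name);
  }
  return { duplicateNames };
}

// Made-up sample data for illustration.
const sampleDocs = [
  { name: "ksqn291" },
  { name: "abc" },
  { name: "ksqn291" },
  { name: "ksqn29123213Test" },
  { name: "ksqn29123213Test" }
];
console.log(findDuplicateNames(sampleDocs));
```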

Answer 3

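The answer given by anhic can be very inefficient if you have a large database and the name attribute is present in only some of the documents.

To improve efficiency, you can add a $match at the start of the aggregation so that documents without a name are discarded before grouping.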

db.collection.aggregate([
    {"$match": {"name" :{ "$ne" : null } } },
    {"$group" : {"_id": "$name", "count": { "$sum": 1 } } },
    {"$match": {"count" : {"$gt": 1} } },
    {"$project": {"name" : "$_id", "_id" : 0} }
])

Answer 4

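In case someone is looking for a duplicates query with an extra "$and" where clause, such as "and where someOtherField is true":

The trick is to start with that other $match, because after the grouping you no longer have all the data available.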

db.collection.aggregate([
    // Do a first match before the grouping
    { $match: { "someOtherField": true }},
    { $group: {
        _id: { name: "$name" },
        count: { $sum: 1 }
    }},
    { $match: { count: { $gte: 2 } }}
])

It took me a long time to find this notation; I hope it helps someone with the same problem.
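
The early $match can also carry the "field is present" check together with the extra condition, so both filters run before the grouping. A minimal sketch, assuming a hypothetical helper name (buildFilteredDuplicatePipeline); note that a condition on the grouped field inside extraFilter would be overwritten by the $ne check.

```javascript
// Hypothetical helper (not a MongoDB API): duplicates of `fieldName`
// among the documents that satisfy `extraFilter`.
function buildFilteredDuplicatePipeline(fieldName, extraFilter = {}) {
  return [
    // Filter first: fields that $group does not keep are gone afterwards.
    { $match: { ...extraFilter, [fieldName]: { $ne: null } } },
    // Count the remaining documents per field value.
    { $group: { _id: "$" + fieldName, count: { $sum: 1 } } },
    // Keep only values that occur at least twice.
    { $match: { count: { $gte: 2 } } }
  ];
}

// Usage in the shell (the collection name is an assumption):
// db.collection.aggregate(
//   buildFilteredDuplicatePipeline("name", { someOtherField: true })
// )
```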

Answer 5

If you need to see all the duplicated rows:
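
One possible way to do this, offered as a sketch rather than as this answer's own code, is to keep each full document while grouping and then unwind the groups that have more than one member. The names allDuplicateDocsPipeline and docs are chosen for illustration; $replaceRoot requires MongoDB 3.4 or later.

```javascript
// Illustrative pipeline: returns every document whose name occurs more than once.
const allDuplicateDocsPipeline = [
  // Group by name, counting and keeping each full document ($$ROOT).
  { $group: { _id: "$name", count: { $sum: 1 }, docs: { $push: "$$ROOT" } } },
  // Keep only non-null names that occur more than once.
  { $match: { _id: { $ne: null }, count: { $gt: 1 } } },
  // Emit one result per stored document.
  { $unwind: "$docs" },
  // Promote the stored document back to the top level.
  { $replaceRoot: { newRoot: "$docs" } }
];

// Usage in the shell (the collection name is an assumption):
// db.collection.aggregate(allDuplicateDocsPipeline)
```

Because each group holds its documents in a single array, very large groups can run into the BSON document size limit.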

Answer 6

db.collectionName.aggregate([
{ $group:{
    _id:{Name:"$name"},
    uniqueId:{$addToSet:"$_id"},
    count:{"$sum":1}
  }
},
// Filter on the count computed by $group
{ $match:{
    count:{"$gt":1}
  }
}
]);

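The first stage, $group, groups the documents by the field.

It also collects the unique _id values of each group and counts its documents. If count is greater than 1, that field value is duplicated in the collection, and the $match stage keeps only those groups.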

Find duplicate records in MongoDB
