Each document in the collection looks like the ones below. In this sample, A and C are fine, but B is duplicated.
{
    "_id": {
        "$oid": "5bef93fc1c4b3236e79f9c25" # all these are unique
    },
    "Created_at": "Sat Nov 17 04:07:12 +0000 2018",
    "ID": {
        "$numberLong": "1063644700727480320" # duplicates identified by this ID
    },
    "Category": "A" # this is the category
}
{
    "_id": {
        "$oid": "5bef93531c4b3236e79f9c11"
    },
    "Created_at": "Sat Nov 17 05:17:12 +0000 2018",
    "ID": {
        "$numberLong": "1063644018276360192"
    },
    "Category": "B"
}
{
    "_id": {
        "$oid": "5bef94e81c4b3236e79f9c3b"
    },
    "Created_at": "Sat Nov 17 05:17:12 +0000 2018",
    "ID": {
        "$numberLong": "1063644018276360192"
    },
    "Category": "B"
}
{
    "_id": {
        "$oid": "5bef94591c4b3236e79f9cee"
    },
    "Created_at": "Sat Nov 17 05:17:12 +0000 2018",
    "ID": {
        "$numberLong": "1063644700727481111"
    },
    "Category": "C"
}
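Duplicates are defined by their ID. I want to count the duplicates and print the totals per category, like this:
Category A: 5 (5 duplicates labeled as category A)
Category B: 6
Category C: 15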
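To pin down what is being counted, here is a minimal plain-Python sketch of the same tally, run against in-memory dictionaries rather than MongoDB. It assumes that every document sharing an `ID` with at least one other document counts as a duplicate (the first copy included). The helper name `count_duplicates_by_category` and the simplified sample documents are made up for illustration.

```python
from collections import Counter


def count_duplicates_by_category(docs):
    """Tally, per category, the documents whose ID occurs more than once.

    Assumption: every copy in a duplicated group counts, including the first.
    """
    id_counts = Counter(doc["ID"] for doc in docs)
    totals = Counter()
    for doc in docs:
        if id_counts[doc["ID"]] > 1:
            totals[doc["Category"]] += 1
    return dict(totals)


# Simplified versions of the four sample documents above (IDs as plain ints).
sample_docs = [
    {"ID": 1063644700727480320, "Category": "A"},
    {"ID": 1063644018276360192, "Category": "B"},
    {"ID": 1063644018276360192, "Category": "B"},
    {"ID": 1063644700727481111, "Category": "C"},
]

for category, total in sorted(count_duplicates_by_category(sample_docs).items()):
    print(f"Category {category}: {total}")
```

For the four sample documents this prints only `Category B: 2`, since A and C each appear once.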
This is what I tried, but it prints nothing. I have already added duplicates to the Mongo database.
cursor = db.collection.aggregate([
    {
        "$group": {
            "_id": {"ID": "$ID"},
            "uniqueIds": {"$addToSet": "$_id"},
            "count": {"$sum": 1}
        }
    },
    {"$match": {"count": {"$gt": 1}}}
])
for document in cursor:
    print(document)
Thanks for your help :)
Answer 0 (score: 0)
Try this:
db.collection.aggregate([
    {
        $group : {
            "_id" : {"ID" : "$ID", "Category" : "$Category"},
            "Count" : {$sum : 1}
        }
    },
    {
        $match : {
            "Count" : {$gt : 1}
        }
    },
    {
        $project : {
            "_id" : 0,
            "ID" : "$_id.ID",
            "Category" : "$_id.Category",
            "Count" : "$Count"
        }
    }
]);
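The pipeline above returns one row per duplicated ID. If you want a single total per category, as in the question's expected output, one option is to add a second `$group` on the category. Below is a rough Python sketch of that idea, not a tested drop-in. It assumes `collection` is a pymongo `Collection` (or anything with a compatible `aggregate()` method), and that every copy in a duplicated group should be counted. The names `DUPLICATES_BY_CATEGORY` and `format_duplicates_by_category` are illustrative only.

```python
# Assumption: every document in a duplicated ID group counts as a duplicate.
DUPLICATES_BY_CATEGORY = [
    # 1. One group per (ID, Category) pair, counting its documents.
    {"$group": {"_id": {"ID": "$ID", "Category": "$Category"}, "count": {"$sum": 1}}},
    # 2. Keep only the IDs that occur more than once.
    {"$match": {"count": {"$gt": 1}}},
    # 3. Roll the surviving groups up into one total per category.
    {"$group": {"_id": "$_id.Category", "duplicates": {"$sum": "$count"}}},
    # 4. Sort by category name so the output order is stable.
    {"$sort": {"_id": 1}},
]


def format_duplicates_by_category(collection):
    """Run the pipeline and return lines such as 'Category B: 2'.

    `collection` is assumed to expose a pymongo-style aggregate() method.
    """
    return [
        f"Category {row['_id']}: {row['duplicates']}"
        for row in collection.aggregate(DUPLICATES_BY_CATEGORY)
    ]
```

With pymongo you would call it as `format_duplicates_by_category(db["your_collection"])`, where `your_collection` is a placeholder for the real collection name, and print each returned line. Note that in pymongo `db.collection` refers to a collection literally named `collection`. If only the extra copies should count, stage 3 could sum `{"$subtract": ["$count", 1]}` instead of `"$count"`.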
Hope this helps!