Let's say we have the following 5 documents:
{ "_id" : "1", "student" : "Oscar", "courses" : [ "A", "B" ] }
{ "_id" : "2", "student" : "Alan", "courses" : [ "A", "B", "C" ] }
{ "_id" : "3", "student" : "Kate", "courses" : [ "A", "B", "D" ] }
{ "_id" : "4", "student" : "John", "courses" : [ "A", "B", "C" ] }
{ "_id" : "5", "student" : "Bema", "courses" : [ "A", "B" ] }
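I want to query this collection so that it returns the students (by their _id) grouped by set (combination) of courses, and counts the students in each group.
In the example above, there are 3 course combinations, with student counts as follows:
1 - [ "A", "B" ] <- 2 students take this combination
2 - [ "A", "B", "C" ] <- 2 students
3 - [ "A", "B", "D" ] <- 1 student
I feel this is more of a MapReduce task than an Aggregation one... but I'm not sure.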
Update 1
Many thanks to @ExplosionPills.
The following aggregation command:
db.students.aggregate([{
    $group: {
        _id: "$courses",
        count: {$sum: 1},
        students: {$push: "$_id"}
    }
}])
gives me the following output:
{ "_id" : [ "A", "B", "D" ], "count" : 1, "students" : [ "3" ] }
{ "_id" : [ "A", "B", "C" ], "count" : 2, "students" : [ "2", "4" ] }
{ "_id" : [ "A", "B" ], "count" : 2, "students" : [ "1", "5" ] }
It groups by the courses array and, for each group, counts the students that belong to it and collects their _id values.
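To see the grouping logic outside the database, here is a minimal plain-JavaScript sketch of what the $group stage above does to the sample documents. The helper name groupByCourses is hypothetical, and JSON.stringify is only a stand-in for MongoDB's array comparison, an assumption that is reasonable for simple string arrays like these.

```javascript
// Sample documents from the question.
const students = [
  { _id: "1", student: "Oscar", courses: ["A", "B"] },
  { _id: "2", student: "Alan", courses: ["A", "B", "C"] },
  { _id: "3", student: "Kate", courses: ["A", "B", "D"] },
  { _id: "4", student: "John", courses: ["A", "B", "C"] },
  { _id: "5", student: "Bema", courses: ["A", "B"] },
];

// Hypothetical helper: groups documents by the exact courses array.
function groupByCourses(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // Order-sensitive key: ["A","B"] and ["B","A"] would differ.
    const key = JSON.stringify(doc.courses);
    if (!groups.has(key)) {
      groups.set(key, { _id: doc.courses, count: 0, students: [] });
    }
    const group = groups.get(key);
    group.count += 1;
    group.students.push(doc._id);
  }
  return Array.from(groups.values());
}

console.log(JSON.stringify(groupByCourses(students), null, 2));
```

Because the key is built from the array as stored, the order of the courses inside each document affects which group it lands in.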
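Update 2
I found that the aggregation above treats the combination [ "C", "A", "B" ] as different from [ "A", "B", "C" ]. But I need these two to be counted as the same combination.
Consider the following documents: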
{ "_id" : "1", "student" : "Oscar", "courses" : [ "A", "B" ] }
{ "_id" : "2", "student" : "Alan", "courses" : [ "A", "B", "C" ] }
{ "_id" : "3", "student" : "Kate", "courses" : [ "A", "B", "D" ] }
{ "_id" : "4", "student" : "John", "courses" : [ "A", "B", "C" ] }
{ "_id" : "5", "student" : "Bema", "courses" : [ "A", "B" ] }
{ "_id" : "6", "student" : "Alex", "courses" : [ "C", "A", "B" ] }
Here is how this shows up in the output:
{ "_id" : [ "C", "A", "B" ], "count" : 1, "students" : [ "6" ] }
{ "_id" : [ "A", "B", "D" ], "count" : 1, "students" : [ "3" ] }
{ "_id" : [ "A", "B", "C" ], "count" : 2, "students" : [ "2", "4" ] }
{ "_id" : [ "A", "B" ], "count" : 2, "students" : [ "1", "5" ] }
See rows 1 and 3: this is not what I want.
So, to treat [ "C", "A", "B" ] and [ "A", "B", "C" ] as the same combination, I changed the aggregation as follows:
db.students.aggregate([
{$unwind: "$courses" },
{$sort : {"courses": 1}},
{$group: {_id: "$_id", courses: {$push: "$courses"}}},
{$group: {_id: "$courses", count: {$sum:1}, students: {$push: "$_id"}}}
])
Output:
{ "_id" : [ "A", "B", "D" ], "count" : 1, "students" : [ "3" ] }
{ "_id" : [ "A", "B" ], "count" : 2, "students" : [ "5", "1" ] }
{ "_id" : [ "A", "B", "C" ], "count" : 3, "students" : [ "6", "4", "2" ] }
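The normalization step can also be sketched in plain JavaScript: sort a copy of each courses array before using it as the group key, so that the stored order no longer matters. This is only an illustration of the idea, not the pipeline itself; groupBySortedCourses is a hypothetical helper name.

```javascript
// Sample documents, including one with courses in a different order.
const enrolled = [
  { _id: "1", courses: ["A", "B"] },
  { _id: "2", courses: ["A", "B", "C"] },
  { _id: "3", courses: ["A", "B", "D"] },
  { _id: "4", courses: ["A", "B", "C"] },
  { _id: "5", courses: ["A", "B"] },
  { _id: "6", courses: ["C", "A", "B"] },
];

// Hypothetical helper: groups by the sorted courses, ignoring stored order.
function groupBySortedCourses(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // slice() copies first, so the original document is not mutated.
    const sorted = doc.courses.slice().sort();
    const key = JSON.stringify(sorted);
    if (!groups.has(key)) {
      groups.set(key, { _id: sorted, count: 0, students: [] });
    }
    const group = groups.get(key);
    group.count += 1;
    group.students.push(doc._id);
  }
  return Array.from(groups.values());
}

console.log(JSON.stringify(groupBySortedCourses(enrolled), null, 2));
```

With the sorted key, student "6" falls into the same group as students "2" and "4".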
Answer 0 (score: 1)
This is an aggregation operation that uses grouping.
db.students.aggregate([{
    $group: {
        // Group key: documents with the same courses array form one group.
        // The "$" prefix refers to the value of that field in each document.
        _id: "$courses",
        // Add 1 for each document in the group (effectively a counter)
        count: {$sum: 1}
    }
}]);
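Edit:
If the courses can appear in any order, you can $unwind, $sort, and $group again, as suggested in the edited question. You could also do this with mapReduce, but I'm not sure which is faster.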
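db.students.mapReduce(
    function () {
        // Use a sorted copy of the courses as the key, so order does not matter.
        // The emitted value already has the shape that reduce returns.
        emit(this.courses.slice().sort(), {students: [this._id], count: 1});
    },
    function (key, values) {
        // reduce is skipped for keys with a single value and may be called
        // more than once per key, so it merges partial results.
        var result = {students: [], count: 0};
        values.forEach(function (v) {
            result.students = result.students.concat(v.students);
            result.count += v.count;
        });
        return result;
    },
    {out: {inline: 1}}
)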
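One detail worth checking in any mapReduce approach: MongoDB does not call reduce for a key that has only one value, and it may call reduce more than once for the same key, so the emitted value and the reduce result should have the same shape. The sketch below is a simplified in-memory emulation of that contract, not MongoDB's implementation; runMapReduce, mapCourses, and reduceCourses are hypothetical names.

```javascript
// Simplified in-memory emulation of map/reduce (hypothetical helper).
function runMapReduce(docs, mapFn, reduceFn) {
  const emitted = new Map();
  const emit = (key, value) => {
    const k = JSON.stringify(key);
    if (!emitted.has(k)) emitted.set(k, { key, values: [] });
    emitted.get(k).values.push(value);
  };
  docs.forEach((doc) => mapFn(doc, emit));
  // As in MongoDB, reduce is skipped when a key has a single value.
  return Array.from(emitted.values()).map(({ key, values }) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduceFn(key, values),
  }));
}

// Emit a value that already has the shape reduce returns.
const mapCourses = (doc, emit) =>
  emit(doc.courses.slice().sort(), { students: [doc._id], count: 1 });

// Merge partial results, so re-reducing stays correct.
const reduceCourses = (key, values) =>
  values.reduce(
    (acc, v) => ({
      students: acc.students.concat(v.students),
      count: acc.count + v.count,
    }),
    { students: [], count: 0 }
  );

const sampleDocs = [
  { _id: "1", courses: ["A", "B"] },
  { _id: "2", courses: ["A", "B", "C"] },
  { _id: "3", courses: ["A", "B", "D"] },
  { _id: "4", courses: ["A", "B", "C"] },
  { _id: "5", courses: ["A", "B"] },
  { _id: "6", courses: ["C", "A", "B"] },
];

console.log(JSON.stringify(runMapReduce(sampleDocs, mapCourses, reduceCourses)));
```

Because every emitted value is already a {students, count} object, the single-student group for [ "A", "B", "D" ] comes out in the same shape as the groups that went through reduce.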