Question

Suppose we have the following 5 documents:

{ "_id" : "1", "student" : "Oscar", "courses" : [ "A", "B" ] }
{ "_id" : "2", "student" : "Alan", "courses" : [ "A", "B", "C" ] }
{ "_id" : "3", "student" : "Kate", "courses" : [ "A", "B", "D" ] }
{ "_id" : "4", "student" : "John", "courses" : [ "A", "B", "C" ] }
{ "_id" : "5", "student" : "Bema", "courses" : [ "A", "B" ] }

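I want to query this collection so that it returns the students (with their _id) grouped by their set (combination) of courses, along with the number of students in each group.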

In the example above, there are 3 course combinations, with the following student counts:

1 - [ "A", "B" ] <- 2 students take this combination

2 - [ "A", "B", "C" ] <- 2 students

3 - [ "A", "B", "D" ] <- 1 student

This feels more like a MapReduce task than an aggregation... but I'm not sure.

Update 1

Many thanks to @ExplosionPills

The following aggregation command:

db.students.aggregate([{
    $group: {
        _id: "$courses",
        count: {$sum: 1},
        students: {$push: "$_id"}
    }
}])

gives me the following output:

{ "_id" : [ "A", "B", "D" ], "count" : 1, "students" : [ "3" ] }
{ "_id" : [ "A", "B", "C" ], "count" : 2, "students" : [ "2", "4" ] }
{ "_id" : [ "A", "B" ], "count" : 2, "students" : [ "1", "5" ] }

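It groups the documents by their courses array and returns, for each group, the number of students in it and their _id values.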
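To make the grouping concrete, here is a minimal plain JavaScript sketch of what the $group stage computes for the 5 documents above. It is not MongoDB code, and groupByCourses is a hypothetical helper named only for illustration; the order of the groups it returns need not match the order MongoDB prints.

```javascript
// Plain JavaScript sketch of what the $group stage computes (not MongoDB code).
const students = [
  { _id: "1", student: "Oscar", courses: ["A", "B"] },
  { _id: "2", student: "Alan", courses: ["A", "B", "C"] },
  { _id: "3", student: "Kate", courses: ["A", "B", "D"] },
  { _id: "4", student: "John", courses: ["A", "B", "C"] },
  { _id: "5", student: "Bema", courses: ["A", "B"] },
];

// groupByCourses is a hypothetical helper, named here for illustration only.
function groupByCourses(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // The exact array (including element order) acts as the group key.
    const key = JSON.stringify(doc.courses);
    if (!groups.has(key)) {
      groups.set(key, { _id: doc.courses, count: 0, students: [] });
    }
    const group = groups.get(key);
    group.count += 1;             // plays the role of {$sum: 1}
    group.students.push(doc._id); // plays the role of {$push: "$_id"}
  }
  return [...groups.values()];
}

const grouped = groupByCourses(students);
console.log(grouped);
```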

Update 2

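I found that the aggregation above treats the combination [ "C", "A", "B" ] as different from [ "A", "B", "C" ]. But I need both to be counted as the same combination.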

Consider the following documents:

{ "_id" : "1", "student" : "Oscar", "courses" : [ "A", "B" ] }
{ "_id" : "2", "student" : "Alan", "courses" : [ "A", "B", "C" ] }
{ "_id" : "3", "student" : "Kate", "courses" : [ "A", "B", "D" ] }
{ "_id" : "4", "student" : "John", "courses" : [ "A", "B", "C" ] }
{ "_id" : "5", "student" : "Bema", "courses" : [ "A", "B" ] }
{ "_id" : "6", "student" : "Alex", "courses" : [ "C", "A", "B" ] }

Here is how this shows up in the output:

{ "_id" : [ "C", "A", "B" ], "count" : 1, "students" : [ "6" ] }
{ "_id" : [ "A", "B", "D" ], "count" : 1, "students" : [ "3" ] }
{ "_id" : [ "A", "B", "C" ], "count" : 2, "students" : [ "2", "4" ] }
{ "_id" : [ "A", "B" ], "count" : 2, "students" : [ "1", "5" ] }

See lines 1 and 3 - this is not what I want.
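The split happens because arrays are compared element by element, in order. A small plain JavaScript sketch of the distinction (combinationKey is a hypothetical helper, not part of MongoDB):

```javascript
// Hypothetical helper: build an order-independent key from a courses array.
function combinationKey(courses) {
  // slice() copies first, because Array.prototype.sort sorts in place.
  return JSON.stringify(courses.slice().sort());
}

const alan = ["A", "B", "C"];
const alex = ["C", "A", "B"];

// Compared as-is, the two arrays differ because element order matters.
const sameAsIs = JSON.stringify(alan) === JSON.stringify(alex);
// Compared after sorting, they are the same combination.
const sameSorted = combinationKey(alan) === combinationKey(alex);

console.log(sameAsIs, sameSorted); // false true
```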

So, to treat [ "C", "A", "B" ] and [ "A", "B", "C" ] as the same combination, I changed the aggregation as follows:

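db.students.aggregate([
    // One document per (student, course) pair
    {$unwind: "$courses"},
    // Order by course so each array is rebuilt in sorted order
    {$sort: {"courses": 1}},
    // Rebuild each student's courses array
    {$group: {_id: "$_id", courses: {$push: "$courses"}}},
    // Group students by the normalized courses array
    {$group: {_id: "$courses", count: {$sum: 1}, students: {$push: "$_id"}}}
])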

Output:

{ "_id" : [ "A", "B", "D" ], "count" : 1, "students" : [ "3" ] }
{ "_id" : [ "A", "B" ], "count" : 2, "students" : [ "5", "1" ] }
{ "_id" : [ "A", "B", "C" ], "count" : 3, "students" : [ "6", "4", "2" ] }
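For readers who want to trace what each stage does, the following is a rough plain JavaScript walk-through of the same four steps on a reduced sample. It is only an illustration under the assumption that each stage behaves as described in the comments; it is not how MongoDB executes the pipeline, and all variable names are made up for the sketch.

```javascript
// Plain JavaScript walk-through of the four stages (not MongoDB code).
const docs = [
  { _id: "2", courses: ["A", "B", "C"] },
  { _id: "6", courses: ["C", "A", "B"] },
  { _id: "1", courses: ["A", "B"] },
];

// $unwind: one record per (student, course) pair.
const unwound = docs.flatMap(d =>
  d.courses.map(c => ({ _id: d._id, courses: c }))
);

// $sort: order the records by course name.
const sorted = [...unwound].sort((a, b) =>
  a.courses < b.courses ? -1 : a.courses > b.courses ? 1 : 0
);

// First $group: rebuild each student's courses array, now in sorted order.
const perStudent = new Map();
for (const record of sorted) {
  if (!perStudent.has(record._id)) perStudent.set(record._id, []);
  perStudent.get(record._id).push(record.courses);
}

// Second $group: group students by the normalized courses array.
const combos = new Map();
for (const [id, courses] of perStudent) {
  const key = JSON.stringify(courses);
  if (!combos.has(key)) combos.set(key, { _id: courses, count: 0, students: [] });
  const combo = combos.get(key);
  combo.count += 1;
  combo.students.push(id);
}

const result = [...combos.values()];
console.log(result);
```

Students "2" and "6" end up in the same group here because both arrays are rebuilt as [ "A", "B", "C" ] before the final grouping.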

Answer 1

This is an aggregation operation that uses grouping.

db.students.aggregate([{
    $group: {
        // The group key: documents with the same courses array
        // end up in the same group. The "$" prefix refers to
        // the value of that field.
        _id: "$courses",

        // Add 1 for each document in the group (effectively a counter)
        count: {$sum: 1}
    }
}]);

Edit:

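If the courses can appear in any order, you can $unwind, $sort, and then $group again, as suggested in the edited question. You could also do this with mapReduce, but I'm not sure which is faster.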

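db.students.mapReduce(
    function () {
        // Use a sorted copy of courses as the key so that order does not matter.
        // Emit the same shape that reduce returns: reduce is not called for
        // keys with a single value, and it may be re-run on its own output.
        emit(this.courses.slice().sort(), {students: [this._id], count: 1});
    },
    function (key, values) {
        // Merge the partial results for one key
        var result = {students: [], count: 0};
        values.forEach(function (value) {
            result.students = result.students.concat(value.students);
            result.count += value.count;
        });
        return result;
    },
    {out: {inline: 1}}
)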
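One detail worth keeping in mind with mapReduce: MongoDB does not call the reduce function for a key that has only one emitted value, so the value emitted by map should have the same shape as what reduce returns. The sketch below imitates that behaviour in plain JavaScript. runMapReduce is a hypothetical stand-in written for this example, not a MongoDB API, and unlike the shell it passes emit to the map function as an argument.

```javascript
// Hypothetical stand-in for mapReduce, written in plain JavaScript.
function runMapReduce(docs, map, reduce) {
  const emitted = new Map();
  const emit = (key, value) => {
    const k = JSON.stringify(key);
    if (!emitted.has(k)) emitted.set(k, { key, values: [] });
    emitted.get(k).values.push(value);
  };
  // Call map with `this` bound to each document, as the shell does.
  docs.forEach(doc => map.call(doc, emit));
  // As in MongoDB, reduce is skipped when a key has only one value.
  return [...emitted.values()].map(({ key, values }) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

const sample = [
  { _id: "2", courses: ["A", "B", "C"] },
  { _id: "6", courses: ["C", "A", "B"] },
  { _id: "3", courses: ["A", "B", "D"] },
];

const output = runMapReduce(
  sample,
  function (emit) {
    // Sorted copy as the key; value already has the reduced shape.
    emit(this.courses.slice().sort(), { students: [this._id], count: 1 });
  },
  function (key, values) {
    return {
      students: values.flatMap(v => v.students),
      count: values.reduce((n, v) => n + v.count, 0),
    };
  }
);

console.log(JSON.stringify(output));
```

Because the single-student group for [ "A", "B", "D" ] never passes through reduce, its value has the expected students/count shape only because map emitted it that way.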
