Aggregation based on a set in nested documents

Time: 2015-04-17 00:29:37

Tags: mongodb aggregation-framework

Let's say we have the following 5 documents:

{ "_id" : "1", "student" : "Oscar", "courses" : [ "A", "B" ] }
{ "_id" : "2", "student" : "Alan", "courses" : [ "A", "B", "C" ] }
{ "_id" : "3", "student" : "Kate", "courses" : [ "A", "B", "D" ] }
{ "_id" : "4", "student" : "John", "courses" : [ "A", "B", "C" ] }
{ "_id" : "5", "student" : "Bema", "courses" : [ "A", "B" ] }

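I want to query the collection so that it returns groups of students (with their _id values) keyed by their set (combination) of courses, and counts the number of students in each group.

In the example above there are 3 sets (combinations) of courses, with the following student counts:

1 - [ "A", "B" ] <- 2 students take this combination

2 - [ "A", "B", "C" ] <- 2 students

3 - [ "A", "B", "D" ] <- 1 student

I feel this is more of a MapReduce task than an Aggregation one... not sure...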

Update 1

Many thanks to @ExplosionPills

The following aggregation command:

db.students.aggregate([{
    $group: {
        _id: "$courses",
        count: {$sum: 1},
        students: {$push: "$_id"}
    }
}])

gives me the following output:

{ "_id" : [ "A", "B", "D" ], "count" : 1, "students" : [ "3" ] }
{ "_id" : [ "A", "B", "C" ], "count" : 2, "students" : [ "2", "4" ] }
{ "_id" : [ "A", "B" ], "count" : 2, "students" : [ "1", "5" ] }

It groups by the courses array and, for each group, counts the students that belong to it and collects their _id values.
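To make it easier to see what this $group stage computes, here is a minimal in-memory sketch in plain JavaScript (no MongoDB calls). The helper name groupByCourses is hypothetical, and JSON.stringify is only a stand-in for comparing arrays element by element, in order.

```javascript
// Sample documents from the question.
const students = [
  { _id: "1", student: "Oscar", courses: ["A", "B"] },
  { _id: "2", student: "Alan", courses: ["A", "B", "C"] },
  { _id: "3", student: "Kate", courses: ["A", "B", "D"] },
  { _id: "4", student: "John", courses: ["A", "B", "C"] },
  { _id: "5", student: "Bema", courses: ["A", "B"] },
];

// Hypothetical helper that mimics
// {$group: {_id: "$courses", count: {$sum: 1}, students: {$push: "$_id"}}}.
function groupByCourses(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // The key depends on element order, just like an array used as a group key.
    const key = JSON.stringify(doc.courses);
    if (!groups.has(key)) {
      groups.set(key, { _id: doc.courses, count: 0, students: [] });
    }
    const group = groups.get(key);
    group.count += 1;             // plays the role of {$sum: 1}
    group.students.push(doc._id); // plays the role of {$push: "$_id"}
  }
  return Array.from(groups.values());
}

const grouped = groupByCourses(students);
console.log(grouped);
```

Because the key is built from the array as stored, two arrays holding the same courses in a different order would end up in different groups.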

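Update 2

I found that the aggregation above treats the combination [ "C", "A", "B" ] as different from [ "A", "B", "C" ]. But I need these two to be counted as the same combination.

Consider the following documents: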

{ "_id" : "1", "student" : "Oscar", "courses" : [ "A", "B" ] }
{ "_id" : "2", "student" : "Alan", "courses" : [ "A", "B", "C" ] }
{ "_id" : "3", "student" : "Kate", "courses" : [ "A", "B", "D" ] }
{ "_id" : "4", "student" : "John", "courses" : [ "A", "B", "C" ] }
{ "_id" : "5", "student" : "Bema", "courses" : [ "A", "B" ] }
{ "_id" : "6", "student" : "Alex", "courses" : [ "C", "A", "B" ] }

Here is the resulting output:

{ "_id" : [ "C", "A", "B" ], "count" : 1, "students" : [ "6" ] }
{ "_id" : [ "A", "B", "D" ], "count" : 1, "students" : [ "3" ] }
{ "_id" : [ "A", "B", "C" ], "count" : 2, "students" : [ "2", "4" ] }
{ "_id" : [ "A", "B" ], "count" : 2, "students" : [ "1", "5" ] }

See lines 1 and 3 - this is not what I want.

So, to treat [ "C", "A", "B" ] and [ "A", "B", "C" ] as the same combination, I changed the aggregation as follows:

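db.students.aggregate([
    // One document per (student, course) pair
    {$unwind: "$courses"},
    // Order the pairs by course name
    {$sort: {"courses": 1}},
    // Rebuild each student's courses array, now in sorted order
    {$group: {_id: "$_id", courses: {$push: "$courses"}}},
    // Group students by the normalized courses array
    {$group: {_id: "$courses", count: {$sum: 1}, students: {$push: "$_id"}}}
])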

Output:

{ "_id" : [ "A", "B", "D" ], "count" : 1, "students" : [ "3" ] }
{ "_id" : [ "A", "B" ], "count" : 2, "students" : [ "5", "1" ] }
{ "_id" : [ "A", "B", "C" ], "count" : 3, "students" : [ "6", "4", "2" ] }
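The idea behind this pipeline is to bring every courses array into one canonical order before grouping. A minimal in-memory sketch of that idea in plain JavaScript follows (no MongoDB calls); the helper name groupBySortedCourses is hypothetical, and it assumes course names are plain strings.

```javascript
// The six documents from Update 2.
const docs = [
  { _id: "1", student: "Oscar", courses: ["A", "B"] },
  { _id: "2", student: "Alan", courses: ["A", "B", "C"] },
  { _id: "3", student: "Kate", courses: ["A", "B", "D"] },
  { _id: "4", student: "John", courses: ["A", "B", "C"] },
  { _id: "5", student: "Bema", courses: ["A", "B"] },
  { _id: "6", student: "Alex", courses: ["C", "A", "B"] },
];

// Hypothetical helper: sort each courses array, then group on the sorted array.
function groupBySortedCourses(input) {
  const groups = new Map();
  for (const doc of input) {
    // Copy before sorting so the input document is left untouched.
    const sorted = doc.courses.slice().sort();
    const key = JSON.stringify(sorted);
    if (!groups.has(key)) {
      groups.set(key, { _id: sorted, count: 0, students: [] });
    }
    const group = groups.get(key);
    group.count += 1;
    group.students.push(doc._id);
  }
  return Array.from(groups.values());
}

const result = groupBySortedCourses(docs);
console.log(result);
```

In this sketch the students inside each group appear in input order; the order MongoDB returns them in may differ, as the output above shows.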

1 answer:

Answer 0: (score: 1)

This is an aggregation operation that uses grouping.

db.students.aggregate([{
    $group: {
        // The group key: documents with the same courses array
        // fall into the same group. "$courses" refers to the field's value.
        _id: "$courses",

        // Add 1 for each document in the group (effectively a counter)
        count: {$sum: 1}
    }
}]);

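Edit:

If the courses can appear in any order, you can $unwind, $sort, and then $group again, as suggested in the edited question. You can also do this with mapReduce, but I'm not sure which is faster.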

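db.students.mapReduce(
    function () {
        // Use the sorted courses as the key, and emit a value
        // with the same shape that reduce returns
        emit(this.courses.sort(), {students: [this._id], count: 1});
    },
    function (key, values) {
        // reduce may run again on its own output, so merge partial results
        var result = {students: [], count: 0};
        values.forEach(function (v) {
            result.students = result.students.concat(v.students);
            result.count += v.count;
        });
        return result;
    },
    {out: {inline: 1}}
)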
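One detail worth keeping in mind with mapReduce: MongoDB may call the reduce function more than once for the same key, feeding an earlier result back in as one of the values, and it does not call reduce at all for a key with a single value. So the emitted value and the value returned by reduce should have the same shape. The following is a rough in-memory illustration in plain JavaScript, not MongoDB's implementation; mapDoc and reduceValues are hypothetical names, and the key is joined into a string purely for convenience in this sketch.

```javascript
// Hypothetical stand-in for the map function: returns what would be emitted.
function mapDoc(doc) {
  return {
    key: doc.courses.slice().sort().join(","),
    value: { students: [doc._id], count: 1 },
  };
}

// Hypothetical stand-in for the reduce function: merges values of equal shape.
function reduceValues(key, values) {
  const merged = { students: [], count: 0 };
  for (const v of values) {
    merged.students = merged.students.concat(v.students);
    merged.count += v.count;
  }
  return merged;
}

const sample = [
  { _id: "2", courses: ["A", "B", "C"] },
  { _id: "4", courses: ["A", "B", "C"] },
  { _id: "6", courses: ["C", "A", "B"] },
];

const emitted = sample.map(mapDoc);
const key = emitted[0].key;
const values = emitted.map((e) => e.value);

// Reduce everything in one call...
const allAtOnce = reduceValues(key, values);
// ...or reduce two values first and re-reduce that result with the third.
const partial = reduceValues(key, values.slice(0, 2));
const reReduced = reduceValues(key, [partial, values[2]]);

console.log(allAtOnce, reReduced);
```

Both paths give the same result here because reduce accepts its own output as input, which is the property to preserve when writing the real reduce function.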