Question

I have a schema like the one below:

{
        "_id" : ObjectId("5a4d28ae4f46990ebfd773dc"),
        "student_id" : 0,
        "scores" : [
                {
                        "type" : "exam",
                        "score" : -146.09163691278604
                },
                {
                        "type" : "quiz",
                        "score" : 99.41521018793881
                },
                {
                        "type" : "homework",
                        "score" : 0.002307340920915113
                },
                {
                        "type" : "homework",
                        "score" : 73.32279648314594
                }
        ],
        "class_id" : 143
}

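Now I need to find the class_id with the highest average score for type "exam", together with all students in that class who scored higher than that class's computed average. I wrote something like the following: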

db.students.aggregate([{
            $unwind: '$scores'
        }, {
            $match: {
                'scores.type': 'exam'
            }
        }, {
            $group: {
                _id: '$class_id',
                'average': {
                    $avg: '$scores.score'
                },
                'stud': {
                    $push: {
                        'student_id': '$student_id',
                        'marks': '$scores.score'
                    }
                }
            }
        }, {
            $sort: {
                'average': -1
            }
        }, {
            $limit: 1
        }, {
            $project: {
                'Average Marks': '$average',
                'students_higher': {
                    $filter: {
                        input: '$stud',
                        as: 'st',
                        cond: {
                            $gt: ['$$st.marks', '$average']
                        }
                    }
                }
            }
        }, {
            $unwind: '$students_higher'
        }, {
            $sort: {
                'students_higher.marks': -1
            }
        }
    ]).pretty()

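However, the query usually takes about 900+ ms to 1 second. I have pasted the most recent execution below (it took about 700 ms, and that was after I had run the query 15 times), and its performance is bothering me. I know that $match should be the first stage, as part of the best practice for making use of indexes, but I can't work out how to optimize it here. Maybe some suggestions can help.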

planSummary: COLLSCAN keysExamined: 0 docsExamined: 99998 hasSortStage: 1 cursorExhausted: 1 numYields: 785 nreturned: 97 reslen: 9972 locks: {
    Global: {
        acquireCount: {
            r: 1632
        }
    },
    Database: {
        acquireCount: {
            r: 816
        }
    },
    Collection: {
        acquireCount: {
            r: 816
        }
    }
}
protocol: op_msg 788ms

Answer 1

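Try adding a preliminary $match stage to the pipeline, before $unwind, to filter out students who never took an exam. This reduces the number of documents that need to be unwound.

As an added bonus, this allows the aggregation to use an index on { "scores.type" : 1 } once you have created it.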
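As a rough sketch of that suggestion, the pipeline from the question could be rearranged as shown here. The stages are the asker's own; only the first $match and the createIndex call are new. The second $match stays in place because, after $unwind, each document carries a single score of any type. How much this helps depends on how many students have no exam entry, so treat it as something to measure rather than a guaranteed fix.

```javascript
// Sketch: the question's pipeline with a preliminary $match ahead of $unwind.
const pipeline = [
  // New first stage: drops students without any exam entry before unwinding.
  // This is the stage that can use an index on scores.type.
  { $match: { 'scores.type': 'exam' } },
  { $unwind: '$scores' },
  // Still needed: keeps only the unwound exam entries.
  { $match: { 'scores.type': 'exam' } },
  { $group: {
      _id: '$class_id',
      average: { $avg: '$scores.score' },
      stud: { $push: { student_id: '$student_id', marks: '$scores.score' } }
  } },
  { $sort: { average: -1 } },
  { $limit: 1 },
  { $project: {
      'Average Marks': '$average',
      students_higher: {
        $filter: {
          input: '$stud',
          as: 'st',
          cond: { $gt: ['$$st.marks', '$average'] }
        }
      }
  } },
  { $unwind: '$students_higher' },
  { $sort: { 'students_higher.marks': -1 } }
];

// Runs only inside the mongo shell, where the global `db` exists.
if (typeof db !== 'undefined') {
  db.students.createIndex({ 'scores.type': 1 });
  db.students.aggregate(pipeline).forEach(printjson);
}
```

Re-running the query and checking whether planSummary still reports COLLSCAN is a quick way to confirm that the index is being picked up.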

Answer 2

Try the following option, which takes out $unwind and $match and instead uses a $project with $filter.

db.so.aggregate([ 
{$project:{
  _id:1,
  student_id:1,
  class_id:1,
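  // $filter returns an array, and $avg inside $group ignores array values,
  // so take the first matching entry (assumes one exam per student).
  scores:{
    $arrayElemAt:[{
      $filter:{
        input:"$scores",
        as:"scores",
        cond:{$eq:["$$scores.type", "exam"]}
      }
    }, 0]
  }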
 }}, 
 { $group: {
     _id: '$class_id',
     'average': {
          $avg: '$scores.score'
     },
     'stud': { 
          $push: {
             'student_id': '$student_id', 
             'marks': '$scores.score' 
          }
      }
 }},
{ $sort: { 'average': -1 }},
{ $limit: 1 },
{ $project: { 
     'Average Marks': '$average', 
     'students_higher': { 
          $filter: { 
              input: '$stud', 
              as: 'st',   
              cond: { $gt: ['$$st.marks', '$average']} 
          }   
      }
 }},
 { $unwind: '$students_higher' },
 { $sort: { 'students_higher.marks': -1 }}
]).pretty()

Note that if your documents

scores:[
  {
    "type" : "exam",
    "score" : -146.09163691278604
  },
  ....
]

always have the type: exam entry in the first position of the scores array, then a $project with $slice will be faster.
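A minimal sketch of that positional shortcut follows, assuming the exam entry really is always at index 0. It uses $arrayElemAt, a close relative of $slice, because $slice returns a one-element array and $avg inside $group ignores array values. The field name `exam` is a hypothetical choice for illustration, not something from the original answer.

```javascript
// Sketch: pick scores[0] directly instead of filtering the array.
// Only valid if the exam entry is always the first element of scores.
const projectFirstScore = {
  $project: {
    student_id: 1,
    class_id: 1,
    // `exam` (hypothetical name) holds a single subdocument, not an array.
    exam: { $arrayElemAt: ['$scores', 0] }
  }
};

const groupByClass = {
  $group: {
    _id: '$class_id',
    average: { $avg: '$exam.score' },
    stud: { $push: { student_id: '$student_id', marks: '$exam.score' } }
  }
};

const pipeline = [
  projectFirstScore,
  groupByClass,
  { $sort: { average: -1 } },
  { $limit: 1 }
];

// Runs only inside the mongo shell, where the global `db` exists.
if (typeof db !== 'undefined') {
  db.students.aggregate(pipeline).forEach(printjson);
}
```

The remaining stages ($project with $filter on stud, $unwind, and the final $sort) would follow unchanged from the pipeline above.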

How can I optimize a query that uses $unwind in the first stage?
