Question

TL;DR

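We add $project stages between the $match and $lookup stages to filter out unnecessary data or to alias fields. These $project stages make the query easier to read while debugging, but will they affect performance in any way when every collection involved in the query holds a large number of documents?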

Question in detail

For example, I have two collections, schools and students, as shown below:

Yes, the schema design is bad, I know! MongoDB says to put everything in the same collection to avoid relations, but let's go with this approach for now.

schools collection

{
    "_id": ObjectId("5c04dca4289c601a393d9db8"),
    "name": "First School Name",
    "address": "1 xyz",
    "status": 1,
    // Many more fields
},
{
    "_id": ObjectId("5c04dca4289c601a393d9db9"),
    "name": "Second School Name",
    "address": "2 xyz",
    "status": 1,
    // Many more fields
},
// Many more Schools

students collection

{
    "_id": ObjectId("5c04dcd5289c601a393d9dbb"),
    "name": "One Student Name",
    "school_id": ObjectId("5c04dca4289c601a393d9db8"),
    "address": "1 abc",
    "Gender": "Male",
    // Many more fields
},
{
    "_id": ObjectId("5c04dcd5289c601a393d9dbc"),
    "name": "Second Student Name",
    "school_id": ObjectId("5c04dca4289c601a393d9db9"),
    "address": "1 abc",
    "Gender": "Male",
    // Many more fields
},
// Many more students

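In the query below, I have a $project stage after $match and before $lookup. Is this $project stage necessary? Will it affect performance when all the collections involved in the query hold a large number of documents?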

db.students.aggregate([
    {
        $match: {
            "Gender": "Male"
        }
    },
    // 1. Below $project stage is not necessary apart from filtering out and aliasing.
    // 2. Will this stage affect performance when there are huge number of documents?
    {
        $project: {
            "_id": 0,
            "student_id": "$_id",
            "student_name": "$name",
            "school_id": 1
        }
    },
    {
        $lookup: {
            from: "schools",
            let: {
                "school_id": "$school_id"
            },
            pipeline: [
                {
                    $match: {
                        "status": 1,
                        $expr: {
                            $eq: ["$_id", "$$school_id"]
                        }
                    }
                },
                {
                    $project: {
                        "_id": 0,
                        "name": 1
                    }
                }
            ],
            as: "school"
        }
    },
    {
        $unwind: "$school"
    }
]);

Answer 1

Please read this: https://docs.mongodb.com/v3.2/core/aggregation-pipeline-optimization/

The part relevant to your specific case is: "The aggregation pipeline can determine if it requires only a subset of the fields in the documents to obtain the results. If so, the pipeline will only use those required fields, reducing the amount of data passing through the pipeline."

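So some optimization is happening behind the scenes. You can try the explain option on the aggregation to see exactly what mongo does when it tries to optimize the pipeline.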
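One way to try the explain suggestion is sketched below. It assumes the mongo shell (or mongosh) is connected to the database that holds the students and schools collections from the question, and that the server is 3.6 or later, since the question's $lookup already uses let and pipeline. The buildPipeline helper is a hypothetical name introduced here only so the same query can be explained with and without the intermediate $project.

```javascript
// Hypothetical helper: rebuilds the question's pipeline, optionally
// leaving out the intermediate $project stage so both variants can be compared.
function buildPipeline(withProject) {
    const stages = [{ $match: { "Gender": "Male" } }];
    if (withProject) {
        stages.push({
            $project: {
                "_id": 0,
                "student_id": "$_id",
                "student_name": "$name",
                "school_id": 1
            }
        });
    }
    stages.push(
        {
            $lookup: {
                from: "schools",
                let: { "school_id": "$school_id" },
                pipeline: [
                    { $match: { "status": 1, $expr: { $eq: ["$_id", "$$school_id"] } } },
                    { $project: { "_id": 0, "name": 1 } }
                ],
                as: "school"
            }
        },
        { $unwind: "$school" }
    );
    return stages;
}

// `db` only exists inside the mongo shell / mongosh; outside it this part is skipped.
if (typeof db !== "undefined") {
    // Explain both variants and compare the reported plans and execution stats.
    printjson(db.students.explain("executionStats").aggregate(buildPipeline(true)));
    printjson(db.students.explain("executionStats").aggregate(buildPipeline(false)));
}
```

Comparing the two explain outputs on your own data is the most reliable way to see whether the extra $project changes anything, because the result depends on document size, indexes, and server version.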

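I think what you are doing will actually improve performance, since you are reducing the amount of data flowing through the pipeline.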
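To make "reducing the amount of data" concrete, here is a small in-memory illustration of what the question's $project keeps from one student document. It is plain JavaScript, not MongoDB's implementation: projectStudent is a hypothetical helper, and the ObjectId values are written as plain strings for simplicity.

```javascript
// Hypothetical helper mimicking the question's $project on a single document:
// drops _id, aliases _id and name, and keeps school_id.
function projectStudent(doc) {
    return {
        student_id: doc._id,
        student_name: doc.name,
        school_id: doc.school_id
    };
}

// Sample document from the question (ObjectIds simplified to strings).
const student = {
    _id: "5c04dcd5289c601a393d9dbb",
    name: "One Student Name",
    school_id: "5c04dca4289c601a393d9db8",
    address: "1 abc",
    Gender: "Male"
};

const projected = projectStudent(student);

// Only the three fields the later stages need are carried forward.
console.log(Object.keys(student).length, "->", Object.keys(projected).length);
```

With real documents that have "many more fields", the gap between the two counts is larger, which is the case where trimming fields before $lookup is most likely to matter.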

Do multiple $project stages in a MongoDB aggregation affect performance?
