Question

我有一个包含> 100,000个包含多个嵌套数组的文档的集合。我需要根据位于最低级别的属性进行查询，并返回数组底部的对象。

文件结构：

    {
    _id: 12345,
    type: "employee",
    people: [
        {
            name: "Rob",
            items: [
                {
                    itemName: "RobsItemOne",
                    value: "$10.00",
                    description: "some description about the item"
                },
                {
                    itemName: "RobsItemTwo",
                    value: "$15.00",
                    description: "some description about the item"
                }
            ]
        }
    ]
}

我一直在使用Aggregation Pipeline来获得预期的结果，但是性能非常糟糕。这是我的疑问：

db.collection.aggregate([
            {
                $match: {
                    "type": "employee"
                }
            },

            {$unwind: "$people"},
            {$unwind: "$people.items"},
            {$match: {$or: [ //There could be dozens of items included in this $match
                             {"people.items.itemName": "RobsItemOne"},
                             {"people.items.itemName": "RobsItemTwo"}
                           ]
                     }
            },
            {
                $project: {
                    _id: 0,// This is because of the $out
                    systemID: "$_id",
                    type: "$type",
                    item: "$people.items.itemName",
                    value: "$people.items.value"
                }
            },
            {$out: tempCollection} //Would like to avoid this, but was exceeding max document size
        ])

结果是：

[ 
    {
        "type" : "employee",
        "systemID" : 12345,
        "item" : "RobsItemOne",
        "value" : "$10.00"
    }, 
    {
        "type" : "employee",
        "systemID" : 12345,
        "item" : "RobsItemTwo",
        "value" : "$10.00"
    }
]

如何才能更快地完成此查询？我尝试过使用索引但是根据Mongo文档，会忽略超过初始$ match的索引。

Answer 1

您还可以尝试在$match人之后向您的查询添加$unwind运算符。

...{$unwind: "$people"},
{$match:{"people.items.itemName":{$in:["RobsItemOne","RobsItemTwo"]}}},
{$unwind: "$people.items"}, ....

这将降低以下$unwind和$match运营商要查询的记录数量。

由于你有大量的记录，你可以使用{allowDiskUse:true}选项。

启用写入临时文件。设置为true时，聚合阶段可以将数据写入dbPath中的_tmp子目录。目录

所以，你的最终查询是这样的：

db.collection.aggregate([
        {
            $match: {
                "type": "employee"
            }
        },

        {$unwind: "$people"},
        {$match:{"people.items.itemName":{$in:["RobsItemOne","RobsItemTwo"]}}},
        {$unwind: "$people.items"},
        {$match: {$or: [ //There could be dozens of items included in this $match
                         {"people.items.itemName": "RobsItemOne"},
                         {"people.items.itemName": "RobsItemTwo"}
                       ]
                 }
        },
        {
            $project: {
                _id: 0,// This is because of the $out
                systemID: "$_id",
                type: "$type",
                item: "$people.items.itemName",
                value: "$people.items.value"
            }
        }

    ], {allowDiskUse:true})

Answer 2

我发现在@BatScream的努力之后还有其他可以努力改进的东西。你可以尝试一下。

// if the final result set is relatively small, this index will be helpful.
db.collection.ensureIndex({type : 1, "people.items.itemName" : 1 });

var itemCriteria = {
    $in : [ "RobsItemOne", "RobsItemTwo" ]
};

db.collection.aggregate([ {
    $match : {
        "type" : "employee",
        "people.items.itemName" : itemCriteria      // add this criteria to narrow source range further
    }
}, {
    $unwind : "$people"
}, {
    $match : {
        "people.items.itemName" : itemCriteria      // narrow data range further
    }
}, {
    $unwind : "$people.items"
}, {
    $match : {
        "people.items.itemName" : itemCriteria      // final match, avoid to use $or operator
    }
}, {
    $project : {
        _id : 0,                                    // This is because of the $out
        systemID : "$_id",
        type : "$type",
        item : "$people.items.itemName",
        value : "$people.items.value"
    }
}, {
    $out: tempCollection                            // optional
} ], {
    allowDiskUse : true
});

MongoDB复杂子文档查询

2 个答案: