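I have a collection of more than 100,000 documents, each containing several nested arrays. I need to query on a property at the lowest level and return the objects at the bottom of the arrays.
Document structure: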
{
_id: 12345,
type: "employee",
people: [
{
name: "Rob",
items: [
{
itemName: "RobsItemOne",
value: "$10.00",
description: "some description about the item"
},
{
itemName: "RobsItemTwo",
value: "$15.00",
description: "some description about the item"
}
]
}
]
}
I have been using the Aggregation Pipeline to get the expected results, but the performance is terrible. Here is my query:
db.collection.aggregate([
{
$match: {
"type": "employee"
}
},
{$unwind: "$people"},
{$unwind: "$people.items"},
{$match: {$or: [ //There could be dozens of items included in this $match
{"people.items.itemName": "RobsItemOne"},
{"people.items.itemName": "RobsItemTwo"}
]
}
},
{
$project: {
_id: 0,// This is because of the $out
systemID: "$_id",
type: "$type",
item: "$people.items.itemName",
value: "$people.items.value"
}
},
{$out: "tempCollection"} //Would like to avoid this, but was exceeding max document size
])
The result is:
[
{
"type" : "employee",
"systemID" : 12345,
"item" : "RobsItemOne",
"value" : "$10.00"
},
{
"type" : "employee",
"systemID" : 12345,
"item" : "RobsItemTwo",
"value" : "$15.00"
}
]
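To make explicit what the pipeline computes, here is a plain-JavaScript sketch (no MongoDB involved) that mimics the stages above on in-memory objects: the first $match on type, the two $unwind stages as nested loops, the item-name $match as a set lookup, and the $project as the pushed row. The function name flattenItems is hypothetical, and it is only an illustration of the stage semantics, not a replacement for the query.

```javascript
// Plain-JS emulation of: $match(type) -> $unwind people -> $unwind people.items
// -> $match(itemName) -> $project. Illustrative only; not MongoDB code.
function flattenItems(docs, wantedNames) {
  const wanted = new Set(wantedNames);
  const rows = [];
  for (const doc of docs) {
    if (doc.type !== "employee") continue; // first $match
    // Missing arrays yield no rows, like $unwind's default behavior.
    for (const person of doc.people ?? []) { // $unwind: "$people"
      for (const item of person.items ?? []) { // $unwind: "$people.items"
        if (!wanted.has(item.itemName)) continue; // second $match
        rows.push({ // $project
          systemID: doc._id,
          type: doc.type,
          item: item.itemName,
          value: item.value,
        });
      }
    }
  }
  return rows;
}

// The sample document from the question.
const sampleDoc = {
  _id: 12345,
  type: "employee",
  people: [{
    name: "Rob",
    items: [
      { itemName: "RobsItemOne", value: "$10.00", description: "some description about the item" },
      { itemName: "RobsItemTwo", value: "$15.00", description: "some description about the item" },
    ],
  }],
};

console.log(flattenItems([sampleDoc], ["RobsItemOne", "RobsItemTwo"]));
```

Each matching leaf item becomes its own output row, which is also why the result set can grow large enough to need $out or a cursor.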
How can I make this query faster? I have tried using indexes, but according to the Mongo documentation, indexes are ignored after the initial $match.
Answer 0 (score: 0)
You could also try adding a $match stage to your query right after the $unwind of people:
...{$unwind: "$people"},
{$match:{"people.items.itemName":{$in:["RobsItemOne","RobsItemTwo"]}}},
{$unwind: "$people.items"}, ....
This reduces the number of documents that the following $unwind and $match stages have to process.
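A small plain-JavaScript sketch of why the extra $match helps: it counts how many documents reach the final $match with and without the early filter. After $unwind of people, {"people.items.itemName": {$in: [...]}} keeps a person if at least one of their items has a wanted name, so people with no matching items are never unwound into per-item rows. The function name and the second person ("Ann" and her items) are hypothetical data made up for the illustration.

```javascript
// Counts the documents emitted by $unwind: "$people.items", i.e. the input
// size of the final $match, with or without the early $match on people.
function rowsReachingItemsMatch(docs, wantedNames, matchEarly) {
  const wanted = new Set(wantedNames);
  let count = 0;
  for (const doc of docs) {
    for (const person of doc.people ?? []) { // after $unwind: "$people"
      const items = person.items ?? [];
      // Early $match: keep the person only if some item name is wanted.
      if (matchEarly && !items.some((it) => wanted.has(it.itemName))) continue;
      count += items.length; // one document per item after the second $unwind
    }
  }
  return count;
}

// Hypothetical data: Rob has the wanted items, Ann (made up) has none.
const exampleDocs = [{
  _id: 1,
  type: "employee",
  people: [
    { name: "Rob", items: [{ itemName: "RobsItemOne" }, { itemName: "RobsItemTwo" }] },
    { name: "Ann", items: [{ itemName: "AnnsItemOne" }, { itemName: "AnnsItemTwo" }, { itemName: "AnnsItemThree" }] },
  ],
}];
const wantedItems = ["RobsItemOne", "RobsItemTwo"];

console.log(rowsReachingItemsMatch(exampleDocs, wantedItems, false));
console.log(rowsReachingItemsMatch(exampleDocs, wantedItems, true));
```

Note that a matching person's non-matching items are still unwound; the final $match is what removes them, so the early $match prunes whole people, not individual items.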
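Since you have a large number of records, you can also use the {allowDiskUse:true} option. From the documentation:
Enables writing to temporary files. When set to true, aggregation stages can write data to the _tmp subdirectory in the dbPath directory.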
So your final query looks like this:
db.collection.aggregate([
{
$match: {
"type": "employee"
}
},
{$unwind: "$people"},
{$match:{"people.items.itemName":{$in:["RobsItemOne","RobsItemTwo"]}}},
{$unwind: "$people.items"},
{$match: {$or: [ //There could be dozens of items included in this $match
{"people.items.itemName": "RobsItemOne"},
{"people.items.itemName": "RobsItemTwo"}
]
}
},
{
$project: {
_id: 0,// This is because of the $out
systemID: "$_id",
type: "$type",
item: "$people.items.itemName",
value: "$people.items.value"
}
}
], {allowDiskUse:true})
Answer 1 (score: 0)
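Building on @BatScream's answer, I found a few more things that could be improved. You can give them a try.
// If the final result set is relatively small, this index will be helpful.
// (createIndex replaces the deprecated ensureIndex.)
db.collection.createIndex({type : 1, "people.items.itemName" : 1 });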
var itemCriteria = {
$in : [ "RobsItemOne", "RobsItemTwo" ]
};
db.collection.aggregate([ {
$match : {
"type" : "employee",
"people.items.itemName" : itemCriteria // add this criteria to narrow source range further
}
}, {
$unwind : "$people"
}, {
$match : {
"people.items.itemName" : itemCriteria // narrow data range further
}
}, {
$unwind : "$people.items"
}, {
$match : {
"people.items.itemName" : itemCriteria // final match; $in avoids the long $or list
}
}, {
$project : {
_id : 0, // This is because of the $out
systemID : "$_id",
type : "$type",
item : "$people.items.itemName",
value : "$people.items.value"
}
}, {
$out: "tempCollection" // optional
} ], {
allowDiskUse : true
});
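Since the question mentions that dozens of item names may be involved, one option is to build this pipeline from an array of names so that the same $in criteria object is shared by all three $match stages. The sketch below is plain JavaScript that only constructs the stage array; the function name buildItemPipeline and its outCollection parameter are hypothetical, and nothing here talks to a database.

```javascript
// Hypothetical helper: builds Answer 1's pipeline from a list of item names.
// The same $in criteria object is reused by every $match stage.
function buildItemPipeline(itemNames, outCollection) {
  const itemCriteria = { $in: [...itemNames] };
  const pipeline = [
    { $match: { type: "employee", "people.items.itemName": itemCriteria } },
    { $unwind: "$people" },
    { $match: { "people.items.itemName": itemCriteria } },
    { $unwind: "$people.items" },
    { $match: { "people.items.itemName": itemCriteria } },
    {
      $project: {
        _id: 0, // dropped so $out can assign fresh _id values
        systemID: "$_id",
        type: "$type",
        item: "$people.items.itemName",
        value: "$people.items.value",
      },
    },
  ];
  // $out is optional and, when used, must be the last stage.
  if (outCollection) pipeline.push({ $out: outCollection });
  return pipeline;
}

console.log(JSON.stringify(buildItemPipeline(["RobsItemOne", "RobsItemTwo"], "tempCollection"), null, 2));
```

In the mongo shell the result could then be passed as db.collection.aggregate(buildItemPipeline(names, "tempCollection"), {allowDiskUse : true}), where names is your array of item names.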