Input data
{
"_id" : ObjectId("5dc7ac6e720a2772c7b76671"),
"idList" : [
{
"queueUpdateTimeStamp" : "2019-12-12T07:16:47.577Z",
"displayId" : "H14",
"currentQueue" : "10",
"isRejected" : true,
"isDispacthed" : true
},
{
"queueUpdateTimeStamp" : "2019-12-12T07:16:47.577Z",
"displayId" : "H14",
"currentQueue" : "10",
"isRejected" : true,
"isDispacthed" : false
}
],
"poDetailsId" : ObjectId("5dc7ac15720a2772c7b7666f"),
"processtype" : 1
}
Expected output
{
"_id" : ObjectId("5dc7ac6e720a2772c7b76671"),
"idList":
{
"queueUpdateTimeStamp" : "2019-12-12T07:16:47.577Z",
"displayId" : "H14",
"currentQueue" : "10",
"isRejected" : true,
"isDispacthed" : true
},
"poDetailsId" : ObjectId("5dc7ac15720a2772c7b7666f"),
"processtype" : 1
}
Query 1 ($unwind first, then $match)
aggregate([
{
$unwind: { path: "$idList" }
},
{
$match: { 'idList.isDispacthed': isDispatched }
}
])
Query 2 ($match first, then $unwind, then $match again)
aggregate([
{
$match: { 'idList.isDispacthed': isDispatched }
},
{
$unwind: { path: "$idList" }
},
{
$match: { 'idList.isDispacthed': isDispatched }
}
])
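My question / my concern
(Assume this collection holds a large number of documents (50k+), and that further lookup and projection stages follow this query in the same pipeline.)
match -> unwind -> match
vs. unwind -> match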
Answer 0 (score: 1)
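This depends entirely on the MongoDB query planner's optimizer:
Aggregation pipeline operations have an optimization phase which attempts to reshape the pipeline for improved performance.
To see how the optimizer transforms a particular aggregation pipeline, include the explain option in the db.collection.aggregate() method.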
https://docs.mongodb.com/manual/core/aggregation-pipeline-optimization/
Create an index on poDetailsId and run the following query:
db.getCollection('collection').explain().aggregate([
{
$unwind: "$idList"
},
{
$match: {
'idList.isDispacthed': true,
"poDetailsId" : ObjectId("5dc7ac15720a2772c7b7666f")
}
}
])
{
"stages" : [
{
"$cursor" : {
"query" : {
"poDetailsId" : {
"$eq" : ObjectId("5dc7ac15720a2772c7b7666f")
}
},
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.collection",
"indexFilterSet" : false,
"parsedQuery" : {
"poDetailsId" : {
"$eq" : ObjectId("5dc7ac15720a2772c7b7666f")
}
},
"queryHash" : "2CF7E390",
"planCacheKey" : "A8739F51",
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"poDetailsId" : 1.0
},
"indexName" : "poDetailsId_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"poDetailsId" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"poDetailsId" : [
"[ObjectId('5dc7ac15720a2772c7b7666f'), ObjectId('5dc7ac15720a2772c7b7666f')]"
]
}
}
},
"rejectedPlans" : []
}
}
},
{
"$unwind" : {
"path" : "$idList"
}
},
{
"$match" : {
"idList.isDispacthed" : {
"$eq" : true
}
}
}
],
"ok" : 1.0
}
As you can see, MongoDB rewrites the aggregation as:
db.getCollection('collection').aggregate([
{
$match: { "poDetailsId" : ObjectId("5dc7ac15720a2772c7b7666f") }
},
{
$unwind: "$idList"
},
{
$match: { 'idList.isDispacthed': true }
}
])
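The explain output above can also be checked programmatically. The following is a minimal sketch in plain JavaScript, assuming the output has the shape shown in this answer (a `stages` array whose first entry is a `$cursor`); other server versions may lay the plan out differently. `findIndexScan` and `describeExplain` are hypothetical helper names, not MongoDB APIs, and the sample object is a trimmed copy of the output above with the ObjectId replaced by a string.

```javascript
// Hypothetical helper: follow inputStage links in a winning plan until an
// IXSCAN stage is found, and return the name of the index it uses.
function findIndexScan(plan) {
  let stage = plan;
  while (stage) {
    if (stage.stage === "IXSCAN") return stage.indexName;
    stage = stage.inputStage;
  }
  return null; // no index scan found in this plan
}

// Hypothetical helper: summarize an aggregation explain result.
function describeExplain(explainResult) {
  const stages = explainResult.stages || [];
  const cursor = (stages[0] || {}).$cursor;
  return {
    // Filter that the optimizer pushed down into the initial cursor.
    pushedDownQuery: cursor ? cursor.query : null,
    // Index chosen by the winning plan, or null if none was found.
    indexName: cursor ? findIndexScan(cursor.queryPlanner.winningPlan) : null,
    // Names of the remaining pipeline stages, in execution order.
    laterStages: stages.slice(1).map((s) => Object.keys(s)[0]),
  };
}

// Trimmed copy of the explain output shown above (ObjectId kept as a string
// so the snippet runs outside the mongo shell).
const sampleExplain = {
  stages: [
    {
      $cursor: {
        query: { poDetailsId: { $eq: "5dc7ac15720a2772c7b7666f" } },
        queryPlanner: {
          winningPlan: {
            stage: "FETCH",
            inputStage: { stage: "IXSCAN", indexName: "poDetailsId_1" },
          },
        },
      },
    },
    { $unwind: { path: "$idList" } },
    { $match: { "idList.isDispacthed": { $eq: true } } },
  ],
  ok: 1,
};

const summary = describeExplain(sampleExplain);
console.log(summary);
```

For the sample above, the summary reports the index `poDetailsId_1` and the later stages `$unwind` and `$match`, which matches the rewritten pipeline.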
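Logically, $match -> $unwind -> $match is the better order: the first $match can use an index to select a subset of documents, so the later stages do not have to process the whole collection (handling 100 matched documents is not the same as handling all of them).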
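If your aggregation operation requires only a subset of the data in a collection, use the $match, $limit, and $skip stages to restrict the documents that enter at the beginning of the pipeline. When placed at the beginning of a pipeline, $match operations use suitable indexes to scan only the matching documents in a collection.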
https://docs.mongodb.com/manual/core/aggregation-pipeline/#early-filtering
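To make the early-filtering argument concrete, here is a rough in-memory simulation in plain JavaScript. It does not talk to MongoDB and does not model indexes; `matchStage` and `unwindStage` are illustrative stand-ins for $match and $unwind, and the 1000-document data set is made up for the example. It only shows how many documents $unwind has to produce under each ordering.

```javascript
// Stand-in for $match: keep the documents that satisfy the predicate.
function matchStage(docs, predicate) {
  return docs.filter(predicate);
}

// Stand-in for $unwind: emit one output document per array element.
function unwindStage(docs, field) {
  return docs.flatMap((doc) =>
    (doc[field] || []).map((item) => ({ ...doc, [field]: item }))
  );
}

// Hypothetical data: 1000 documents, and only every 100th document has an
// entry with isDispacthed === true (field name spelled as in the question).
const docs = Array.from({ length: 1000 }, (_, i) => ({
  _id: i,
  idList: [
    { displayId: "H14", isDispacthed: i % 100 === 0 },
    { displayId: "H14", isDispacthed: false },
  ],
}));

const wanted = true;

// Query 1: $unwind -> $match
const unwoundAll = unwindStage(docs, "idList");
const result1 = matchStage(unwoundAll, (d) => d.idList.isDispacthed === wanted);

// Query 2: $match -> $unwind -> $match
// A $match on 'idList.isDispacthed' against an array field keeps a document
// when at least one element matches, hence the use of some() here.
const preFiltered = matchStage(docs, (d) =>
  d.idList.some((e) => e.isDispacthed === wanted)
);
const unwoundSome = unwindStage(preFiltered, "idList");
const result2 = matchStage(unwoundSome, (d) => d.idList.isDispacthed === wanted);

console.log("documents produced by $unwind, query 1:", unwoundAll.length);
console.log("documents produced by $unwind, query 2:", unwoundSome.length);
console.log("same result:", JSON.stringify(result1) === JSON.stringify(result2));
```

With this made-up data, $unwind produces 2000 documents in Query 1 and 20 in Query 2, while both pipelines return the same 10 documents. On a real deployment the actual cost depends on the optimizer and available indexes, so explain remains the way to confirm it.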