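I am trying to query results from a large dataset called 'tasks', which contains 187297 documents. It is nested in another dataset called 'workers', which in turn is nested in a collection called 'production_units'.
production_units -> workers -> tasks
(BTW, this is a simplified version of production_units):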
[{
  "_id": ObjectId("5aca27b926974863ed9f01ab"),
  "name": "Z",
  "workers": [{
    "name": "X Y",
    "worker_number": 655,
    "employed": false,
    "_id": ObjectId("5aca27bd26974863ed9f0425"),
    "tasks": [{
      "_id": ObjectId("5ac9f6c2e1a668d6d39c1fd1"),
      "inbound_order_number": 3296,
      "task_number": 90,
      "minutes_elapsed": 120,
      "date": "2004-11-30",
      "start": 1101823200,
      "pieces_actual": 160,
      "pause_from": 1101812400,
      "pause_to": 1101814200
    }]
  }]
}]
To do this, I used the following aggregation command:
db.production_units.aggregate([{
  '$project': {
    'workers': '$workers'
  }
}, {
  '$unwind': '$workers'
}, {
  '$project': {
    'tasks': '$workers.tasks',
    'worker_number': '$workers.worker_number'
  }
}, {
  '$unwind': '$tasks'
}, {
  '$project': {
    'task_number': '$tasks.task_number',
    'pieces_actual': '$tasks.pieces_actual',
    'minutes_elapsed': '$tasks.minutes_elapsed',
    'worker_number': 1,
    'start': '$tasks.start',
    'inbound_order_number': '$tasks.inbound_order_number',
    'pause_from': '$tasks.pause_from',
    'date': '$tasks.date',
    '_id': '$tasks._id',
    'pause_to': '$tasks.pause_to'
  }
}, {
  '$match': {
    'start': {
      '$exists': true
    }
  }
}, {
  '$group': {
    'entries_count': {
      '$sum': 1
    },
    '_id': null,
    'entries': {
      '$push': '$$ROOT'
    }
  }
}, {
  '$project': {
    'entries_count': 1,
    '_id': 0,
    'entries': 1
  }
}, {
  '$unwind': '$entries'
}, {
  '$project': {
    'task_number': '$entries.task_number',
    'pieces_actual': '$entries.pieces_actual',
    'minutes_elapsed': '$entries.minutes_elapsed',
    'worker_number': '$entries.worker_number',
    'start': '$entries.start',
    'inbound_order_number': '$entries.inbound_order_number',
    'pause_from': '$entries.pause_from',
    'date': '$entries.date',
    'entries_count': 1,
    '_id': '$entries._id',
    'pause_to': '$entries.pause_to'
  }
}, {
  '$sort': {
    'start': 1
  }
}, {
  '$skip': 187290
}, {
  '$limit': 10
}], {
  allowDiskUse: true
})
The returned documents are:
{ "entries_count" : 187297, "task_number" : 100, "pieces_actual" : 68, "minutes_elapsed" : 102, "worker_number" : 411, "start" : 1594118400, "inbound_order_number" : 8569, "pause_from" : 1594119600, "date" : "2020-07-07", "_id" : ObjectId("5ac9f6d3e1a668d6d3a06351"), "pause_to" : 1594119600 }
{ "entries_count" : 187297, "task_number" : 130, "pieces_actual" : 20, "minutes_elapsed" : 30, "worker_number" : 549, "start" : 1596531600, "inbound_order_number" : 7683, "pause_from" : 1596538800, "date" : "2020-08-04", "_id" : ObjectId("5ac9f6cde1a668d6d39f1b26"), "pause_to" : 1596538800 }
{ "entries_count" : 187297, "task_number" : 210, "pieces_actual" : 84, "minutes_elapsed" : 180, "worker_number" : 734, "start" : 1601276400, "inbound_order_number" : 8330, "pause_from" : 1601290800, "date" : "2020-09-28", "_id" : ObjectId("5ac9f6d0e1a668d6d39fd677"), "pause_to" : 1601290800 }
{ "entries_count" : 187297, "task_number" : 20, "pieces_actual" : 64, "minutes_elapsed" : 90, "worker_number" : 114, "start" : 1601800200, "inbound_order_number" : 7690, "pause_from" : 1601809200, "date" : "2020-10-04", "_id" : ObjectId("5ac9f6cee1a668d6d39f3032"), "pause_to" : 1601811900 }
{ "entries_count" : 187297, "task_number" : 140, "pieces_actual" : 70, "minutes_elapsed" : 84, "worker_number" : 49, "start" : 1603721640, "inbound_order_number" : 4592, "pause_from" : 1603710000, "date" : "2020-10-26", "_id" : ObjectId("5ac9f6c8e1a668d6d39df664"), "pause_to" : 1603712700 }
{ "entries_count" : 187297, "task_number" : 80, "pieces_actual" : 20, "minutes_elapsed" : 30, "worker_number" : 277, "start" : 1796628600, "inbound_order_number" : 4655, "pause_from" : 1796641200, "date" : "2026-12-07", "_id" : ObjectId("5ac9f6c8e1a668d6d39e1fc0"), "pause_to" : 1796643900 }
{ "entries_count" : 187297, "task_number" : 40, "pieces_actual" : 79, "minutes_elapsed" : 123, "worker_number" : 96, "start" : 3802247580, "inbound_order_number" : 4592, "pause_from" : 3802244400, "date" : "2090-06-27", "_id" : ObjectId("5ac9f6c8e1a668d6d39de218"), "pause_to" : 3802244400 }
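However, the query takes seconds rather than milliseconds to return its results. This is what the profiler reports:
db.system.profile.findOne().millis
3216
(UPDATE)
Even the following simplified count query takes 312 ms to run instead of a few milliseconds: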
db.production_units.aggregate([{
  "$unwind": "$workers"
}, {
  "$unwind": "$workers.tasks"
}, {
  "$count": "entries_count"
}])
This is what explain() returns for the query above:
{
  "stages" : [
    {
      "$cursor" : {
        "query" : { },
        "fields" : {
          "workers" : 1,
          "_id" : 0
        },
        "queryPlanner" : {
          "plannerVersion" : 1,
          "namespace" : "my_db.production_units",
          "indexFilterSet" : false,
          "parsedQuery" : { },
          "winningPlan" : {
            "stage" : "COLLSCAN",
            "direction" : "forward"
          },
          "rejectedPlans" : [ ]
        },
        "executionStats" : {
          "executionSuccess" : true,
          "nReturned" : 28,
          "executionTimeMillis" : 13,
          "totalKeysExamined" : 0,
          "totalDocsExamined" : 28,
          "executionStages" : {
            "stage" : "COLLSCAN",
            "nReturned" : 28,
            "executionTimeMillisEstimate" : 0,
            "works" : 30,
            "advanced" : 28,
            "needTime" : 1,
            "needYield" : 0,
            "saveState" : 1,
            "restoreState" : 1,
            "isEOF" : 1,
            "invalidates" : 0,
            "direction" : "forward",
            "docsExamined" : 28
          },
          "allPlansExecution" : [ ]
        }
      }
    },
    {
      "$unwind" : {
        "path" : "$workers"
      }
    },
    {
      "$unwind" : {
        "path" : "$workers.tasks"
      }
    },
    {
      "$group" : {
        "_id" : {
          "$const" : null
        },
        "entries_count" : {
          "$sum" : {
            "$const" : 1
          }
        }
      }
    },
    {
      "$project" : {
        "_id" : false,
        "entries_count" : true
      }
    }
  ],
  "ok" : 1
}
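I'm not an experienced DBA, so I don't know what exactly I'm missing in my aggregation pipeline that would solve the performance problem I'm facing. I have also looked into the problem and done some research, but haven't found a solution.
What am I missing?
Answer 0 (score: 2)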
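Without the explain() output of the query, it is impossible to say for sure what its bottleneck is. However, here are some suggestions on how to improve this query.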
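$project stages
The query contains 5 $project stages, while only one is actually needed. This can add a lot of overhead, especially when the stages are applied to a large number of documents.
Instead, use dot notation to reach nested fields, for example: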
{ "$unwind": "$workers.tasks" }
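To make the effect of chaining two $unwind stages with dot notation more concrete, here is a small plain-JavaScript emulation. It is not MongoDB code, just a sketch: the helper names unwind and setPath and the sample data are made up for illustration, and the emulation simply skips values that are not arrays, which is a simplification of what the real stage does.

```javascript
// Returns a copy of `doc` with the value at the dotted path replaced.
function setPath(doc, keys, value) {
  if (keys.length === 0) return value;
  const [head, ...rest] = keys;
  return { ...doc, [head]: setPath(doc[head], rest, value) };
}

// Simplified stand-in for $unwind: one output document per array element.
function unwind(docs, path) {
  const keys = path.split(".");
  const out = [];
  for (const doc of docs) {
    const arr = keys.reduce((v, k) => (v == null ? undefined : v[k]), doc);
    if (!Array.isArray(arr)) continue; // simplification: skip non-arrays
    for (const item of arr) out.push(setPath(doc, keys, item));
  }
  return out;
}

// Made-up sample: one production unit, two workers, three tasks in total.
const units = [{
  name: "Z",
  workers: [
    { worker_number: 655, tasks: [{ task_number: 90 }, { task_number: 100 }] },
    { worker_number: 411, tasks: [{ task_number: 20 }] }
  ]
}];

// Same order as the two stages: first workers, then workers.tasks.
const flat = unwind(unwind(units, "workers"), "workers.tasks");
console.log(flat.length); // one document per task
```

Each resulting document still carries its worker's fields next to a single task, which is why no intermediate $project is needed to keep worker_number around.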
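$match
$match lets you discard some of the documents, so add it as early as possible so that the later aggregation stages are applied to fewer documents.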
$skip, $limit and $project
Since the query returns only 10 documents, there is no need to apply the $project stage to the 180000 other documents.
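The same idea can be illustrated outside MongoDB with plain arrays. The sketch below is only an analogy with made-up data: it counts how often a projection function runs when it is applied before versus after sorting and slicing. The server's own optimizer may already reorder some stages, so treat this as an illustration of the principle rather than a measurement.

```javascript
// Made-up data: 1000 tasks with distinct start values.
const tasks = Array.from({ length: 1000 }, (_, i) => ({
  start: 1000 - i,
  task_number: i,
  extra: "field that the projection drops"
}));

let projectCalls = 0;
const project = (t) => {
  projectCalls++;
  return { start: t.start, task_number: t.task_number };
};
const byStart = (a, b) => a.start - b.start;

// Variant 1: project every document, then sort, skip and limit.
projectCalls = 0;
const early = tasks.map(project).sort(byStart).slice(990, 1000);
const earlyCalls = projectCalls;

// Variant 2: sort, skip and limit first, then project only the page.
projectCalls = 0;
const late = [...tasks].sort(byStart).slice(990, 1000).map(project);
const lateCalls = projectCalls;

console.log(earlyCalls, lateCalls);
```

Both variants produce the same page; the second one only reshapes the documents that are actually returned.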
$sort
This may be the bottleneck. Make sure the field workers.tasks.start is indexed (see MongoDB ensureIndex() for details).
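A minimal sketch of creating such an index from the shell, assuming the collection is production_units as in the question. createIndex() is the current name for what ensureIndex() used to do; the variable name startIndexSpec is only for illustration.

```javascript
// Index specification: ascending index on the nested task start time.
const startIndexSpec = { "workers.tasks.start": 1 };

// `db` only exists inside the mongo shell, so guard the call to keep
// this snippet harmless when it is run anywhere else.
if (typeof db !== "undefined") {
  db.production_units.createIndex(startIndexSpec);
}
```

After creating the index, it is worth running explain() again to check whether the pipeline actually uses it, since stages placed after $unwind work on transformed documents.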
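$group / $unwind
Instead of using $group / $unwind stages to count the matching documents, run a second query alongside the main one that only counts the matching documents.
The main query now looks like: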
db.collection.aggregate([{
  "$unwind": "$workers"
}, {
  "$unwind": "$workers.tasks"
}, {
  "$match": {
    "workers.tasks.start": {
      "$ne": null
    }
  }
}, {
  "$sort": {
    "workers.tasks.start": 1
  }
}, {
  "$skip": 0
}, {
  "$limit": 10
}, {
  "$project": {
    "task_number": "$workers.tasks.task_number",
    "pieces_actual": "$workers.tasks.pieces_actual",
    "minutes_elapsed": "$workers.tasks.minutes_elapsed",
    "worker_number": "$workers.worker_number",
    "start": "$workers.tasks.start",
    "inbound_order_number": "$workers.tasks.inbound_order_number",
    "pause_from": "$workers.tasks.pause_from",
    "date": "$workers.tasks.date",
    "_id": "$workers.tasks._id",
    "pause_to": "$workers.tasks.pause_to"
  }
}])
You can try it here: mongoplayground.net/p/yua7qspo2Jj
The count query would be:
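Since the question pages through the results with a large $skip, the pipeline above could be wrapped in a small helper that takes the page boundaries as parameters. This is only a sketch: buildTasksPage is a hypothetical name, and the production_units collection and allowDiskUse option are taken from the question.

```javascript
// Hypothetical helper: builds the pipeline above for an arbitrary page.
function buildTasksPage(skip, limit) {
  return [
    { $unwind: "$workers" },
    { $unwind: "$workers.tasks" },
    { $match: { "workers.tasks.start": { $ne: null } } },
    { $sort: { "workers.tasks.start": 1 } },
    { $skip: skip },
    { $limit: limit },
    // Reshape only the documents that are actually returned.
    { $project: {
      task_number: "$workers.tasks.task_number",
      pieces_actual: "$workers.tasks.pieces_actual",
      minutes_elapsed: "$workers.tasks.minutes_elapsed",
      worker_number: "$workers.worker_number",
      start: "$workers.tasks.start",
      inbound_order_number: "$workers.tasks.inbound_order_number",
      pause_from: "$workers.tasks.pause_from",
      date: "$workers.tasks.date",
      _id: "$workers.tasks._id",
      pause_to: "$workers.tasks.pause_to"
    } }
  ];
}

// Same page as in the question: skip 187290, return up to 10.
const lastPage = buildTasksPage(187290, 10);

// `db` only exists inside the mongo shell, so guard the call.
if (typeof db !== "undefined") {
  db.production_units.aggregate(lastPage, { allowDiskUse: true }).forEach(printjson);
}
```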
db.collection.aggregate([{
  "$unwind": "$workers"
}, {
  "$unwind": "$workers.tasks"
}, {
  "$match": {
    "workers.tasks.start": {
      "$ne": null
    }
  }
}, {
  "$count": "entries_count"
}])