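I have a collection with ~2.5 million documents; the collection size is 14.1GB, the storage size is 4.2GB, and the average object size is 5.8KB. I created two separate indexes on two fields, dataSourceName and version (text fields), and tried to write an aggregation query that lists their "grouped by" values.
(I am trying to achieve this: select dsn, v from collection group by dsn, v.)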
db.getCollection("the-collection").aggregate(
    [
        {
            "$group" : {
                "_id" : {
                    "dataSourceName" : "$dataSourceName",
                    "version" : "$version"
                }
            }
        }
    ],
    {
        "allowDiskUse" : false
    }
);
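For illustration, the result this query is after (the distinct dataSourceName/version pairs) can be sketched in plain JavaScript on an in-memory array. This only shows the intended semantics, not how MongoDB executes the $group stage; the `distinctPairs` helper and the sample documents are hypothetical.

```javascript
// Hypothetical helper: returns the distinct (dataSourceName, version) pairs,
// i.e. the same values the $group stage emits as its _id documents.
function distinctPairs(docs) {
  const seen = new Map();
  for (const doc of docs) {
    // Serialising the pair as a two-element array gives one key per pair.
    const key = JSON.stringify([doc.dataSourceName, doc.version]);
    if (!seen.has(key)) {
      seen.set(key, { dataSourceName: doc.dataSourceName, version: doc.version });
    }
  }
  // A Map iterates in insertion order, so pairs come back in first-seen order.
  return Array.from(seen.values());
}

// Made-up sample documents, for illustration only.
const sampleDocs = [
  { dataSourceName: "alpha", version: "1.0", payload: "a" },
  { dataSourceName: "alpha", version: "1.0", payload: "b" },
  { dataSourceName: "alpha", version: "2.0", payload: "c" },
  { dataSourceName: "beta", version: "1.0", payload: "d" },
];

const pairs = distinctPairs(sampleDocs);
console.log(pairs.length); // 3
```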
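Even though MongoDB uses ~10GB of RAM on the server, the fields are indexed, and nothing else is running, the aggregation takes about 40 seconds.
I tried creating a new index that contains both fields in order, but the query still does not seem to use the index: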
{
"stages" : [
{
"$cursor" : {
"query" : {
},
"fields" : {
"dataSourceName" : NumberInt(1),
"version" : NumberInt(1),
"_id" : NumberInt(0)
},
"queryPlanner" : {
"plannerVersion" : NumberInt(1),
"namespace" : "db.the-collection",
"indexFilterSet" : false,
"parsedQuery" : {
},
"winningPlan" : {
"stage" : "COLLSCAN",
"direction" : "forward"
},
"rejectedPlans" : [
]
}
}
},
{
"$group" : {
"_id" : {
"dataSourceName" : "$dataSourceName",
"version" : "$version"
}
}
}
],
"ok" : 1.0
}
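Reading a long explain document by eye is error-prone, so here is a minimal sketch of a helper that lists the stages of the winning plan. It assumes the explain result has the shape shown above (a first `$cursor` stage containing `queryPlanner.winningPlan`); other server versions may structure this differently. The names `collectStages` and `winningStages` are hypothetical, and the sample object is a trimmed copy of the output above.

```javascript
// Hypothetical helper: walks a plan tree and collects stage names, outermost
// first. Child plans hang off `inputStage` (one child) or `inputStages` (several).
function collectStages(plan) {
  if (!plan || typeof plan !== "object") return [];
  const children = [];
  if (plan.inputStage) children.push(plan.inputStage);
  if (Array.isArray(plan.inputStages)) children.push(...plan.inputStages);
  return [plan.stage].concat(...children.map(collectStages));
}

// Hypothetical helper: extracts the winning plan from an aggregate explain
// result whose first stage is a $cursor, as in the output above.
function winningStages(explainResult) {
  const first = explainResult.stages && explainResult.stages[0];
  const cursor = first && first.$cursor;
  const plan = cursor && cursor.queryPlanner && cursor.queryPlanner.winningPlan;
  return collectStages(plan);
}

// Trimmed copy of the explain output above.
const collScanExplain = {
  stages: [
    { $cursor: { queryPlanner: { winningPlan: { stage: "COLLSCAN", direction: "forward" } } } },
    { $group: { _id: { dataSourceName: "$dataSourceName", version: "$version" } } },
  ],
  ok: 1,
};

const stages = winningStages(collScanExplain);
console.log(stages.includes("IXSCAN")); // false: the plan is a full collection scan
```

In the mongo shell, the explain document could be obtained by passing `{ explain: true }` as the options argument of `aggregate` and then handed to `winningStages`.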
I am using MongoDB 3.6.5 64-bit on Windows, so it should use the indexes: https://docs.mongodb.com/master/core/aggregation-pipeline/#pipeline-operators-and-indexes
As @Alex-Blex suggested, I tried it with a sort, but I got an out-of-memory error:
The following error occurred while attempting to execute the aggregate query
Mongo Server error (MongoCommandException): Command failed with error 16819: 'Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.' on server server-address:port.
The full response is:
{
"ok" : 0.0,
"errmsg" : "Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.",
"code" : NumberInt(16819),
"codeName" : "Location16819"
}
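My mistake, I had tried it on the wrong collection. After adding the same sort as the index, the query now uses the index. It is still not fast, though: it took about 10 seconds to return the results.
The new explain output: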
{
"stages" : [
{
"$cursor" : {
"query" : {
},
"sort" : {
"dataSourceName" : NumberInt(1),
"version" : NumberInt(1)
},
"fields" : {
"dataSourceName" : NumberInt(1),
"version" : NumberInt(1),
"_id" : NumberInt(0)
},
"queryPlanner" : {
"plannerVersion" : NumberInt(1),
"namespace" : "....",
"indexFilterSet" : false,
"parsedQuery" : {
},
"winningPlan" : {
"stage" : "PROJECTION",
"transformBy" : {
"dataSourceName" : NumberInt(1),
"version" : NumberInt(1),
"_id" : NumberInt(0)
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"dataSourceName" : NumberInt(1),
"version" : NumberInt(1)
},
"indexName" : "dataSourceName_1_version_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"dataSourceName" : [
],
"version" : [
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : NumberInt(2),
"direction" : "forward",
"indexBounds" : {
"dataSourceName" : [
"[MinKey, MaxKey]"
],
"version" : [
"[MinKey, MaxKey]"
]
}
}
},
"rejectedPlans" : [
]
}
}
},
{
"$group" : {
"_id" : {
"dataSourceName" : "$dataSourceName",
"version" : "$version"
}
}
}
],
"ok" : 1.0
}
Answer 0 (score: 2)
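The page you are referring to says exactly the opposite:
"The $match and $sort pipeline operators can take advantage of an index"
Your first stage is $group, which is neither $match nor $sort.
Try sorting in the first stage to trigger use of the index: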
db.getCollection("the-collection").aggregate(
    [
        { $sort: { dataSourceName:1, version:1 } },
        {
            "$group" : {
                "_id" : {
                    "dataSourceName" : "$dataSourceName",
                    "version" : "$version"
                }
            }
        }
    ],
    {
        "allowDiskUse" : false
    }
);
Please note that it should be a single compound index with the same fields and the same sort order:
db.getCollection("the-collection").createIndex({ dataSourceName:1, version:1 })
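Since the sort specification has to match the index key order, one way to keep the two in step is to derive the pipeline from a single ordered field list. The sketch below is only an illustration of that idea; `buildDistinctPipeline` is a hypothetical helper, not part of any MongoDB API, and it assumes ascending order on every field.

```javascript
// Hypothetical helper: builds the $sort + $group pipeline from one ordered
// field list, so the sort spec follows the same key order as the compound index.
function buildDistinctPipeline(fields) {
  const sortSpec = {};
  const groupId = {};
  for (const field of fields) {
    sortSpec[field] = 1;          // ascending, matching the index definition
    groupId[field] = "$" + field; // field path expression used as the group key
  }
  return [{ $sort: sortSpec }, { $group: { _id: groupId } }];
}

const pipeline = buildDistinctPipeline(["dataSourceName", "version"]);
console.log(JSON.stringify(pipeline, null, 2));
```

The resulting `pipeline` array is the same one written out above, so it can be passed as the first argument to `db.getCollection("the-collection").aggregate(...)`.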