MongoDB aggregation is slow on indexed fields

Time: 2018-06-15 11:38:11

Tags: mongodb aggregate-functions

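I have a collection with ~2.5 million documents; the collection size is 14.1 GB, the storage size is 4.2 GB, and the average object size is 5.8 KB. I created two separate indexes on two of the fields, dataSourceName and version (both text fields), and tried to write an aggregate query that lists their "grouped by" values. (I'm trying to achieve this: select dsn, v from collection group by dsn, v.)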

db.getCollection("the-collection").aggregate(
    [
        { 
            "$group" : {
                "_id" : {
                    "dataSourceName" : "$dataSourceName", 
                    "version" : "$version"
                }
            }
        }
    ], 
    { 
        "allowDiskUse" : false
    }
);

Even though MongoDB uses ~10 GB of RAM on the server, the fields are indexed, and nothing else is running, the aggregation takes ~40 seconds.

I tried creating a new index that contains both fields, in order, but the query still doesn't seem to use an index:

{ 
    "stages" : [
        {
            "$cursor" : {
                "query" : {

                }, 
                "fields" : {
                    "dataSourceName" : NumberInt(1), 
                    "version" : NumberInt(1), 
                    "_id" : NumberInt(0)
                }, 
                "queryPlanner" : {
                    "plannerVersion" : NumberInt(1), 
                    "namespace" : "db.the-collection", 
                    "indexFilterSet" : false, 
                    "parsedQuery" : {

                    }, 
                    "winningPlan" : {
                        "stage" : "COLLSCAN", 
                        "direction" : "forward"
                    }, 
                    "rejectedPlans" : [

                    ]
                }
            }
        }, 
        {
            "$group" : {
                "_id" : {
                    "dataSourceName" : "$dataSourceName", 
                    "version" : "$version"
                }
            }
        }
    ], 
    "ok" : 1.0
}

I'm using MongoDB 3.6.5 64-bit on Windows, so it should use indexes: https://docs.mongodb.com/master/core/aggregation-pipeline/#pipeline-operators-and-indexes

<strike>As @Alex-Blex suggested, I tried it with a sort, but I got an OOM error:

The following error occurred while attempting to execute the aggregate query

Mongo Server error (MongoCommandException): Command failed with error 16819: 'Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.' on server server-address:port. 

The full response is:
{ 
    "ok" : 0.0, 
    "errmsg" : "Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.", 
    "code" : NumberInt(16819), 
    "codeName" : "Location16819"
}
</strike>

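My bad, I had tried it on the wrong collection... After adding a sort that matches the index, the query now uses the index. It still doesn't feel fast, though: it took ~10 seconds to return the results.

The new explain output: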

{ 
    "stages" : [
        {
            "$cursor" : {
                "query" : {

                }, 
                "sort" : {
                    "dataSourceName" : NumberInt(1), 
                    "version" : NumberInt(1)
                }, 
                "fields" : {
                    "dataSourceName" : NumberInt(1), 
                    "version" : NumberInt(1), 
                    "_id" : NumberInt(0)
                }, 
                "queryPlanner" : {
                    "plannerVersion" : NumberInt(1), 
                    "namespace" : "....", 
                    "indexFilterSet" : false, 
                    "parsedQuery" : {

                    }, 
                    "winningPlan" : {
                        "stage" : "PROJECTION", 
                        "transformBy" : {
                            "dataSourceName" : NumberInt(1), 
                            "version" : NumberInt(1), 
                            "_id" : NumberInt(0)
                        }, 
                        "inputStage" : {
                            "stage" : "IXSCAN", 
                            "keyPattern" : {
                                "dataSourceName" : NumberInt(1), 
                                "version" : NumberInt(1)
                            }, 
                            "indexName" : "dataSourceName_1_version_1", 
                            "isMultiKey" : false, 
                            "multiKeyPaths" : {
                                "dataSourceName" : [

                                ], 
                                "version" : [

                                ]
                            }, 
                            "isUnique" : false, 
                            "isSparse" : false, 
                            "isPartial" : false, 
                            "indexVersion" : NumberInt(2), 
                            "direction" : "forward", 
                            "indexBounds" : {
                                "dataSourceName" : [
                                    "[MinKey, MaxKey]"
                                ], 
                                "version" : [
                                    "[MinKey, MaxKey]"
                                ]
                            }
                        }
                    }, 
                    "rejectedPlans" : [

                    ]
                }
            }
        }, 
        {
            "$group" : {
                "_id" : {
                    "dataSourceName" : "$dataSourceName", 
                    "version" : "$version"
                }
            }
        }
    ], 
    "ok" : 1.0
}
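
One way to compare the two explain outputs above is to walk winningPlan from the root down through inputStage and list the stage names. The following is a minimal sketch in plain JavaScript, assuming the 3.6-style layout shown in this question (a stage with $cursor.queryPlanner.winningPlan, and a single inputStage per level). The function winningPlanStages and the trimmed beforeSort / afterSort objects are hypothetical names introduced here for illustration, not part of MongoDB:

```javascript
// Hypothetical helper: list the winning plan's stage names, root first.
// Assumes each plan node has at most one child under "inputStage"
// (plans with an "inputStages" array are not handled in this sketch).
function winningPlanStages(explainOutput) {
    const cursorStage = explainOutput.stages.find((s) => s.$cursor !== undefined);
    if (cursorStage === undefined) {
        return [];
    }
    const names = [];
    let plan = cursorStage.$cursor.queryPlanner.winningPlan;
    while (plan !== undefined) {
        names.push(plan.stage);
        plan = plan.inputStage; // undefined once the leaf is reached
    }
    return names;
}

// Trimmed copies of the two explain outputs above (NumberInt(1) written as 1).
const beforeSort = {
    stages: [
        { $cursor: { queryPlanner: { winningPlan: { stage: "COLLSCAN", direction: "forward" } } } },
        { $group: { _id: { dataSourceName: "$dataSourceName", version: "$version" } } },
    ],
};

const afterSort = {
    stages: [
        {
            $cursor: {
                queryPlanner: {
                    winningPlan: {
                        stage: "PROJECTION",
                        inputStage: {
                            stage: "IXSCAN",
                            keyPattern: { dataSourceName: 1, version: 1 },
                            indexName: "dataSourceName_1_version_1",
                        },
                    },
                },
            },
        },
        { $group: { _id: { dataSourceName: "$dataSourceName", version: "$version" } } },
    ],
};

console.log(winningPlanStages(beforeSort).join(" -> ")); // COLLSCAN
console.log(winningPlanStages(afterSort).join(" -> "));  // PROJECTION -> IXSCAN
```

The second plan has no FETCH stage between PROJECTION and IXSCAN, which suggests the two fields are being read from the index alone rather than from the ~5.8 KB documents.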

1 Answer:

Answer 0: (score: 2)

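The page you are referring to says exactly the opposite:

"The $match and $sort pipeline operators can take advantage of an index"

Your first stage is $group, which is neither $match nor $sort.

Try sorting in the first stage to trigger use of the index: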

db.getCollection("the-collection").aggregate(
    [
        { $sort: { dataSourceName:1, version:1 } },
        { 
            "$group" : {
                "_id" : {
                    "dataSourceName" : "$dataSourceName", 
                    "version" : "$version"
                }
            }
        }
    ], 
    { 
        "allowDiskUse" : false
    }
);

Note that it should be a single compound index with the same fields and the same sort order:

db.getCollection("the-collection").createIndex({ dataSourceName:1, version:1 })
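
To keep the $sort specification and the index key pattern from drifting apart, both can be derived from one object. This is only a sketch in plain JavaScript, assuming top-level field names (no dotted paths). The helper sortThenGroupPipeline is a hypothetical name introduced here, not a MongoDB API:

```javascript
// Hypothetical helper: build a $sort + $group pipeline from an index key pattern,
// so the sort uses exactly the same fields, order, and direction as the index.
// Assumes top-level field names only (no dotted paths).
function sortThenGroupPipeline(indexKeyPattern) {
    const groupId = {};
    for (const field of Object.keys(indexKeyPattern)) {
        groupId[field] = "$" + field; // e.g. dataSourceName -> "$dataSourceName"
    }
    return [
        { $sort: Object.assign({}, indexKeyPattern) },
        { $group: { _id: groupId } },
    ];
}

const indexKeyPattern = { dataSourceName: 1, version: 1 };
const pipeline = sortThenGroupPipeline(indexKeyPattern);

console.log(JSON.stringify(pipeline, null, 2));

// In the mongo shell, the same two objects would be passed as:
//   db.getCollection("the-collection").createIndex(indexKeyPattern)
//   db.getCollection("the-collection").aggregate(pipeline, { allowDiskUse: false })
```

Building both from indexKeyPattern is a design convenience only; writing the $sort and createIndex arguments by hand, as above, works just as well as long as they match.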