我有一个包含3个配置dbs和3个分片的群集。我正在查询具有106M记录的数据库,每个记录包含410个字段。我使用分片键:
导入了这些数据{state: 1, zipCode: 1}.
当我单独运行以下查询时,每个查询都会在不到5秒的时间内完成。 (SC = 1.6M记录,NC = 5.2M记录)
db.records.find( { "state" : "NC" } ).count()
db.records.find( { "state" : "SC" } ).count()
db.records.find( { "state" : { $in : ["NC"] } } ).count()
db.records.find( { "state" : { $in : ["SC"] } } ).count()
然而,当我使用$ in或$查询两个状态时,查询需要一个多小时才能完成。
db.records.find( "state" : { $in : [ "NC" , "SC" ] } ).count()
db.records.find({ $or : [ { "state" : "NC" }, { "state" : "SC" } ).count()
两个状态的全部存在于1个碎片上。以下是使用$ in查询的.explain()的结果:
db.records.find({"state":{$in:["NC","SC"]}}).explain()
{
"queryPlanner" : {
"mongosPlannerVersion" : 1,
"winningPlan" : {
"stage" : "SINGLE_SHARD",
"shards" : [
{
"shardName" : "s2",
"connectionString" : "s2/192.168.2.17:27000,192.168.2.17:27001",
"serverInfo" : {
"host" : "MonDbShard2",
"port" : 27000,
"version" : "3.4.7",
"gitVersion" : "cf38c1b8a0a8dca4a11737581beafef4fe120bcd"
},
"plannerVersion" : 1,
"namespace" : "DBNAME.records",
"indexFilterSet" : false,
"parsedQuery" : {
"state" : {
"$in" : [
"NC",
"SC"
]
}
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "SHARDING_FILTER",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"state" : 1,
"zipCode" : 1
},
"indexName" : "state_1_zipCode_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"state" : [ ],
"zipCode" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"state" : [
"[\"NC\", \"NC\"]",
"[\"SC\", \"SC\"]"
],
"zipCode" : [
"[MinKey, MaxKey]"
]
}
}
}
},
"rejectedPlans" : [ ]
}
]
}
},
"ok" : 1
}
为什么一次查询两个州导致完成时间如此剧烈的差异?此外,查询单个zipCode而不包括相应的状态,导致完成时间的相同显着差异。我觉得我误解了碎片键实际上是如何运作的。有什么想法吗?