I've run into some strange behavior that I can't make sense of. Given a collection of documents with the following 'schema':
{
tag : ["t:someTag", "A", "B", "C"],
msg : "some message",
timestamp : ISODate(...),
someIntField: 1
}
The tags are stored in an array whose first element is "t:something", followed by any number of string tags. Collection stats:
db.perf_multikey.stats()
{
"ns" : "test.perf_multikey",
"count" : 36239306,
"size" : 22124848112,
"avgObjSize" : 610,
"storageSize" : 24330923904,
"numExtents" : 32,
"nindexes" : 4,
"lastExtentSize" : 2146426864,
"paddingFactor" : 1,
"systemFlags" : 1,
"userFlags" : 1,
"totalIndexSize" : 17494579648,
"indexSizes" : {
"_id_" : 1177303120,
"tag_1" : 12851094032,
"timestamp_1" : 1800706768,
"level_1" : 1665475728
},
"ok" : 1
}
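As a quick sanity check on the stats above: the four entries in indexSizes add up exactly to totalIndexSize, and tag_1 alone takes roughly three quarters of the index space. The sketch below only re-does that arithmetic on the numbers copied from the output; the GiB conversion and variable names are illustrative.

```javascript
// Figures copied from the db.perf_multikey.stats() output above (bytes).
const stats = {
  count: 36239306,
  size: 22124848112,
  totalIndexSize: 17494579648,
  indexSizes: {
    _id_: 1177303120,
    tag_1: 12851094032,
    timestamp_1: 1800706768,
    level_1: 1665475728,
  },
};

const GiB = 1024 ** 3;

// Sum of the per-index sizes; should equal totalIndexSize.
const indexSum = Object.values(stats.indexSizes).reduce((a, b) => a + b, 0);

// Convert bytes to GiB, rounded to two decimals.
const toGiB = (bytes) => Math.round((bytes / GiB) * 100) / 100;

console.log("sum matches totalIndexSize:", indexSum === stats.totalIndexSize);
console.log("tag_1 (GiB):", toGiB(stats.indexSizes.tag_1));
console.log("all indexes (GiB):", toGiB(stats.totalIndexSize));
console.log("tag_1 share:", (stats.indexSizes.tag_1 / stats.totalIndexSize).toFixed(2));
```

With these numbers the tag_1 index comes out at about 11.97 GiB of a 16.29 GiB total.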
I'm running the following query:
db.perf_multikey.find({tag: {$all:["t:a", "J"]}})
As expected, it hits the index and returns a few documents:
db.perf_multikey.find({tag: {$all:["t:a", "J"]}}).explain()
{
"cursor" : "BtreeCursor tag_1",
"isMultiKey" : true,
"n" : 6,
"nscannedObjects" : 10,
"nscanned" : 10,
"nscannedObjectsAllPlans" : 10,
"nscannedAllPlans" : 22,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 1,
"nChunkSkips" : 0,
"millis" : 7,
"indexBounds" : {
"tag" : [
[
"t:a",
"t:a"
]
]
},
"server" : "somefancyserver:27017",
"filterSet" : false
}
But a query that differs only in the order of the elements in the $all array,
db.perf_multikey.find({tag: {$all:["J","t:a"]}})
does not seem to use the index:
db.perf_multikey.find({tag: {$all:["J","t:a"]}}).explain()
{
"cursor" : "Complex Plan",
"n" : 6,
"nscannedObjects" : 0,
"nscanned" : 7866684,
"nscannedObjectsAllPlans" : 7827833,
"nscannedAllPlans" : 15694517,
"nYields" : 139716,
"nChunkSkips" : 0,
"millis" : 118102,
"server" : "samefancyserver:27017",
"filterSet" : false
}
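I'm using MongoDB 2.6.9. Seeing the results above, I'm confused about how MongoDB multikey indexes work. Why does a query on an array depend on the order of the values?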
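For reference, the matching semantics of $all do not depend on order: a document matches when its array contains every listed value, and both explain outputs above report the same n of 6. The sketch below is a simplified model of that rule (not MongoDB's matcher; `matchesAll`, `docs` and `ids` are hypothetical names, and non-array fields are not handled), showing that swapping the two values leaves the result unchanged, so only the chosen plan differs.

```javascript
// Simplified model of $all on an array field: the document matches when its
// array contains every listed value. The order of `values` is irrelevant.
function matchesAll(doc, field, values) {
  const arr = doc[field];
  if (!Array.isArray(arr)) return false; // non-array fields are not handled in this sketch
  return values.every((v) => arr.includes(v));
}

// Hypothetical documents following the schema above.
const docs = [
  { _id: 1, tag: ["t:a", "A", "J"] },
  { _id: 2, tag: ["t:a", "B", "C"] },
  { _id: 3, tag: ["t:b", "J"] },
];

const ids = (values) => docs.filter((d) => matchesAll(d, "tag", values)).map((d) => d._id);

console.log(ids(["t:a", "J"])); // [ 1 ]
console.log(ids(["J", "t:a"])); // [ 1 ]
```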
EDIT
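After upgrading to MongoDB 3.0.2, I regenerated the dataset (large enough that the index does not fit in RAM) and reran the tests. Unfortunately, I still get the same results. Note that the tag field follows a certain 'pattern': the first element of the array is an arbitrary string, followed by some permutation of tags drawn from a finite set of values, say "A" through "J".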
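For anyone trying to reproduce this, here is a hypothetical generator for documents with the pattern just described. It assumes the tags after the first element are a random, non-empty selection of "A".."J" in random order (1 to 10 of them); `makeDoc`, `shuffle` and `TAG_UNIVERSE` are illustrative names, not from the original setup. In the mongo shell the result of `makeDoc` could be passed to `db.perf_multikey.insert(...)`, and `new Date()` is stored as an ISODate.

```javascript
// Hypothetical generator for documents shaped like the ones described above.
// tag[0] is the caller-supplied leading string; it is followed by a random,
// non-empty selection of the tags "A".."J" in random order.
const TAG_UNIVERSE = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"];

// Fisher-Yates shuffle; shuffles the array in place and returns it.
function shuffle(arr) {
  for (let i = arr.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [arr[i], arr[j]] = [arr[j], arr[i]];
  }
  return arr;
}

function makeDoc(leadingTag) {
  const k = 1 + Math.floor(Math.random() * TAG_UNIVERSE.length); // 1..10 tags
  const tags = shuffle([...TAG_UNIVERSE]).slice(0, k);
  return {
    tag: [leadingTag, ...tags],
    msg: "some message", // placeholder, as in the schema above
    timestamp: new Date(),
    someIntField: 1,
  };
}

const sample = makeDoc("t:a");
console.log(sample.tag); // random: "t:a" followed by a shuffled subset of "A".."J"
```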
These are my results:
Lightning fast:
> db.perf_multikey.find({tag : {$all : ["a", "J"]}}).explain()
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.perf_multikey",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"tag" : {
"$eq" : "a"
}
},
{
"tag" : {
"$eq" : "J"
}
}
]
},
"winningPlan" : {
"stage" : "KEEP_MUTATIONS",
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"tag" : {
"$eq" : "J"
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"tag" : 1
},
"indexName" : "tag_1",
"isMultiKey" : true,
"direction" : "forward",
"indexBounds" : {
"tag" : [
"[\"a\", \"a\"]"
]
}
}
}
},
"rejectedPlans" : [
{
"stage" : "FETCH",
"inputStage" : {
"stage" : "KEEP_MUTATIONS",
"inputStage" : {
"stage" : "AND_SORTED",
"inputStages" : [
{
"stage" : "IXSCAN",
"keyPattern" : {
"tag" : 1
},
"indexName" : "tag_1",
"isMultiKey" : true,
"direction" : "forward",
"indexBounds" : {
"tag" : [
"[\"a\", \"a\"]"
]
}
},
{
"stage" : "IXSCAN",
"keyPattern" : {
"tag" : 1
},
"indexName" : "tag_1",
"isMultiKey" : true,
"direction" : "forward",
"indexBounds" : {
"tag" : [
"[\"J\", \"J\"]"
]
}
}
]
}
}
}
]
},
"serverInfo" : {
"host" : "fancyhost",
"port" : 27017,
"version" : "3.0.2",
"gitVersion" : "6201872043ecbbc0a4cc169b5482dcf385fc464f"
},
"ok" : 1
}
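In the output above, parsedQuery shows the $all list rewritten as an $and of $eq predicates in the order they were written, and the winning plan puts index bounds only on the first one ("a"), checking "J" as a FETCH filter. The sketch below just mirrors that rewrite so the shape is easy to see; `allToAnd` is a hypothetical helper, not MongoDB code.

```javascript
// Mirrors the "parsedQuery" in the 3.0.2 explain output: each $all value
// becomes an { field: { $eq: value } } clause inside $and, order preserved.
function allToAnd(field, values) {
  return { $and: values.map((v) => ({ [field]: { $eq: v } })) };
}

console.log(JSON.stringify(allToAnd("tag", ["a", "J"])));
// {"$and":[{"tag":{"$eq":"a"}},{"tag":{"$eq":"J"}}]}
```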
Slower:
> db.perf_multikey.find({tag : {$all : ["J", "a"]}}).explain()
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.perf_multikey",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"tag" : {
"$eq" : "J"
}
},
{
"tag" : {
"$eq" : "a"
}
}
]
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "KEEP_MUTATIONS",
"inputStage" : {
"stage" : "AND_SORTED",
"inputStages" : [
{
"stage" : "IXSCAN",
"keyPattern" : {
"tag" : 1
},
"indexName" : "tag_1",
"isMultiKey" : true,
"direction" : "forward",
"indexBounds" : {
"tag" : [
"[\"J\", \"J\"]"
]
}
},
{
"stage" : "IXSCAN",
"keyPattern" : {
"tag" : 1
},
"indexName" : "tag_1",
"isMultiKey" : true,
"direction" : "forward",
"indexBounds" : {
"tag" : [
"[\"a\", \"a\"]"
]
}
}
]
}
}
},
"rejectedPlans" : [
{
"stage" : "KEEP_MUTATIONS",
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"tag" : {
"$eq" : "a"
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"tag" : 1
},
"indexName" : "tag_1",
"isMultiKey" : true,
"direction" : "forward",
"indexBounds" : {
"tag" : [
"[\"J\", \"J\"]"
]
}
}
}
}
]
},
"serverInfo" : {
"host" : "fancyhost",
"port" : 27017,
"version" : "3.0.2",
"gitVersion" : "6201872043ecbbc0a4cc169b5482dcf385fc464f"
},
"ok" : 1
}
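Here the winning plan is AND_SORTED over two IXSCANs of tag_1 (bounds ["J", "J"] and ["a", "a"]). The 2.6 explain for the same query reports "Complex Plan" with nscannedObjects 0 and millions of nscanned, which looks like an index-intersection plan as well. The sketch below is a simplified model of a sorted intersection, assuming each point-bound scan yields record ids in ascending order; it is not MongoDB's implementation, and the id lists are hypothetical.

```javascript
// Simplified model (not MongoDB source) of an AND_SORTED-style intersection.
// Each input is the list of record ids from one IXSCAN over a point bound
// such as ["J", "J"], in ascending order. The merge advances whichever side
// is behind and stops as soon as either side runs out.
function andSorted(left, right) {
  const ids = [];
  let i = 0;
  let j = 0;
  while (i < left.length && j < right.length) {
    if (left[i] === right[j]) {
      ids.push(left[i]); // present in both scans: a candidate to FETCH
      i++;
      j++;
    } else if (left[i] < right[j]) {
      i++;
    } else {
      j++;
    }
  }
  // i + j = number of index entries stepped past in the two scans.
  return { ids, advanced: i + j };
}

// Hypothetical id lists: a common tag such as "J" and a rare one such as "a".
const common = Array.from({ length: 1000 }, (_, k) => k + 1);
const rare = [500, 999];

console.log(andSorted(common, rare)); // { ids: [ 500, 999 ], advanced: 1001 }
console.log(andSorted(common, []));   // { ids: [], advanced: 0 }
```

In this model the cost depends on how far the merge must walk through the larger list before the smaller one is exhausted, while an empty list (a value with no index entries) ends the merge at once. Whether MongoDB's stage behaves exactly like this is an assumption of the sketch.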
I thought http://docs.mongodb.org/manual/reference/operator/query/all/#performance might be the answer.
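After all, querying with ["random string", "A"] uses "random string" to narrow the potential result set down to something very small, which is then cheap to scan (or traverse further?). On the other hand, a query with ["A", "random string"] should be slow, because "A" matches a large number of documents that have to be scanned further... but the query ["A", "random nonexistent string"] is lightning fast... which confuses me.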
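The reasoning above matches the single-index plans in the explain output: IXSCAN on a point bound for the first value, then FETCH and filter on the rest. The sketch below is a toy model of that plan shape under stated assumptions: a multikey index holds one (value, _id) entry per distinct array element, and only keys inside the first value's bound are counted. All names and the 100-document dataset are hypothetical.

```javascript
// Toy multikey index: one [value, _id] entry per distinct array element,
// sorted by value, then by _id.
function buildMultikeyIndex(docs, field) {
  const entries = [];
  for (const d of docs) {
    for (const v of new Set(d[field])) entries.push([v, d._id]);
  }
  entries.sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : a[1] - b[1]));
  return entries;
}

// Single-index plan: keys in the bound [first, first], FETCH each document,
// then filter on the remaining values.
function firstValuePlan(docs, index, field, values) {
  const byId = new Map(docs.map((d) => [d._id, d]));
  const [first, ...rest] = values;
  let keysExamined = 0;
  const ids = [];
  for (const [v, id] of index) {
    if (v !== first) continue; // a real B-tree would seek straight to the bound
    keysExamined++;
    const doc = byId.get(id); // FETCH
    if (rest.every((r) => doc[field].includes(r))) ids.push(id);
  }
  return { ids, keysExamined };
}

// Usage: 100 documents, all tagged "A", each with its own "t:<n>" tag.
const docs = Array.from({ length: 100 }, (_, k) => ({ _id: k + 1, tag: [`t:${k + 1}`, "A"] }));
const index = buildMultikeyIndex(docs, "tag");
console.log(firstValuePlan(docs, index, "tag", ["t:42", "A"])); // { ids: [ 42 ], keysExamined: 1 }
console.log(firstValuePlan(docs, index, "tag", ["A", "t:42"])); // { ids: [ 42 ], keysExamined: 100 }
```

Both orders return the same document; only the number of keys scanned differs, which is the selectivity argument made above.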
Answer (score: 0)
I strongly recommend upgrading. I tested this on 2.6.1 and 3.0.0 and did not get this behavior.
For example, here is 2.6.1:
> db.t.find({tags:{$all:['t', 'tags']}}).explain()
{
"cursor" : "BtreeCursor tags_1",
"isMultiKey" : true,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 1,
"nscannedAllPlans" : 3,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 7,
"indexBounds" : {
"tags" : [
[
"t",
"t"
]
]
},
"server" : "ubuntu:27017",
"filterSet" : false
}
> db.t.find({tags:{$all:['t', 'tags']}})
{ "_id" : ObjectId("5549f186450548aed9ad4273"), "tags" : [ "tags", "t" ] }
Even when a nonexistent value comes first:
> db.t.find({tags:{$all:['f', 'tags']}}).explain()
{
"cursor" : "BtreeCursor tags_1",
"isMultiKey" : true,
"n" : 0,
"nscannedObjects" : 0,
"nscanned" : 0,
"nscannedObjectsAllPlans" : 0,
"nscannedAllPlans" : 0,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"tags" : [
[
"f",
"f"
]
]
},
"server" : "ubuntu:27017",
"filterSet" : false
}
And on 3.0.0:
> db.t.find({tags:{$all:['g','t']}}).explain()
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.t",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"tags" : {
"$eq" : "g"
}
},
{
"tags" : {
"$eq" : "t"
}
}
]
},
"winningPlan" : {
"stage" : "KEEP_MUTATIONS",
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"tags" : {
"$eq" : "t"
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"tags" : 1
},
"indexName" : "tags_1",
"isMultiKey" : true,
"direction" : "forward",
"indexBounds" : {
"tags" : [
"[\"g\", \"g\"]"
]
}
}
}
},
"rejectedPlans" : [
{
"stage" : "FETCH",
"inputStage" : {
"stage" : "KEEP_MUTATIONS",
"inputStage" : {
"stage" : "AND_SORTED",
"inputStages" : [
{
"stage" : "IXSCAN",
"keyPattern" : {
"tags" : 1
},
"indexName" : "tags_1",
"isMultiKey" : true,
"direction" : "forward",
"indexBounds" : {
"tags" : [
"[\"g\", \"g\"]"
]
}
},
{
"stage" : "IXSCAN",
"keyPattern" : {
"tags" : 1
},
"indexName" : "tags_1",
"isMultiKey" : true,
"direction" : "forward",
"indexBounds" : {
"tags" : [
"[\"t\", \"t\"]"
]
}
}
]
}
}
}
]
},
"serverInfo" : {
"host" : "ip-172-30-0-35",
"port" : 27017,
"version" : "3.0.0",
"gitVersion" : "a841fd6394365954886924a35076691b4d149168"
},
"ok" : 1
}
Admittedly, I didn't test on 2.6.9, but I have tested both older and newer versions and could not reproduce this behavior.