Question

I have a MongoDB collection that holds about 100 million documents.

The documents basically look like this:

_id             : ObjectId("asd1234567890")
_reference_1_id : ObjectId("fgh4567890123")
_reference_2_id : ObjectId("jkl7890123456")
name            : "Test1"
id              : "4815162342"
created_time    : Date( 1331882436000 )
_contexts       : ["context1", "context2"]
...

A few indexes are set up; here is the output of db.mycoll.getIndexes():

[
{
    "v" : 1,
    "key" : {
        "_id" : 1
    },
    "ns" : "mydb.mycoll",
    "name" : "_id_"
},
{
    "v" : 1,
    "key" : {
        "_reference_1_id" : 1,
        "_reference_2_id" : 1,
        "id" : 1
    },
    "unique" : true,
    "ns" : "mydb.mycoll",
    "name" : "_reference_1_id_1__reference_2_id_1_id_1"
},
{
    "v" : 1,
    "key" : {
        "_reference_1_id" : 1,
        "_reference_2_id" : 1,
        "_contexts" : 1,
        "created_time" : 1
    },
    "ns" : "mydb.mycoll",
    "name" : "_reference_1_id_1__reference_2_id_1__contexts_1_created_time_1"
}
]

When I run a query like

db.mycoll.find({"_reference_2_id" : ObjectId("jkl7890123456")})

it takes more than an hour (!) to finish, whether or not it returns any results. Any ideas?

Update: the output of

db.mycoll.find({"_reference_2_id" : ObjectId("jkl7890123456")}).explain();

looks like this:

{
"cursor" : "BasicCursor",
"nscanned" : 99209163,
"nscannedObjects" : 99209163,
"n" : 5007,
"millis" : 5705175,
"nYields" : 17389,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {

}
}

Answer 1

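You don't have any index that mongo will use automatically for this query, so it is doing a full table scan.

As mentioned in the docs:

If the first key [of the index] is not present in the query, the index will only be used if hinted explicitly.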
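
As a rough sketch of the rule quoted above, the check below walks the key patterns from the getIndexes() output in the question and reports which indexes start with a field named in the query. It is plain JavaScript rather than shell code; autoUsableIndexes is a hypothetical helper, not a MongoDB API, and the ObjectId values are replaced by placeholder strings.

```javascript
// Key patterns copied from the getIndexes() output in the question.
const indexes = [
  { name: "_id_", key: { _id: 1 } },
  {
    name: "_reference_1_id_1__reference_2_id_1_id_1",
    key: { _reference_1_id: 1, _reference_2_id: 1, id: 1 },
  },
  {
    name: "_reference_1_id_1__reference_2_id_1__contexts_1_created_time_1",
    key: { _reference_1_id: 1, _reference_2_id: 1, _contexts: 1, created_time: 1 },
  },
];

// Hypothetical helper (not a MongoDB API): returns the names of the indexes
// whose FIRST key appears in the query, following the rule quoted above.
function autoUsableIndexes(indexList, query) {
  return indexList
    .filter((ix) => Object.keys(ix.key)[0] in query)
    .map((ix) => ix.name);
}

// The slow query filters on _reference_2_id only (placeholder value).
console.log(autoUsableIndexes(indexes, { _reference_2_id: "jkl7890123456" }));
// -> [] (no index starts with _reference_2_id)

// Adding _reference_1_id makes both compound indexes candidates.
console.log(
  autoUsableIndexes(indexes, {
    _reference_1_id: "fgh4567890123",
    _reference_2_id: "jkl7890123456",
  })
);
```

The second call lists both compound indexes, which matches the first option suggested further down in this answer.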

Why

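If you have an index on a,b and you search by a alone, the index will be used automatically. This is because a is the start of the index (which is fast to look up), and the db can simply ignore the rest of the index value.

An index on a,b is inefficient when searching by b alone, because the lookup can no longer be treated as a "starts with thisfixedstring" search on the index.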
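
To illustrate, here is a toy model in plain JavaScript, assuming the index behaves like a list of (a, b) pairs sorted by a and then by b. The data and the helpers findByA and findByB are made up for illustration; this is not how MongoDB stores its B-tree internally.

```javascript
// Toy compound index on (a, b): entries are sorted by a first, then by b.
const entries = [
  ["apple", 3], ["apple", 7],
  ["banana", 1], ["banana", 7],
  ["cherry", 2], ["cherry", 7],
];

// Search by a: binary-search to the first entry with that a,
// then read forward while a still matches. Only a small slice is touched.
function findByA(sorted, a) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sorted[mid][0] < a) lo = mid + 1;
    else hi = mid;
  }
  const hits = [];
  for (let i = lo; i < sorted.length && sorted[i][0] === a; i++) {
    hits.push(sorted[i]);
  }
  return hits;
}

// Search by b alone: matching entries are scattered across the whole list,
// so there is no starting point to jump to and every entry must be checked.
function findByB(sorted, b) {
  return sorted.filter((entry) => entry[1] === b);
}

console.log(findByA(entries, "banana")); // the two adjacent "banana" entries
console.log(findByB(entries, 7));        // three entries, one under each a
```

The entries with b equal to 7 sit under three different values of a, which is why the sort order gives no help when only b is known.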

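So, either:

include _reference_1_id in the query (probably not relevant to what you are searching for),
or add an index on _reference_2_id (if you query by that field often),
or use a hint.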

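Hint

This is probably your lowest-cost option right now.

Add a query hint to force the use of your _reference_1_id_1__reference_2_id_1_id_1 index. That is likely to be a lot faster than a full table scan, but still a lot slower than an index that starts with the field you are using in the query.

i.e.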

db.mycoll
    .find({"_reference_2_id" : ObjectId("jkl7890123456")})
    .hint("_reference_1_id_1__reference_2_id_1_id_1");

Answer 2

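I would try setting a non-unique index on _reference_2_id. At the moment I suspect you are doing the equivalent of a full table scan: even though the existing indexes contain _reference_2_id, they won't be used (see here).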
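
A minimal sketch of that suggestion, assuming db is the mongo shell's handle for the mydb database. addReference2Index is a hypothetical wrapper, not part of MongoDB. It calls createIndex; on older shells such as the one that produced the "v" : 1 indexes in the question, the equivalent call was ensureIndex.

```javascript
// Hypothetical wrapper; `db` is assumed to be the mongo shell database handle.
function addReference2Index(db) {
  // Ascending single-field index on the field the slow query filters on.
  // It is non-unique because no `unique` option is passed.
  return db.mycoll.createIndex({ _reference_2_id: 1 });
}
```

In the shell this would be run as addReference2Index(db), followed by the same explain() call as in the question to check whether the cursor is still a BasicCursor. Building an index over a collection of this size can take a while.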

Answer 3

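Hi, I had much the same problem with a similar amount of data. The documentation says that queries with an index must fit in RAM. I think that is not the case here; the query must be doing a lot of disk access, first to retrieve the index and then to fetch the values. In your case, reading the collection directly would be faster.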

EV。

Simple MongoDB query very slow although index is set
