How do I index my collection so that a compound multikey index is used?

Asked: 2012-10-08 13:02:08

Tags: mongodb

This is the kind of document I want to query:

{
"_id":ObjectId("5062d30522dfae0e11000000"),
"id_resource" : "147",
"moment_created" : ISODate("2012-03-22T16:29:21Z"),
"moment_updated" : ISODate("2012-03-22T16:29:21Z"),
"users_involved" : [
    {
        "id_user" : "113928869",
        "state" : "answered",
        "id_folder" : "0",
        "is_deleted" : "0"
    },
    {
        "id_user" : "121624627",
        "state" : "new",
        "id_folder" : "0",
        "is_deleted" : "0" }
],
"posts" : [
    {
        "id_author" : "113928869",
        "post" : "hiohhio",
        "moment_created" : ISODate("2012-03-22T16:29:21Z")
    }
    ]
}

This is how I tried to create my index:

db.message.ensureIndex({id_resource:1, users_involved : 1});

This is the query I run against my collection:

db.message.find({id_resource : "143", "users_involved" : {$elemMatch : {id_user : "101226353", state : "answered"}}});

But explain() gives me this output:

{
    "clusteredType" : "ParallelSort",
    "cursor" : "BasicCursor",
    "n" : 11,
    "nChunkSkips" : 0,
    "nYields" : 8624,
    "nscanned" : 1461277,
    "nscannedAllPlans" : 1461277,
    "nscannedObjects" : 1461277,
    "nscannedObjectsAllPlans" : 1461277,
    "millisShardTotal" : 1878,
    "millisShardAvg" : 939,
    "numQueries" : 2,
    "numShards" : 2,
    "millis" : 1646

}

getIndexes returns:

[
    {
            "v" : 1,
            "key" : {
                    "_id" : 1
            },
            "ns" : "messaging.message",
            "name" : "_id_"
    },
    {
            "v" : 1,
            "key" : {
                    "id_resource" : 1,
                    "users_involved" : 1
            },
            "ns" : "messaging.message",
            "name" : "id_resource_1_users_involved_1"
    }
]

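Unfortunately, I don't understand why my query does not use the index id_resource_1_users_involved_1. Can anyone explain why my index is not used, or how I should build my index to support the query I want to run?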

Thanks for your time and help.

Update

My bad, that was a typo on my part. Here is the actual explain output for the query:

{
    "clusteredType" : "ParallelSort",
    "cursor" : "BtreeCursor id_resource_1_users_involved_1",
    "n" : 5,
    "nChunkSkips" : 0,
    "nYields" : 2,
    "nscanned" : 46868,
    "nscannedAllPlans" : 93736,
    "nscannedObjects" : 46868,
    "nscannedObjectsAllPlans" : 93736,
    "millisShardTotal" : 281,
    "millisShardAvg" : 140,
    "numQueries" : 2,
    "numShards" : 2,
    "millis" : 220

}

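So the query IS using my index, but it is still slow and nscanned is still large. Does that mean it is not using the whole index? I will have to check whether nscanned matches the number of messages for resource x.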

With JohnnyHK's compound index it is much faster:

db.message.ensureIndex({id_resource:1, 'users_involved.id_user':1, 'users_involved.state':1});

Explain:

{
    "clusteredType" : "ParallelSort",
    "cursor" : "BtreeCursor id_resource_1_users_involved.id_user_1_users_involved.state_1",
    "n" : 5,
    "nChunkSkips" : 0,
    "nYields" : 0,
    "nscanned" : 7,
    "nscannedAllPlans" : 7,
    "nscannedObjects" : 7,
    "nscannedObjectsAllPlans" : 7,
    "millisShardTotal" : 0,
    "millisShardAvg" : 0,
    "numQueries" : 2,
    "numShards" : 2,
    "millis" : 1
}

So if I want to query the users_involved array, do I have to build a separate index for each query?

Also, @JohnnyHK, matching on the whole array element like this:

db.message.find({id_resource : "197", "users_involved" : {$elemMatch : {id_user : "128825371", state : "answered", id_folder:"0", is_deleted:"0"}}}).hint("id_resource_1_users_involved_1")

did not improve anything. Explain:

{
    "clusteredType" : "ParallelSort",
    "cursor" : "BtreeCursor id_resource_1_users_involved_1",
    "n" : 5,
    "nChunkSkips" : 0,
    "nYields" : 1,
    "nscanned" : 46868,
    "nscannedAllPlans" : 46868,
    "nscannedObjects" : 46868,
    "nscannedObjectsAllPlans" : 46868,
    "millisShardTotal" : 222,
    "millisShardAvg" : 111,
    "numQueries" : 2,
    "numShards" : 2,
    "millis" : 174

}

Or am I still doing something wrong?

*I also removed the per-shard information from the explain output. If it might be important, just say so.

1 Answer:

Answer 0: (score: 1)

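Because the compound index covers the whole users_involved array, each array element is indexed as one complete embedded document. The index can therefore only narrow the search when the query matches a complete embedded-document element of the array. See here.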
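To make the point above concrete, here is a rough sketch in plain JavaScript of the keys each index shape would hold for the sample document. It is a simplified model for illustration only, not MongoDB's actual key-generation code. The helper names wholeElementKeys and subFieldKeys are hypothetical, and JSON.stringify stands in for the way an embedded document is stored as a single value.

```javascript
// Simplified model of multikey index keys (an assumption for illustration,
// not MongoDB's real implementation).
const doc = {
  id_resource: "147",
  users_involved: [
    { id_user: "113928869", state: "answered", id_folder: "0", is_deleted: "0" },
    { id_user: "121624627", state: "new", id_folder: "0", is_deleted: "0" },
  ],
};

// Index {id_resource: 1, users_involved: 1}:
// one key per array element, with the element kept as one opaque value.
function wholeElementKeys(d) {
  return d.users_involved.map((el) => [d.id_resource, JSON.stringify(el)]);
}

// Index {id_resource: 1, 'users_involved.id_user': 1, 'users_involved.state': 1}:
// one key per array element, built from the individual sub-fields.
function subFieldKeys(d) {
  return d.users_involved.map((el) => [d.id_resource, el.id_user, el.state]);
}

// A partial element (only id_user and state) never equals a stored whole element,
// so the whole-element index cannot be searched by those two fields alone.
const partial = JSON.stringify({ id_user: "113928869", state: "answered" });
const partialFound = wholeElementKeys(doc).some(([, el]) => el === partial);

// The complete element, with the same fields in the same order, does match.
const full = JSON.stringify(doc.users_involved[0]);
const fullFound = wholeElementKeys(doc).some(([, el]) => el === full);

console.log(subFieldKeys(doc));
console.log({ partialFound, fullFound });
```

In this model the $elemMatch query from the question resembles the partial lookup: only the id_resource prefix narrows anything, which is consistent with the large nscanned value reported above.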

I think you would be better off with a compound index that includes the specific users_involved fields you search on. So:

db.message.ensureIndex({id_resource:1, 'users_involved.id_user' : 1});

db.message.ensureIndex({id_resource:1, 'users_involved.id_user' : 1, 'users_involved.state' : 1});
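As a rough illustration of why the sub-field index scans so much less, the toy model below counts how many index entries a lookup would have to walk in each case. Everything in it is an assumption made for the example: the data is synthetic, the variable names are hypothetical, and a filter over an in-memory array stands in for a bounded B-tree range scan. It does not reproduce MongoDB's planner or storage.

```javascript
// Toy model: count index entries walked per lookup (synthetic data,
// not MongoDB's actual storage or query planner).
const states = ["answered", "new", "read"];
const messages = [];
for (let i = 0; i < 1000; i++) {
  messages.push({
    id_resource: String(i % 10),
    users_involved: [
      { id_user: String(i), state: states[i % 3] },
      { id_user: String(i + 1), state: "new" },
    ],
  });
}

// One index entry per array element: [id_resource, id_user, state].
const entries = messages.flatMap((m) =>
  m.users_involved.map((u) => [m.id_resource, u.id_user, u.state])
);

// Whole-array index: only the id_resource prefix bounds the scan,
// so every entry for that resource is walked.
const scannedPrefixOnly = entries.filter((e) => e[0] === "3").length;

// Sub-field index: id_resource, id_user and state all bound the scan.
const scannedFullKey = entries.filter(
  (e) => e[0] === "3" && e[1] === "123" && e[2] === "answered"
).length;

console.log(scannedPrefixOnly); // 200
console.log(scannedFullKey);    // 1
```

The same pattern shows up in the explain outputs above: 46868 entries scanned with the whole-array index versus 7 with the index on id_resource, users_involved.id_user and users_involved.state.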