Querying and sorting an indexed collection in MongoDB results in data overflow

Time: 2014-09-01 13:43:59

Tags: mongodb mongodb-query mongodb-java

"events" is a capped collection that stores user click events on web pages. A document looks like this:

{
    "event_name" : "click",
    "user_id" : "ea0b4027-05f7-4902-b133-ff810b5800e1",
    "object_type" : "ad",
    "object_id" : "ea0b4027-05f7-4902-b133-ff810b5822e5",
    "object_properties" : { "foo" : "bar" },
    "event_properties" : {"foo" : "bar" },
    "time" : ISODate("2014-05-31T22:00:43.681Z")
}

This is the compound index on the collection:

db.events.ensureIndex({object_type: 1, time: 1});

This is how I query it:

db.events.find( {
   $or : [ {object_type : 'ad'}, {object_type : 'element'} ],
   time: { $gte: new Date("2013-10-01T00:00:00.000Z"), $lte: new Date("2014-09-01T00:00:00.000Z") }}, 
  { user_id: 1, event_name: 1, object_id: 1,  object_type : 1,  obj_properties : 1, time:1 } )
.sort({time: 1});

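This results in "too much data for sort() with no index. add an index or specify a smaller limit" in Mongo 2.4.9 and "Overflow sort stage buffered data usage of 33554618 bytes exceeds internal limit of 33554432 bytes" in Mongo 2.6.3. I am using the Java MongoDB driver 2.12.3. The same error is thrown when I use the "$natural" sort. It seems that MongoDB does not actually use the index for sorting, but I cannot figure out why (I have read the MongoDB documentation on indexes). I would appreciate any hints.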

Here is the result of explain():

{
    "clauses" : [
        {
            "cursor" : "BtreeCursor object_type_1_time_1",
            "isMultiKey" : false,
            "n" : 0,
            "nscannedObjects" : 0,
            "nscanned" : 0,
            "scanAndOrder" : false,
            "indexOnly" : false,
            "nChunkSkips" : 0,
            "indexBounds" : {
                "object_type" : [
                    [
                        "element",
                        "element"
                    ]
                ],
                "time" : [
                    [
                        {
                            "$minElement" : 1
                        },
                        {
                            "$maxElement" : 1
                        }
                    ]
                ]
            }
        },
        {
            "cursor" : "BtreeCursor object_type_1_time_1",
            "isMultiKey" : false,
            "n" : 399609,
            "nscannedObjects" : 399609,
            "nscanned" : 399609,
            "scanAndOrder" : false,
            "indexOnly" : false,
            "nChunkSkips" : 0,
            "indexBounds" : {
                "object_type" : [
                    [
                        "ad",
                        "ad"
                    ]
                ],
                "time" : [
                    [
                        {
                            "$minElement" : 1
                        },
                        {
                            "$maxElement" : 1
                        }
                    ]
                ]
            }
        }
    ],
    "cursor" : "QueryOptimizerCursor",
    "n" : 408440,
    "nscannedObjects" : 409686,
    "nscanned" : 409686,
    "nscannedObjectsAllPlans" : 409686,
    "nscannedAllPlans" : 409686,
    "scanAndOrder" : false,
    "nYields" : 6402,
    "nChunkSkips" : 0,
    "millis" : 2633,
    "server" : "MacBook-Pro.local:27017",
    "filterSet" : false
}

1 Answer:

Answer 0: (score: 1)

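According to explain(), mongo does use the compound index when it runs the query. The problem is the sort({time: 1}). Your index is {object_type: 1, time: 1}, which means the index entries are ordered by object_type first and, only where object_type is equal, by time.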
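To make that ordering concrete, here is a small sketch in plain JavaScript (not mongo shell code) that mimics how entries of an index on {object_type: 1, time: 1} are ordered. The sample documents and the helper names `compareIndexKeys` and `isSortedByTime` are invented for illustration; this is not how MongoDB is implemented.

```javascript
// Hypothetical sample data; only object_type and time matter here.
const docs = [
  { object_type: "element", time: new Date("2014-01-10T00:00:00Z") },
  { object_type: "ad",      time: new Date("2014-03-01T00:00:00Z") },
  { object_type: "element", time: new Date("2014-02-20T00:00:00Z") },
  { object_type: "ad",      time: new Date("2014-01-05T00:00:00Z") },
];

// Compare two documents the way a {object_type: 1, time: 1} index would:
// object_type first, then time as the tie-breaker.
function compareIndexKeys(a, b) {
  if (a.object_type !== b.object_type) {
    return a.object_type < b.object_type ? -1 : 1;
  }
  return a.time - b.time;
}

// The order in which an index scan would visit the documents.
const indexOrder = [...docs].sort(compareIndexKeys);

// True only if every entry's time is >= the previous entry's time.
function isSortedByTime(list) {
  return list.every((d, i) => i === 0 || list[i - 1].time <= d.time);
}

console.log(indexOrder.map(d => d.object_type + " " + d.time.toISOString().slice(0, 10)));
console.log("index order is also time order:", isSortedByTime(indexOrder)); // false
```

Walking the index visits both "ad" entries before any "element" entry, so the time values jump backwards at the boundary between the two object types. A result spanning both types therefore cannot be returned in time order straight from this index.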

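For sort({time: 1}), mongo has to load all matched objects (399609) into memory and sort them by time, because that order differs from the index order ({object_type: 1, time: 1}). Assuming an average object size of 100 bytes, that is roughly 40 million bytes, which exceeds the 33554432-byte limit.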
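The arithmetic behind that estimate can be checked with a few lines of plain JavaScript. The 100-byte average is the answer's assumption rather than a measured value, and the helper name `estimateSortBufferBytes` is made up for this sketch. The limit is the figure quoted in the 2.6.3 error message.

```javascript
// Figures from the question: documents matched by the 'ad' clause, and the
// in-memory sort limit quoted in the MongoDB 2.6.3 error message.
const matchedDocs = 399609;
const sortLimitBytes = 33554432; // 32 * 1024 * 1024

// Assumption from the answer: about 100 bytes per document.
const assumedAvgDocBytes = 100;

// Rough size of the data that must be buffered before sorting.
function estimateSortBufferBytes(docCount, avgDocBytes) {
  return docCount * avgDocBytes;
}

const estimated = estimateSortBufferBytes(matchedDocs, assumedAvgDocBytes);
console.log("estimated bytes:", estimated);                // 39960900
console.log("exceeds limit:", estimated > sortLimitBytes); // true
```

The real average document size may differ, so this is only an order-of-magnitude check that the buffered data plausibly exceeds the limit.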

More information: http://docs.mongodb.org/manual/core/index-compound/

For example, take 3 objects in a collection indexed by {obj_type: 1, time: 1}:

{"obj_type": "a", "time" : ISODate("2014-01-31T22:00:43.681Z")}
{"obj_type": "c", "time" : ISODate("2014-02-28T22:00:43.681Z")}
{"obj_type": "b", "time" : ISODate("2014-03-31T22:00:43.681Z")}

db.events.find({}).sort({"obj_type": 1, "time": 1}).limit(2)

{"obj_type": "a", "time" : ISODate("2014-01-31T22:00:43.681Z")}
{"obj_type": "b", "time" : ISODate("2014-03-31T22:00:43.681Z")}

"nscanned" : 2 (This one uses the index order, which is sorted by {obj_type: 1, time: 1})

db.events.find({}).sort({"time": 1}).limit(2)

{"obj_type": "a", "time" : ISODate("2014-01-31T22:00:43.681Z")}
{"obj_type": "c", "time" : ISODate("2014-02-28T22:00:43.681Z")}

"nscanned" : 3 (This one loads all the matched results and then sorts them)