Pymongo query takes forever

Date: 2014-08-26 14:16:58

Tags: python mongodb pymongo mongodb-query

Here are my indexes:

[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "spider.pages",
        "name" : "_id_"
    },
    {
        "v" : 1,
        "key" : {
            "url" : 1
        },
        "unique" : true,
        "ns" : "spider.pages",
        "name" : "url_1"
    },
    {
        "v" : 1,
        "key" : {
            "parsed" : 1
        },
        "ns" : "spider.pages",
        "name" : "parsed_1"
    },
    {
        "v" : 1,
        "key" : {
            "date" : 1,
            "category" : 1
        },
        "ns" : "spider.pages",
        "name" : "date_1_category_1"
    },
    {
        "v" : 1,
        "key" : {
            "indexed" : 1
        },
        "ns" : "spider.pages",
        "name" : "indexed_1"
    },
    {
        "v" : 1,
        "key" : {
            "link_extracted" : 1
        },
        "ns" : "spider.pages",
        "name" : "link_extracted_1"
    }
]

In the Python interactive interpreter I get the following:

>>> [item for item in  pages.find({u'link_extracted': 0}, { u'_id':1}).sort(u'link_extracted', 1).limit(10)]
[{u'_id': ObjectId('53f988d820ba2709e89a1dc2')}, {u'_id': ObjectId('53f988d820ba270a1a9a1dbd')}, {u'_id': ObjectId('53f988e720ba2708fe9a1de4')}, {u'_id': ObjectId('53f994b620ba2706099a231e')}, {u'_id': ObjectId('53f988d820ba270bb49a1d10')}, {u'_id': ObjectId('53f994b720ba2706099a2320')}, {u'_id': ObjectId('53f9918720ba2708fe9a1fab')}, {u'_id': ObjectId('53f9949b20ba270bb49a215a')}, {u'_id': ObjectId('53f78ee420ba27220010098d')}, {u'_id': ObjectId('53f78ee620ba2721ed79d317')}]
>>> [item for item in  pages.find({u'link_extracted': 0}, { u'_id':1}).sort(u'indexed', 1).limit(10)]
[{u'_id': ObjectId('53fb38c420ba27327b725aa9')}, {u'_id': ObjectId('53fb334d20ba2715f87265c2')}, {u'_id': ObjectId('53fb38f520ba2715f872674c')}, {u'_id': ObjectId('53fb38f520ba27327b725abe')}, {u'_id': ObjectId('53fb3eab20ba273348725c0c')}, {u'_id': ObjectId('53fafc1920ba27149b7257fa')}, {u'_id': ObjectId('53fafc1620ba27149b7257f7')}, {u'_id': ObjectId('53fafc1520ba27149b7257f6')}, {u'_id': ObjectId('53fb38f020ba2715f8726748')}, {u'_id': ObjectId('53fb38ef20ba2732d8725a9a')}]
>>> [item for item in  pages.find({u'link_extracted': 0}, { u'_id':1}).sort(u'url', 1).limit(10)]
[{u'_id': ObjectId('53f848d920ba27319c4338ef')}, {u'_id': ObjectId('53f810e120ba27222952d374')}, {u'_id': ObjectId('53f810e120ba27222952d373')}, {u'_id': ObjectId('53f80bd220ba27222d52caef')}, {u'_id': ObjectId('53f80bd220ba27222d52caf0')}, {u'_id': ObjectId('53f823c220ba27222952d922')}, {u'_id': ObjectId('53f84c7720ba2731964338ff')}, {u'_id': ObjectId('53f911f620ba27458f434158')}, {u'_id': ObjectId('53f8163c20ba27222952d4cb')}, {u'_id': ObjectId('53f8162c20ba27222952d4c1')}]

But when I run the following, it hangs indefinitely!

>>> [item for item in  pages.find({u'link_extracted': 0}, { u'_id':1}).sort(u'date', 1).limit(10)] # Endless wait

I have 600,000 documents, and every document has a date field.

1 answer:

Answer 0 (score: 1)

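The cause is that your query filters on link_extracted, so MongoDB uses that field's index and skips the date index; it then has to sort the matching result set by date itself. With a large number of records, that sort is slow.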
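To make the cost concrete, here is a toy Python model of that plan. It is not MongoDB's actual planner or storage: documents are hypothetical `(link_extracted, date, _id)` tuples with made-up random values. The filter step has to visit every matching document before the in-memory top-k sort can pick the earliest dates, so the work grows with the number of matches rather than with `limit(10)`.

```python
import heapq
import random

def make_docs(n, seed=42):
    """Hypothetical documents as (link_extracted, date, _id) tuples with random values."""
    rng = random.Random(seed)
    return [(rng.choice([0, 1]), rng.randrange(1_000_000), i) for i in range(n)]

def filter_then_sort(docs, value, limit):
    """Toy plan: collect every doc with link_extracted == value (as a
    single-field index scan would), then sort those matches by date in
    memory and keep the first `limit`."""
    matches = [d for d in docs if d[0] == value]  # visits every match
    top = heapq.nsmallest(limit, matches, key=lambda d: d[1])  # in-memory top-k sort by date
    return top, len(matches)  # (results, documents examined)

docs = make_docs(100_000)
results, examined = filter_then_sort(docs, value=0, limit=10)
print(len(results), examined)  # examined is the total number of matches, not 10
```

The sample size of 100,000 is arbitrary; with the question's 600,000 documents the examined count scales up proportionally while the result stays at 10 documents.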

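You can fix this with a compound index. With link_extracted first and date second, the index entries for link_extracted: 0 are already stored in date order, so MongoDB can return the first 10 without a separate sort:

db.pages.ensureIndex({link_extracted: 1, date: 1})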
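Why the compound index helps can be sketched with the same kind of toy model: a sorted Python list stands in for the index, and the `(link_extracted, date, _id)` tuples are hypothetical data, not MongoDB internals. Because entries are ordered by `link_extracted` first and `date` second, all `link_extracted: 0` entries form one contiguous run already in date order, so a binary search to the start of that run plus reading 10 entries answers the query with no sort step.

```python
import bisect
import random

def make_docs(n, seed=42):
    """Hypothetical documents as (link_extracted, date, _id) tuples with random values."""
    rng = random.Random(seed)
    return [(rng.choice([0, 1]), rng.randrange(1_000_000), i) for i in range(n)]

def build_compound_index(docs):
    """Order entries like a {link_extracted: 1, date: 1} index: tuples compare
    field by field, so this sorts by link_extracted, then date (then _id)."""
    return sorted(docs)

def query_compound(index, value, limit):
    """Equality on the first key plus sort on the second: jump to the first
    entry with link_extracted == value, then read entries in order."""
    start = bisect.bisect_left(index, (value,))  # (value,) sorts before every (value, date, _id)
    results, examined = [], 0
    for entry in index[start:start + limit]:
        examined += 1
        if entry[0] != value:  # ran past the link_extracted == value range
            break
        results.append(entry)
    return results, examined

index = build_compound_index(make_docs(100_000))
results, examined = query_compound(index, value=0, limit=10)
print(len(results), examined)  # 10 10: no sort step, only `limit` entries read
```

On the real collection, calling `.explain()` on the pymongo cursor before and after creating the index is one way to compare the plans MongoDB actually picks.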

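As with any index, this adds some memory overhead, plus some processing overhead when the index is built. You can find more details about MongoDB indexes and memory usage here: MongoDB index/RAM relationship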