Here are my indexes:
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "spider.pages",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"url" : 1
},
"unique" : true,
"ns" : "spider.pages",
"name" : "url_1"
},
{
"v" : 1,
"key" : {
"parsed" : 1
},
"ns" : "spider.pages",
"name" : "parsed_1"
},
{
"v" : 1,
"key" : {
"date" : 1,
"category" : 1
},
"ns" : "spider.pages",
"name" : "date_1_category_1"
},
{
"v" : 1,
"key" : {
"indexed" : 1
},
"ns" : "spider.pages",
"name" : "indexed_1"
},
{
"v" : 1,
"key" : {
"link_extracted" : 1
},
"ns" : "spider.pages",
"name" : "link_extracted_1"
}
]
I get the following from my Python interactive interpreter:
>>> [item for item in pages.find({u'link_extracted': 0}, { u'_id':1}).sort(u'link_extracted', 1).limit(10)]
[{u'_id': ObjectId('53f988d820ba2709e89a1dc2')}, {u'_id': ObjectId('53f988d820ba270a1a9a1dbd')}, {u'_id': ObjectId('53f988e720ba2708fe9a1de4')}, {u'_id': ObjectId('53f994b620ba2706099a231e')}, {u'_id': ObjectId('53f988d820ba270bb49a1d10')}, {u'_id': ObjectId('53f994b720ba2706099a2320')}, {u'_id': ObjectId('53f9918720ba2708fe9a1fab')}, {u'_id': ObjectId('53f9949b20ba270bb49a215a')}, {u'_id': ObjectId('53f78ee420ba27220010098d')}, {u'_id': ObjectId('53f78ee620ba2721ed79d317')}]
>>> [item for item in pages.find({u'link_extracted': 0}, { u'_id':1}).sort(u'indexed', 1).limit(10)]
[{u'_id': ObjectId('53fb38c420ba27327b725aa9')}, {u'_id': ObjectId('53fb334d20ba2715f87265c2')}, {u'_id': ObjectId('53fb38f520ba2715f872674c')}, {u'_id': ObjectId('53fb38f520ba27327b725abe')}, {u'_id': ObjectId('53fb3eab20ba273348725c0c')}, {u'_id': ObjectId('53fafc1920ba27149b7257fa')}, {u'_id': ObjectId('53fafc1620ba27149b7257f7')}, {u'_id': ObjectId('53fafc1520ba27149b7257f6')}, {u'_id': ObjectId('53fb38f020ba2715f8726748')}, {u'_id': ObjectId('53fb38ef20ba2732d8725a9a')}]
>>> [item for item in pages.find({u'link_extracted': 0}, { u'_id':1}).sort(u'url', 1).limit(10)]
[{u'_id': ObjectId('53f848d920ba27319c4338ef')}, {u'_id': ObjectId('53f810e120ba27222952d374')}, {u'_id': ObjectId('53f810e120ba27222952d373')}, {u'_id': ObjectId('53f80bd220ba27222d52caef')}, {u'_id': ObjectId('53f80bd220ba27222d52caf0')}, {u'_id': ObjectId('53f823c220ba27222952d922')}, {u'_id': ObjectId('53f84c7720ba2731964338ff')}, {u'_id': ObjectId('53f911f620ba27458f434158')}, {u'_id': ObjectId('53f8163c20ba27222952d4cb')}, {u'_id': ObjectId('53f8162c20ba27222952d4c1')}]
But when I enter the following, it hangs indefinitely!
>>> [item for item in pages.find({u'link_extracted': 0}, { u'_id':1}).sort(u'date', 1).limit(10)] # Endless wait
I have 600,000 documents, and every one of them has a date attribute.
Answer 0 (score: 1)
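The reason is that your query filters on link_extracted, so it skips the date index and then tries to sort the result set by date. With a large number of records, that is slow.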
You can fix this with a compound index:
db.pages.ensureIndex({link_extracted: 1, date: 1})
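As with any index, this adds some memory overhead, plus some processing overhead while the index is being built. You can find more details on MongoDB and index memory usage here: MongoDB index/RAM relationship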
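The same index can also be created from the pymongo session used in the question. The following is a minimal sketch, assuming `pages` is the pymongo collection object from the question; the helper names `create_filter_sort_index` and `oldest_unextracted_ids` are hypothetical. It calls `create_index` rather than `ensureIndex`, which was deprecated in later MongoDB releases.

```python
# Sketch only: `pages` is assumed to be the pymongo Collection for spider.pages.
ASCENDING = 1  # same value as pymongo.ASCENDING

# Equality-filter field first, sort field second, so the index can serve both.
COMPOUND_INDEX = [("link_extracted", ASCENDING), ("date", ASCENDING)]


def create_filter_sort_index(pages):
    """Create the compound index and return the name reported by the driver."""
    return pages.create_index(COMPOUND_INDEX)


def oldest_unextracted_ids(pages, limit=10):
    """Re-run the question's query: filter on link_extracted, sort by date."""
    cursor = (
        pages.find({"link_extracted": 0}, {"_id": 1})
        .sort("date", ASCENDING)
        .limit(limit)
    )
    return [doc["_id"] for doc in cursor]


# Usage against a live server (connection details are assumptions):
#   from pymongo import MongoClient
#   pages = MongoClient()["spider"]["pages"]
#   create_filter_sort_index(pages)
#   oldest_unextracted_ids(pages)
```

Putting link_extracted ahead of date matters here: the index entries for link_extracted = 0 are then already ordered by date, so the first 10 can be read straight from the index instead of sorting every matching document.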