I tried this query:
db.tablebusiness.find({ "LongitudeLatitude" : { "$nearSphere" : [106.772835, -6.186753], "$maxDistance" : 0.053980478460939611 }, "Prominent" : { "$gte" : 15 }, "indexContents" : { "$all" : [/^warung/, /^nasi/] } }).skip(20).limit(20);
Here is the log from the Amazon EC2 micro instance:
Fri Sep 07 03:21:08 [clientcursormon] mem (MB) res:312 virt:12424 mapped:6094
Fri Sep 07 03:21:43 [conn52] query isikotacobacoba.tablebusiness query: { $query: { LongitudeLatitude: { $nearSphere: [ 106.772835, -6.186753 ], $maxDistance: 0.05398047846093961 }, Prominent: { $gte: 15 }, indexContents: { $all: [ /^warung/, /^nasi/ ] } }, $hint: { LongitudeLatitude: "2d", Prominent: -1, indexContents: 1 } } ntoreturn:20 ntoskip:20 nscanned:40 nreturned:20 reslen:1141 567133ms
Fri Sep 07 03:22:04 [DataFileSync] flushing mmap took 15ms for 9 files
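On my local computer, which has 8 GB of RAM, the result comes back fast, in about 2 seconds. However, if I don't limit the query, it is still slow. For example: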
db.tablebusiness.find({ "LongitudeLatitude" : { "$nearSphere" : [106.772835, -6.186753], "$maxDistance" : 0.053980478460939611 }, "Prominent" : { "$gte" : 15 }, "indexContents" : { "$all" : [/^warung/, /^nasi/] } }).limit(200);
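It took a long time. Finding the nearest 200 points shouldn't be that hard, right?
So memory can't be the problem. If there are only 3600 points within a 5 km radius, why does finding 200 of them take so long?
Here is the log from my larger 8 GB i5 machine: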
Fri Sep 07 12:29:23 [conn5] command admin.$cmd command: { buildinfo: 1 } ntoreturn:1 reslen:340 0ms
Fri Sep 07 12:29:25 [conn4] query isikotacobacoba.tablebusiness query: { LongitudeLatitude: { $nearSphere: [ 106.772835, -6.186753 ], $maxDistance: 0.05398047846093961 }, Prominent: { $gte: 15 }, indexContents: { $all: [ /^warung/, /^nasi/ ] } } ntoreturn:100000 ntoskip:20 nscanned:262 nreturned:242 reslen:300329 501562ms
Fri Sep 07 12:29:34 [conn4] run command admin.$cmd { ping: 1 }
Here is a sample of the typical data:
{
"_id" : "warung-nasi-nur-karomah__-6.19_106.78",
"BuildingID" : null,
"Title" : "Warung Nasi Nur Karomah",
"InBuildingAddress" : null,
"Building" : null,
"Street" : "Jl. Arjuna Utara No.35",
"Districts" : [],
"City" : "Jakarta",
"Country" : "Indonesia",
"Checkin" : 0,
"Note" : null,
"PeopleCount" : 0,
"Prominent" : 45.5,
"CountViews" : 0,
"StreetAdditional" : null,
"LongitudeLatitude" : {
"Longitude" : 106.775693893433,
"Latitude" : -6.18759540055471
},
"Rating" : {
"Stars" : 0.0,
"Weight" : 0.0
},
"CurrentlyWorkedURL" : null,
"Reviews" : [],
"ZIP" : null,
"Tags" : ["Restaurant"],
"Phones" : ["081380087011"],
"Website" : null,
"Email" : null,
"Price" : null,
"openingHour" : null,
"Promotions" : [],
"SomethingWrong" : false,
"BizMenus" : [],
"Brochures" : [],
"Aliases" : [],
"indexContents" : ["restaura", "estauran", "staurant", "taurant", "aurant", "urant", "rant", "ant", "nt", "t", "warung", "arung", "rung", "ung", "ng", "g", "nasi", "asi", "si", "i", "nur", "ur", "r", "karomah", "aromah", "romah", "omah", "mah", "ah", "h"]
}
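The indexContents array looks like every suffix of every lowercased word in the tags and title, each cut to at most 8 characters. This would explain why the query uses anchored regexes: /^warung/ matched against suffix tokens behaves like a substring search. The sketch below is a hypothetical reconstruction inferred only from this sample document. The 8-character cap, the word order (tags first, then title) and the helper names are assumptions, not the author's actual code.

```javascript
// Hypothetical reconstruction of how "indexContents" may have been built:
// each word is lowercased and every suffix is stored, truncated to 8 chars.
const MAX_TOKEN_LENGTH = 8; // assumption inferred from "restaura", "estauran"

function suffixTokens(word) {
  const w = word.toLowerCase();
  const tokens = [];
  for (let i = 0; i < w.length; i++) {
    // suffix starting at position i, cut to at most MAX_TOKEN_LENGTH chars
    tokens.push(w.slice(i, i + MAX_TOKEN_LENGTH));
  }
  return tokens;
}

function buildIndexContents(tags, title) {
  const words = [...tags, ...title.split(/\s+/)].filter((w) => w.length > 0);
  // A Set drops duplicate tokens while keeping first-seen order.
  return [...new Set(words.flatMap(suffixTokens))];
}

const tokens = buildIndexContents(["Restaurant"], "Warung Nasi Nur Karomah");
console.log(tokens.length); // 30

// An anchored regex now matches text anywhere inside a word:
// "romah" is a suffix of "karomah", so /^rom/ finds it.
console.log(tokens.some((t) => /^rom/.test(t))); // true
```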
Here is the log of the same query on my home machine (not the Amazon EC2 micro instance):
Fri Sep 07 10:52:28 [conn1] query isikotacobacoba.tablebusiness query: { LongitudeLatitude: { $nearSphere: [ 106.772835, -6.186753 ], $maxDistance: 0.05398047846093961 }, Prominent: { $gte: 15 }, indexContents: { $all: [ /^warung/, /^nasi/ ] } } ntoreturn:50 nscanned:50 nreturned:50 reslen:62090 2048ms
I know Amazon EC2 is slower than my home computer.
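To put the two machines side by side, the counters on these slow-query lines can be pulled out and compared. The sketch below assumes the MongoDB 2.0 log layout seen in the lines above: `key:value` pairs followed by a trailing `<n>ms`. It uses only the tails of those lines and does not try to handle other log formats. Dividing the elapsed time by nscanned shows that the EC2 run spent far longer per scanned document, which suggests the time is not going into the amount of work the query does.

```javascript
// Sketch: extract counters from a MongoDB 2.0-style slow-query log line.
// Assumes the "key:value ... <n>ms" layout of the log lines quoted above.
function parseSlowQuery(line) {
  const num = (key) => {
    const m = line.match(new RegExp("\\b" + key + ":(\\d+)"));
    return m ? Number(m[1]) : null;
  };
  const ms = line.match(/(\d+)ms\s*$/); // elapsed time is the last token
  return {
    nscanned: num("nscanned"),
    nreturned: num("nreturned"),
    millis: ms ? Number(ms[1]) : null,
  };
}

// Tails of the EC2 log line and the home-machine log line quoted above.
const ec2 = parseSlowQuery("ntoreturn:20 ntoskip:20 nscanned:40 nreturned:20 reslen:1141 567133ms");
const home = parseSlowQuery("ntoreturn:50 nscanned:50 nreturned:50 reslen:62090 2048ms");

// Milliseconds spent per scanned document on each machine.
console.log((ec2.millis / ec2.nscanned).toFixed(0)); // 14178
console.log((home.millis / home.nscanned).toFixed(0)); // 41
```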
The indexes are:
db.tablebusiness.getIndexes();
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "isikotacobacoba.tablebusiness",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"LongitudeLatitude" : "2d",
"Prominent" : -1,
"indexContents" : 1
},
"ns" : "isikotacobacoba.tablebusiness",
"name" : "LongLat_Prominent_indexContents",
"dropDups" : false,
"background" : false
},
{
"v" : 1,
"key" : {
"LongitudeLatitude" : "2d",
"Prominent" : -1
},
"ns" : "isikotacobacoba.tablebusiness",
"name" : "LongLat_Prominent",
"dropDups" : false,
"background" : false
}
]
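As you can see, the correct index is in place.
One possible cause is the lack of memory on the Amazon micro instance.
However, the $nearSphere search is limited to 0.053980478460939611 degrees (about 5 km). Even without an index, even with a plain table scan, it shouldn't need that much memory.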
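The unit of that number matters. For a 2d index, MongoDB documents $maxDistance for $near in the same units as the stored coordinates (degrees here). For $nearSphere with legacy [longitude, latitude] pairs, it documents $maxDistance in radians. The sketch below converts the same value both ways, assuming a spherical Earth with a mean radius of 6371 km. Read as radians, the search radius is about 344 km rather than a few kilometres, which is a much larger area to search than intended.

```javascript
// Sketch: what $maxDistance 0.053980478460939611 amounts to in kilometres
// under each interpretation. Assumes a spherical Earth, mean radius 6371 km.
const EARTH_RADIUS_KM = 6371;
const maxDistance = 0.053980478460939611;

// Read as degrees (the flat coordinate units $near uses on a 2d index).
const asDegreesKm = maxDistance * (Math.PI / 180) * EARTH_RADIUS_KM;

// Read as radians (the units documented for $nearSphere with legacy pairs).
const asRadiansKm = maxDistance * EARTH_RADIUS_KM;

// Going the other way: the radian value for a 5 km radius.
const kmToRadians = (km) => km / EARTH_RADIUS_KM;

console.log(asDegreesKm.toFixed(2)); // 6.00
console.log(asRadiansKm.toFixed(1)); // 343.9
console.log(kmToRadians(5).toFixed(6)); // 0.000785
```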
What is the real problem?
Here is the MongoDB buildInfo:
> db.runCommand("buildInfo")
{
"version" : "2.0.7",
"gitVersion" : "875033920e8869d284f32119413543fa475227bf",
"sysInfo" : "windows sys.getwindowsversion(major=6, minor=1, build=7601, platform=2, service_pack='Service Pack 1') BOOST_LIB_VERSION=1_42",
"versionArray" : [
2,
0,
7,
0
],
"bits" : 64,
"debug" : false,
"maxBsonObjectSize" : 16777216,
"ok" : 1
}
>
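I did some further tests:
db.tablebusiness.find({"LongitudeLatitude": {"$nearSphere": [106.772835, -6.186753], "$maxDistance": 0.053980478460939611}}).skip(20).limit(100000);
This returns "only" 3600 documents, yet it takes 500 seconds.
Even if MongoDB doesn't use the index, scanning 3600 documents, computing the distances and then sorting them shouldn't take that long, even on a micro machine.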
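The brute-force work described above can be timed outside MongoDB. This sketch does it in plain JavaScript:

- compute the haversine distance to each point,
- keep the points within the radius,
- sort them by distance,
- take the first 200.

It runs on 3600 synthetic points scattered around the query location. The { lng, lat } shape, the seeded pseudo-random generator and the 5 km radius are illustrative assumptions, not the collection's data. On ordinary hardware it should finish in a few milliseconds, which supports the point that the raw CPU work is not the bottleneck.

```javascript
// Sketch: brute-force "nearest N within a radius" over in-memory points.
const EARTH_RADIUS_KM = 6371; // spherical-Earth assumption
const toRad = (deg) => (deg * Math.PI) / 180;

// Great-circle distance between two { lng, lat } points (haversine formula).
function haversineKm(a, b) {
  const dLat = toRad(b.lat - a.lat);
  const dLng = toRad(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Distance to every point, filter by radius, sort ascending, keep `limit`.
function nearest(points, origin, maxKm, limit) {
  return points
    .map((p) => ({ ...p, distKm: haversineKm(origin, p) }))
    .filter((p) => p.distKm <= maxKm)
    .sort((x, y) => x.distKm - y.distKm)
    .slice(0, limit);
}

// 3600 synthetic points within about +/-0.05 degrees of the query location,
// generated with a seeded Park-Miller generator so runs are repeatable.
const origin = { lng: 106.772835, lat: -6.186753 };
let seed = 42;
const rand = () => (seed = (seed * 16807) % 2147483647) / 2147483647;
const points = Array.from({ length: 3600 }, (_, i) => ({
  id: i,
  lng: origin.lng + (rand() - 0.5) * 0.1,
  lat: origin.lat + (rand() - 0.5) * 0.1,
}));

const t0 = Date.now();
const top = nearest(points, origin, 5, 200);
console.log(top.length, `${Date.now() - t0} ms`);
```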
Now, if I use $near instead of $nearSphere, things get better but are still disappointing:
Fri Sep 07 04:49:38 [conn61] query isikotacobacoba.tablebusiness query: { LongitudeLatitude: { $near: [ 106.772835, -6.186753 ], $maxDistance: 0.05398047846093961 }, Prominent: { $gte: 15.0 }, indexContents: { $all: [ /^warung/, /^nasi/ ] } } ntoreturn:20 ntoskip:20 nscanned:32 nreturned:12 reslen:14984 49636ms
Fri Sep 07 04:49:38 [conn61] run command admin.$cmd { replSetGetStatus: 1, forShell: 1 }
explain() from the Amazon EC2 micro instance:
{
"cursor" : "GeoSearchCursor",
"nscanned" : 40,
"nscannedObjects" : 40,
"n" : 20,
"millis" : 349182,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}
explain() for the same query on my localhost home computer:
{
"cursor" : "GeoSearchCursor",
"nscanned" : 40,
"nscannedObjects" : 40,
"n" : 20,
"millis" : 4849,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}
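This happens randomly. Most of the time the query is very fast; when it is slow, it is extremely slow.
Answer 0 (score: 3)
An EC2 Micro instance has only 640 MB of RAM and no local storage. If your working set is large and doesn't fit in memory, you will hit many page faults. Each fault costs more here because the data has to be paged in over the network.
To test this, run mongostat while the query is executing and check whether there are many page faults. If there are, upgrading to a larger EC2 instance with more RAM and local storage may solve the problem.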
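The question's first EC2 log line already gives a hint in this direction: "mem (MB) res:312 virt:12424 mapped:6094". The sketch below extracts those figures and compares the mapped size with the 640 MB cited in this answer. "mapped" is the total size of the memory-mapped data files. That is an upper bound, not the actual working set, so the comparison only shows that the data files cannot all be resident at once. mongostat's page-fault counts remain the direct test.

```javascript
// Sketch: parse the "mem (MB)" figures from the EC2 log line in the question
// and compare the mapped data-file size with the RAM this answer cites.
function parseMemLine(line) {
  const get = (key) => {
    const m = line.match(new RegExp("\\b" + key + ":(\\d+)"));
    return m ? Number(m[1]) : null;
  };
  return { res: get("res"), virt: get("virt"), mapped: get("mapped") };
}

const mem = parseMemLine("[clientcursormon] mem (MB) res:312 virt:12424 mapped:6094");
const ramMb = 640; // figure given in this answer

console.log(`resident: ${mem.res} MB, mapped: ${mem.mapped} MB`);
console.log(`mapped/RAM = ${(mem.mapped / ramMb).toFixed(1)}x`); // mapped/RAM = 9.5x
```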
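Answer 1 (score: 0)
I asked a similar question here: Why $in is much faster than $all?
It turns out that MongoDB has a bug affecting $all, and that was the main problem. Changing the hardware helps somewhat, but not nearly as much as simply not using $all.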
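One way to avoid $all is to let MongoDB narrow the candidates with a cheaper condition, then enforce the remaining terms in application code. The cheaper condition could be, for instance, $in or a single regex on indexContents; that server-side part is not shown because it needs a running database. The sketch below covers only the client-side filter, which mirrors the semantics of $all with regexes: every prefix must match at least one token. The helper name matchesAllPrefixes and the "Warung Soto" document are hypothetical illustrations.

```javascript
// Sketch: client-side equivalent of { indexContents: { $all: [/^p1/, /^p2/] } }.
// Every prefix must match at least one token in the array.
function matchesAllPrefixes(tokens, prefixes) {
  return prefixes.every((p) => tokens.some((t) => t.startsWith(p)));
}

// Hypothetical candidates, e.g. as returned by a narrower server-side query.
const candidates = [
  { Title: "Warung Nasi Nur Karomah", indexContents: ["warung", "arung", "nasi", "asi", "nur"] },
  { Title: "Warung Soto", indexContents: ["warung", "arung", "soto", "oto"] },
];

const hits = candidates.filter((d) => matchesAllPrefixes(d.indexContents, ["warung", "nasi"]));
console.log(hits.map((d) => d.Title)); // only "Warung Nasi Nur Karomah"
```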