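I have a replica set of 3 MongoDB instances. Each instance has 8GB RAM and a dual-core 2.27 GHz CPU. All instances run version 2.2.2 (I saw the same behaviour in 2.0.1).
Here is my problem: our primary instance (the master of the replica set) has recently taken to hitting 100% CPU roughly every 2 days. To track down the cause, I decided to run the MongoDB profiler, and I found hundreds of very slow queries. Here is an example: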
> db.system.profile.find()
{
"ts" : ISODate("2012-12-16T20:31:39.078Z"),
"op" : "command",
"ns" : "stylesaint.$cmd",
"command" : {
"count" : "tears",
"query" : {
"_id" : { "$gt" : ObjectId("50cdeadeaf58d3de96000294") },
"active" : true,
"is_image_processed" : true,
"hidden_from_feed" : false,
"hidden_from_public_feeds" : false
},
"fields" : null
},
"ntoreturn" : 1,
"responseLength" : 48,
"millis" : 13930,
"client" : "#########"
}
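For reference, a minimal sketch of how the slow entries can be picked out of the profiler output. The helper name `slowestOps` and the 100 ms threshold are illustrative assumptions, not part of MongoDB. The first sample entry mirrors the profiled command above, and the second is made up for contrast.

```javascript
// Return profiler entries slower than thresholdMs, slowest first.
// `entries` is an array of documents shaped like those in system.profile.
// slowestOps is a hypothetical helper, not a MongoDB API.
function slowestOps(entries, thresholdMs) {
  var slow = [];
  for (var i = 0; i < entries.length; i++) {
    if (entries[i].millis > thresholdMs) {
      slow.push(entries[i]);
    }
  }
  // Sort descending by execution time.
  slow.sort(function (a, b) { return b.millis - a.millis; });
  return slow;
}

// Sample data: the first entry mirrors the count command shown above,
// the second is an invented fast operation for contrast.
var sample = [
  { op: "command", ns: "stylesaint.$cmd", millis: 13930 },
  { op: "query", ns: "stylesaint.tears", millis: 3 }
];
var result = slowestOps(sample, 100);
```

In the mongo shell the helper could be fed with `db.system.profile.find().toArray()`. The same filter can also be expressed server-side as `db.system.profile.find({ millis: { $gt: 100 } }).sort({ millis: -1 })`.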
From what I have read about MongoDB, the natural next step in these cases is to run explain() on the queries. However, explain() does not account for the slowness:
> db.tears.find({ "_id" : { "$gt" : ObjectId("50cdeadeaf58d3de96000294") }, "active" : true, "is_image_processed" : true, "hidden_from_feed" : false, "hidden_from_public_feeds" : false }).explain()
{
"cursor" : "BtreeCursor id",
"isMultiKey" : false,
"n" : 4,
"nscannedObjects" : 5,
"nscanned" : 5,
"nscannedObjectsAllPlans" : 23,
"nscannedAllPlans" : 25,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"_id" : [
[
ObjectId("50cdeadeaf58d3de96000294"),
ObjectId("ffffffffffffffffffffffff")
]
]
},
"server" : "#########"
}
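Scanning 5 documents should not take 13 seconds. Something else must be slowing the query down. Perhaps some other queries are starving the server of resources? However, I don't know where to look. Any advice would be appreciated.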
MongoDB logs
I could not find any warnings during startup:
***** SERVER RESTARTED *****
Sun Dec 16 21:02:56 [initandlisten] MongoDB starting : pid=...
Sun Dec 16 21:02:56 [initandlisten] db version v2.2.2, pdfile version 4.5
Sun Dec 16 21:02:56 [initandlisten] git version: ...
Sun Dec 16 21:02:56 [initandlisten] build info: Linux 2.6.21.7-2 ...
Sun Dec 16 21:02:56 [initandlisten] options: { config: "/etc/mongodb.conf", dbpath: "/data/mongodb", logappend: "true", logpath: "/var/log/mongodb/mongodb.log", replSet: "...", rest: "true" }
Sun Dec 16 21:02:56 [initandlisten] journal dir=/data/mongodb/journal
Sun Dec 16 21:02:56 [initandlisten] recover : no journal files present, no recovery needed
Sun Dec 16 21:02:56 [initandlisten] waiting for connections on port ...
Sun Dec 16 21:02:56 [websvr] admin web console waiting for connections on port ...
Sun Dec 16 21:02:56 [initandlisten] connection accepted from ...
Sun Dec 16 21:02:56 [conn1] end connection ... (0 connections now open)
Sun Dec 16 21:02:56 [initandlisten] connection accepted from ... #2 (1 connection now open)
Sun Dec 16 21:02:56 [rsStart] replSet I am ...
Sun Dec 16 21:02:56 [rsStart] replSet STARTUP2
Sun Dec 16 21:02:56 [rsHealthPoll] replSet member ... is up
Sun Dec 16 21:02:56 [rsHealthPoll] replSet member ... is now in state SECONDARY
Sun Dec 16 21:02:57 [initandlisten] connection accepted from ... #3 (2 connections now open)
Sun Dec 16 21:02:57 [rsSync] replSet SECONDARY
Sun Dec 16 21:02:58 [initandlisten] connection accepted from ... #4 (3 connections now open)
Sun Dec 16 21:02:58 [initandlisten] connection accepted from ... #5 (4 connections now open)
Sun Dec 16 21:02:58 [conn5] end connection ... (3 connections now open)
Sun Dec 16 21:02:58 [rsHealthPoll] replSet member ... is up
Sun Dec 16 21:02:58 [rsHealthPoll] replSet member ... is now in state PRIMARY
Sun Dec 16 21:02:59 [initandlisten] connection accepted from ... #6 (4 connections now open)
Sun Dec 16 21:03:00 [initandlisten] connection accepted from ... #7 (5 connections now open)
Sun Dec 16 21:03:02 [conn7] end connection ... (4 connections now open)
Sun Dec 16 21:03:03 [rsBackgroundSync] replSet syncing to: ...
Sun Dec 16 21:03:04 [rsSyncNotifier] replset setting oplog notifier to ...
Sun Dec 16 21:03:06 [conn2] end connection ... (3 connections now open)
Sun Dec 16 21:03:06 [initandlisten] connection accepted from ... #8 (4 connections now open)
Sun Dec 16 21:03:08 [initandlisten] connection accepted from ... #9 (5 connections now open)
Sun Dec 16 21:03:13 [initandlisten] connection accepted from ... #10 (6 connections now open)
Sun Dec 16 21:03:13 [conn10] end connection ... (5 connections now open)
Sun Dec 16 21:03:13 [initandlisten] connection accepted from ... #11 (6 connections now open)
Sun Dec 16 21:03:15 [conn3] end connection ... (5 connections now open)
Sun Dec 16 21:03:16 [rsHealthPoll] replSet member .... is now in state SECONDARY
Sun Dec 16 21:03:16 [rsMgr] replSet info electSelf 1
Sun Dec 16 21:03:16 [rsMgr] replSet PRIMARY
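Re: request for more information
At the moment MongoDB is running normally; there are no queries taking longer than 100 ms. As soon as the 100% CPU condition happens again, I will post more information about system resources.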
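Answer 0 (score: 0)
First off, I think the queries may be a red herring. Are you running these servers on a NUMA architecture? You can read the Mongo docs for usage on NUMA systems.
If you are on a NUMA system, running the daemon under numactl with an interleave policy may solve your problem.
You can check whether there are any startup warnings. They appear in your log when you start the daemon, and there is also a way to find them while the daemon is running, but I can't recall it off the top of my head.
Failing that, check your I/O activity while the queries are running. If I had to guess, you are hitting your disk rather than serving your working set from memory. What do your memory usage stats (free -h and the memory usage metrics from inside the mongo console) look like?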
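One rough way to check the "hitting disk" guess is to compare two `db.serverStatus()` samples and see how fast the page fault counter grows. The sketch below assumes the `uptime` (seconds) and `extra_info.page_faults` fields, which mongod reports on Linux. The helper name `pageFaultsPerSecond` and the sample numbers are made up for illustration.

```javascript
// Page faults per second between two serverStatus() samples.
// Assumes `uptime` is in seconds and `extra_info.page_faults` is a
// cumulative counter (reported by mongod on Linux).
// pageFaultsPerSecond is a hypothetical helper, not a MongoDB API.
function pageFaultsPerSecond(before, after) {
  var elapsed = after.uptime - before.uptime;
  if (elapsed <= 0) {
    return null; // samples out of order, or the server restarted in between
  }
  var faults = after.extra_info.page_faults - before.extra_info.page_faults;
  return faults / elapsed;
}

// Invented sample values, 10 seconds apart.
var statusBefore = { uptime: 1000, extra_info: { page_faults: 5000 } };
var statusAfter = { uptime: 1010, extra_info: { page_faults: 7500 } };
var faultRate = pageFaultsPerSecond(statusBefore, statusAfter); // 250
```

In the mongo shell the two samples would come from calling `db.serverStatus()` twice, a few seconds apart, while the slow queries are running. A rate that stays high during the slow period would be consistent with the working set not fitting in RAM, though it is only a hint, not proof.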