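We have a 3-node replica set that crashes periodically and fails to recover. Looking through the mongod.log file on our PRIMARY server, I see multiple errors. I'm not sure where to start, or even what to include in this post, so I'll begin with the errors I'm getting. If I've left anything out, let me know and I'll edit the post to add it. Can anyone explain why this is happening?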
Thu Feb 27 14:09:47.790 [rsSyncNotifier] replset tracking exception: exception: 10278 dbclient error communicating with server: mongos2i.hostname.com:27017
Thu Feb 27 14:09:47.790 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: mongos2i.hostname.com:27017
Thu Feb 27 14:09:47.790 [rsBackgroundSync] replSet syncing to: mongos2i.hostname.com:27017
Thu Feb 27 14:09:47.791 [rsBackgroundSync] repl: couldn't connect to server mongos2i.hostname.com:27017
Thu Feb 27 14:09:47.792 [conn152] end connection xx.xxx.xxx.107:43904 (71 connections now open)
Thu Feb 27 14:09:48.077 [rsHealthPoll] DBClientCursor::init call() failed
Thu Feb 27 14:09:48.077 [rsHealthPoll] replset info mongos2i.hostname.com:27017 heartbeat failed, retrying
Thu Feb 27 14:09:48.078 [rsHealthPoll] replSet info mongos2i.hostname.com:27017 is down (or slow to respond):
Thu Feb 27 14:09:48.078 [rsHealthPoll] replSet member mongos2i.hostname.com:27017 is now in state DOWN
Thu Feb 27 14:09:48.080 [rsMgr] not electing self, mongos1i.hostname.com:27017 would veto with 'mongom1i.hostname.com:27017 is trying to elect itself but mongos2i.hostname.com:27017 is already primary and more up-to-date'
Thu Feb 27 14:09:49.079 [conn153] replSet info voting yea for mongos1i.hostname.com:27017 (1)
Thu Feb 27 14:09:50.080 [rsHealthPoll] replSet member mongos1i.hostname.com:27017 is now in state PRIMARY
Thu Feb 27 14:09:50.081 [rsHealthPoll] replSet member mongos2i.hostname.com:27017 is up
Thu Feb 27 14:09:50.082 [initandlisten] connection accepted from xx.xxx.xxx.107:43907 #154 (72 connections now open)
Thu Feb 27 14:09:50.082 [conn154] end connection xx.xxx.xxx.107:43907 (71 connections now open)
Thu Feb 27 14:09:50.086 [initandlisten] connection accepted from xx.xxx.xxx.107:43909 #155 (72 connections now open)
Thu Feb 27 14:09:50.792 [rsBackgroundSync] replSet syncing to: mongos1i.hostname.com:27017
Thu Feb 27 14:09:52.082 [rsHealthPoll] replSet member mongos2i.hostname.com:27017 is now in state SECONDARY
Thu Feb 27 14:10:04.090 [conn155] end connection xx.xxx.xxx.107:43909 (71 connections now open)
Thu Feb 27 14:10:04.091 [initandlisten] connection accepted from xx.xxx.xxx.107:43913 #156 (72 connections now open)
Thu Feb 27 14:10:10.731 [conn153] end connection xx.xxx.xxx.97:52297 (71 connections now open)
Thu Feb 27 14:10:10.732 [initandlisten] connection accepted from xx.xxx.xxx.97:52302 #157 (72 connections now open)
Thu Feb 27 14:10:29.706 [initandlisten] connection accepted from 127.0.0.1:56436 #158 (73 connections now open)
Thu Feb 27 14:10:34.100 [conn156] end connection xx.xxx.xxx.107:43913 (72 connections now open)
Thu Feb 27 14:10:34.101 [initandlisten] connection accepted from xx.xxx.xxx.107:43916 #159 (73 connections now open)
Thu Feb 27 14:10:40.743 [conn157] end connection xx.xxx.xxx.97:52302 (72 connections now open)
Thu Feb 27 14:10:40.744 [initandlisten] connection accepted from xx.xxx.xxx.97:52309 #160 (73 connections now open)
Thu Feb 27 14:11:04.110 [conn159] end connection xx.xxx.xxx.107:43916 (72 connections now open)
Thu Feb 27 14:11:04.111 [initandlisten] connection accepted from xx.xxx.xxx.107:43918 #161 (73 connections now open)
Thu Feb 27 14:11:09.191 [conn161] end connection xx.xxx.xxx.107:43918 (72 connections now open)
Thu Feb 27 14:11:09.452 [initandlisten] connection accepted from xx.xxx.xxx.107:43919 #162 (73 connections now open)
Thu Feb 27 14:11:09.453 [conn162] end connection xx.xxx.xxx.107:43919 (72 connections now open)
Thu Feb 27 14:11:09.456 [initandlisten] connection accepted from xx.xxx.xxx.107:43921 #163 (73 connections now open)
Thu Feb 27 14:11:10.111 [rsHealthPoll] DBClientCursor::init call() failed
Thu Feb 27 14:11:10.111 [rsHealthPoll] replset info mongos2i.hostname.com:27017 heartbeat failed, retrying
Thu Feb 27 14:11:10.113 [rsHealthPoll] replSet member mongos2i.hostname.com:27017 is now in state STARTUP2
Thu Feb 27 14:11:10.755 [conn160] end connection xx.xxx.xxx.97:52309 (72 connections now open)
Thu Feb 27 14:11:10.757 [initandlisten] connection accepted from xx.xxx.xxx.97:52311 #164 (73 connections now open)
Thu Feb 27 14:11:12.113 [rsHealthPoll] replSet member mongos2i.hostname.com:27017 is now in state SECONDARY
Thu Feb 27 14:11:23.462 [conn163] end connection xx.xxx.xxx.107:43921 (72 connections now open)
Thu Feb 27 14:11:23.463 [initandlisten] connection accepted from xx.xxx.xxx.107:43925 #165 (73 connections now open)
Thu Feb 27 14:11:31.831 [conn158] end connection 127.0.0.1:56436 (72 connections now open)
Thu Feb 27 14:11:40.768 [conn164] end connection xx.xxx.xxx.97:52311 (71 connections now open)
Thu Feb 27 14:11:40.769 [initandlisten] connection accepted from xx.xxx.xxx.97:52315 #166 (72 connections now open)
Thu Feb 27 14:11:53.082 [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
Thu Feb 27 14:11:53.082 [signalProcessingThread] now exiting
Thu Feb 27 14:11:53.082 dbexit:
We are running CentOS and MongoDB 2.4.9.
Thanks in advance for your help.
Answer 0 (score: 5)
The log output you posted shows that your MongoDB instance did not crash. It shut down normally. Consider the following lines:
Thu Feb 27 14:11:53.082 [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
Thu Feb 27 14:11:53.082 [signalProcessingThread] now exiting
Thu Feb 27 14:11:53.082 dbexit:
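The first line above shows that your MongoDB instance received signal 15 (SIGTERM) from the operating system, and that is what caused MongoDB to terminate. SIGTERM is the default signal sent by the kill command, and it is also what the stop action of the init scripts sends on most Linux distributions.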
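To check how often this happens and when, it may help to scan the log for every "got signal" line. The following is a minimal Python sketch, not an official MongoDB tool. The function name `find_signals` and the regular expression are assumptions made for illustration, and they are based only on the 2.4-style log lines quoted above.

```python
import re
import signal

# Matches mongod 2.4-style lines such as:
#   Thu Feb 27 14:11:53.082 [signalProcessingThread] got signal 15 (Terminated), ...
# The pattern is an assumption derived from the log excerpt in the question.
SIGNAL_LINE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d+ [\d:.]+) \[(?P<thread>[^\]]+)\] got signal (?P<num>\d+)"
)


def find_signals(lines):
    """Return (timestamp, signal number, signal name) for each 'got signal' line."""
    hits = []
    for line in lines:
        match = SIGNAL_LINE.match(line)
        if not match:
            continue
        num = int(match.group("num"))
        try:
            # Translate the number into a name such as SIGTERM.
            name = signal.Signals(num).name
        except ValueError:
            name = "UNKNOWN"
        hits.append((match.group("ts"), num, name))
    return hits


# Sample input copied from the log in the question.
sample = [
    "Thu Feb 27 14:11:40.769 [initandlisten] connection accepted from xx.xxx.xxx.97:52315 #166 (72 connections now open)",
    "Thu Feb 27 14:11:53.082 [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends",
    "Thu Feb 27 14:11:53.082 [signalProcessingThread] now exiting",
]

for timestamp, number, name in find_signals(sample):
    print(timestamp, number, name)
```

In practice you would pass an open log file instead of `sample`, for example `find_signals(open(path))`, where `path` is wherever your mongod.log lives. Matching the timestamps against cron jobs, init script activity, or shell history on the host may show what sent the signal.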