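We are seeing clock drift problems with our MongoDB replica set running on AWS. This only seemed to start after we added extra data to our collections, and we hadn't really noticed it except when the system was under heavy load. Occasionally the following error is logged in mongod.log, even when the system is not under load.
To test this, we isolated a set of machines that have the same data set but are not used by our web application, and the error still occurs: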
2014-12-12T13:33:51.333+0000 [rsBackgroundSync] changing sync target because current sync target's most recent OpTime is Dec 12 13:32:42:c which is more than 30 seconds behind member mongo1:27017 whose most recent OpTime is 1418391230
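From the above, the timestamps show one of the MongoDB replica set members lagging by about a minute. The worst we have seen is 12 minutes out of sync.
This error in turn causes replication lag, which the Mongo Monitoring Service alerts us about (the alert itself is accurate).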
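As a quick sanity check on those numbers, the two OpTimes in the log line can be converted to a common form. This is a minimal sketch that assumes GNU date (as shipped on Amazon Linux) and takes the year 2014 from the log prefix, because the sync target's OpTime is printed without one. The trailing ":c" on that OpTime is its increment counter rather than part of the time, and the member's OpTime appears to be printed as seconds since the Unix epoch.

```shell
#!/usr/bin/env bash
# Compare the two OpTimes from the rsBackgroundSync log line.
# Assumes GNU date; the year (2014) is taken from the log prefix because
# the sync target's OpTime "Dec 12 13:32:42:c" omits it. The ":c" suffix
# is the OpTime increment counter and is ignored here.
set -euo pipefail

target_epoch=$(date -u -d "2014-12-12 13:32:42 UTC" +%s)  # current sync target
member_epoch=1418391230                                   # mongo1:27017

echo "sync target OpTime: $(date -u -d "@$target_epoch" '+%F %T')"
echo "mongo1 OpTime:      $(date -u -d "@$member_epoch" '+%F %T')"

# Difference in seconds between the two members' latest oplog entries.
lag=$(( member_epoch - target_epoch ))
echo "sync target is ${lag}s behind mongo1"
```

With the values from the log line this reports a lag of 68 seconds, and mongo1's OpTime (2014-12-12 13:33:50) is within about a second of the log entry's own timestamp (13:33:51).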
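The setup is 3 r3.xlarge Amazon Linux instances, one in each availability zone of the eu-west-1 region. The machines were set up using MongoDB's recommended settings with RAID arrays and the CloudFormation script provided by MongoDB. The data size is about 4 GB.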
We think the problem is related to NTP synchronization. By default, on the Amazon Linux AMI the ntpd service is configured to use the pool of Amazon NTP servers hosted at www.pool.ntp.org.
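To try to rule this out, we set up our own NTP server on AWS for the MongoDB servers to sync to. The problem persisted, so we changed the minpoll and maxpoll settings of the ntpd service on the Mongo machines so that they sync time from the NTP server every 16 seconds, but the error still occurs.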
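For reference, ntpd's minpoll and maxpoll options take log2 exponents rather than seconds, so a 16-second interval corresponds to "minpoll 4 maxpoll 4" on the server line. The sketch below applies that to a scratch copy of the config instead of /etc/ntp.conf; time-server.domain.com is the placeholder host from the ntp.conf posted below, and restarting ntpd afterwards is assumed. Note that the posted ntp.conf's server line has no minpoll/maxpoll and the ntpq output shows a poll of 1024 seconds (the default maxpoll of 10), so the shorter interval may not have been in effect when those were captured.

```shell
#!/usr/bin/env bash
# Sketch: set a fixed 16 s poll interval for the internal NTP server.
# minpoll/maxpoll are log2 exponents: 4 means 2^4 = 16 seconds.
set -euo pipefail

poll_exp=4
echo "poll interval: $(( 1 << poll_exp )) seconds"

# Scratch copy holding the server line from the posted ntp.conf.
# On a real host this would be /etc/ntp.conf, followed by an ntpd restart.
conf=$(mktemp)
printf 'server time-server.domain.com iburst\n' > "$conf"

# Append minpoll/maxpoll to that server line (GNU sed, in-place edit).
sed -i "s/^\(server time-server\.domain\.com iburst\)\$/\1 minpoll $poll_exp maxpoll $poll_exp/" "$conf"
cat "$conf"
```

Pinning both values to the same exponent keeps ntpd from backing off to longer intervals once the clock looks stable.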
We also increased the size of the MongoDB oplog to see whether that would make a difference, but it didn't.
Has anyone else run into this kind of problem? Is there something we're missing?
Cheers,
Colin.
ps -ef | grep ntp;
mongodb1
ntp 5163 1 0 Dec11 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 15865 15839 0 09:31 pts/2 00:00:00 grep ntp
mongodb2
ntp 4834 1 0 Dec11 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 19056 19029 0 09:31 pts/0 00:00:00 grep ntp
mongodb3
ntp 5795 1 0 Dec11 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 26199 26173 0 09:31 pts/0 00:00:00 grep ntp
cat /etc/ntp.conf;
# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).
driftfile /var/lib/ntp/drift
# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
# Permit all access over the loopback interface. This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict -6 ::1
# Hosts on local network are less restricted.
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.amazon.pool.ntp.org iburst dynamic
#server 1.amazon.pool.ntp.org iburst dynamic
#server 2.amazon.pool.ntp.org iburst dynamic
#server 3.amazon.pool.ntp.org iburst dynamic
server time-server.domain.com iburst
#broadcast 192.168.1.255 autokey # broadcast server
#broadcastclient # broadcast client
#broadcast 224.0.1.1 autokey # multicast server
#multicastclient 224.0.1.1 # multicast client
#manycastserver 239.255.254.254 # manycast server
#manycastclient 239.255.254.254 autokey # manycast client
# Enable public key cryptography.
#crypto
includefile /etc/ntp/crypto/pw
# Key file containing the keys and key identifiers used when operating
# with symmetric key cryptography.
keys /etc/ntp/keys
# Specify the key identifiers which are trusted.
#trustedkey 4 8 42
# Specify the key identifier to use with the ntpdc utility.
#requestkey 8
# Specify the key identifier to use with the ntpq utility.
#controlkey 8
# Enable writing of statistics records.
#statistics clockstats cryptostats loopstats peerstats
# Enable additional logging.
logconfig =clockall =peerall =sysall =syncall
# Listen only on the primary network interface.
interface listen eth0
interface ignore ipv6
ntpq -npcrv;
remote refid st t when poll reach delay offset jitter
==============================================================================
*172.31.14.137 91.*.*.* 3 u 557 1024 377 1.121 -0.264 0.161
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.6p5@1.2349-o Sat Mar 23 00:37:31 UTC 2013 (1)",
processor="x86_64", system="Linux/3.14.23-22.44.amzn1.x86_64", leap=00,
stratum=4, precision=-23, rootdelay=23.597, rootdisp=109.962,
refid=172.31.14.137,
reftime=d83a757a.175b5fa1 Tue, Dec 16 2014 9:10:18.091,
clock=d83a77a7.82431efa Tue, Dec 16 2014 9:19:35.508, peer=27361,
tc=10, mintc=3, offset=-0.264, frequency=-13.994, sys_jitter=0.000,
clk_jitter=0.358, clk_wander=0.053
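One way to check whether the system clocks themselves are drifting is to watch the offset column of ntpq -pn on each replica set member. The offset is reported in milliseconds, so the -0.264 above means this host was about a quarter of a millisecond from its selected peer (the line marked *). This sketch pulls that value out of the posted line with awk; the 100 ms threshold is a hypothetical value chosen for illustration, and on a live host the sample line would be replaced by the output of ntpq -pn.

```shell
#!/usr/bin/env bash
# Extract the offset (ms) of the selected peer (line starting with '*')
# from ntpq -p style output and compare it with an illustrative threshold.
set -euo pipefail

# Sample taken from the ntpq output above; on a live host use: ntpq -pn
sample='*172.31.14.137 91.*.*.* 3 u 557 1024 377 1.121 -0.264 0.161'

# Columns: remote refid st t when poll reach delay offset jitter -> offset is $9.
offset_ms=$(printf '%s\n' "$sample" | awk '$1 ~ /^\*/ { print $9 }')
threshold_ms=100  # hypothetical alert threshold, not from the original setup

# awk exits 0 when |offset| is below the threshold.
if awk -v o="$offset_ms" -v t="$threshold_ms" 'BEGIN { v = o + 0; if (v < 0) v = -v; exit !(v < t + 0) }'; then
  status=ok
else
  status=drifting
fi
echo "selected peer offset: ${offset_ms} ms (${status})"
```

With the posted line this prints "selected peer offset: -0.264 ms (ok)". If all three members show offsets of this size, the 30-second message is more likely describing how far the sync target's oplog has fallen behind than a difference between the system clocks.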
Answer:
After upgrading to MongoDB 3 with the WiredTiger storage engine, we no longer see this issue.