AWS - EC2 - MongoDB replica set time sync issue - NTP - Replication lag

Time: 2014-12-12 16:22:02

Tags: linux mongodb amazon-ec2 replication ntp

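We are seeing clock drift issues with our MongoDB replica set running on AWS. It only seemed to start after we added more data to our collections, and at first we did not really notice it unless the system was under heavy load. Now the following error is occasionally written to the mongod.log file even when the system is not under load.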

To test this, we isolated a set of machines that hold the same data set but are not used by our web application, and the error still occurs:

  

2014-12-12T13:33:51.333+0000 [rsBackgroundSync] changing sync target because current sync target's most recent OpTime is Dec 12 13:32:42:c which is more than 30 seconds behind member mongo1:27017 whose most recent OpTime is 1418391230

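From the above, the timestamps show that one of the MongoDB replica set members is about a minute behind. The worst we have seen is 12 minutes out of sync.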
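To make the gap in that log line concrete, the two OpTime values can be put on the same scale. This is a minimal Python sketch, assuming the trailing value 1418391230 is seconds since the Unix epoch in UTC and that the `Dec 12 13:32:42` timestamp is also UTC and falls in 2014 (the year is taken from the log line's own prefix). The helper name `optime_gap_seconds` is made up for illustration.

```python
from datetime import datetime, timezone

def optime_gap_seconds(target_optime: str, member_epoch: int, year: int = 2014) -> float:
    """Seconds by which the sync target's OpTime trails the other member's OpTime."""
    # The log prints the sync target's OpTime without a year, so the year is
    # supplied separately (assumption: same year as the log entry itself).
    target = datetime.strptime(f"{year} {target_optime}", "%Y %b %d %H:%M:%S")
    target = target.replace(tzinfo=timezone.utc)
    # The other member's OpTime is printed as epoch seconds.
    member = datetime.fromtimestamp(member_epoch, tz=timezone.utc)
    return (member - target).total_seconds()

gap = optime_gap_seconds("Dec 12 13:32:42", 1418391230)
print(gap)  # 68.0
```

An OpTime is the timestamp of the most recent oplog entry a member has, so under these assumptions the 68 seconds describe how far the sync target's oplog trails mongo1, which is not necessarily the same thing as a 68-second difference between the two system clocks.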

This error in turn causes replication lag, which we get notified about by the Mongo Monitoring Service, although it does correct itself.

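The setup is 3 r3.xlarge AWS Linux instances, one in each availability zone of the EU-West-1 region. The machines were set up with the Mongo recommended settings, including a RAID array, using the CloudFormation scripts provided by Mongo. The data size is about 4 GB.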

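We believe the problem is related to NTP synchronization. By default, on the AWS Linux Amazon Machine Image, the ntpd service is configured to use the AWS NTP server pool hosted at www.pool.ntp.org.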

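To try to rule this out, we set up our own NTP server on AWS for the MongoDB servers to sync to. The problem persisted, so we changed the maxpoll and minpoll values for the ntpd service on the mongo machines so that they sync time from the NTP server every 16 seconds, but the error still occurs.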
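For reference, ntpd expresses `minpoll` and `maxpoll` as exponents of two, in seconds, so a 16-second interval corresponds to an exponent of 4. A small Python sketch of that conversion follows; the function names are hypothetical and not part of any NTP tooling.

```python
def poll_seconds(exponent: int) -> int:
    """Poll interval in seconds for an ntpd minpoll/maxpoll exponent."""
    return 2 ** exponent

def poll_exponent(seconds: int) -> int:
    """Exponent to use in ntp.conf for a power-of-two interval in seconds."""
    # A positive integer is a power of two only if it has a single bit set.
    if seconds <= 0 or seconds & (seconds - 1):
        raise ValueError("poll interval must be a power of two seconds")
    return seconds.bit_length() - 1

print(poll_exponent(16))  # 4
print(poll_seconds(10))   # 1024
```

Under that rule, a `server` line ending in `minpoll 4 maxpoll 4` would request the 16-second interval described here, while a `poll` value of 1024 in `ntpq` output corresponds to an exponent of 10.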

We also increased the size of the MongoDB oplog to see whether that would make a difference, but it did not.

Has anyone else run into this kind of problem? Is there something we are missing?

Cheers,

Colin.

ps -ef | grep ntp;

mongodb1
ntp       5163     1  0 Dec11 ?        00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 15865 15839  0 09:31 pts/2    00:00:00 grep ntp

mongodb2
ntp       4834     1  0 Dec11 ?        00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 19056 19029  0 09:31 pts/0    00:00:00 grep ntp

mongodb3
ntp       5795     1  0 Dec11 ?        00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 26199 26173  0 09:31 pts/0    00:00:00 grep ntp

cat /etc/ntp.conf;

# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).

driftfile /var/lib/ntp/drift

# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery

# Permit all access over the loopback interface.  This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict -6 ::1

# Hosts on local network are less restricted.
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap

# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.amazon.pool.ntp.org iburst dynamic
#server 1.amazon.pool.ntp.org iburst dynamic
#server 2.amazon.pool.ntp.org iburst dynamic
#server 3.amazon.pool.ntp.org iburst dynamic
server time-server.domain.com iburst

#broadcast 192.168.1.255 autokey        # broadcast server
#broadcastclient                        # broadcast client
#broadcast 224.0.1.1 autokey            # multicast server
#multicastclient 224.0.1.1              # multicast client
#manycastserver 239.255.254.254         # manycast server
#manycastclient 239.255.254.254 autokey # manycast client

# Enable public key cryptography.
#crypto

includefile /etc/ntp/crypto/pw

# Key file containing the keys and key identifiers used when operating
# with symmetric key cryptography.
keys /etc/ntp/keys

# Specify the key identifiers which are trusted.
#trustedkey 4 8 42

# Specify the key identifier to use with the ntpdc utility.
#requestkey 8

# Specify the key identifier to use with the ntpq utility.
#controlkey 8

# Enable writing of statistics records.
#statistics clockstats cryptostats loopstats peerstats

# Enable additional logging.
logconfig =clockall =peerall =sysall =syncall

# Listen only on the primary network interface.
interface listen eth0
interface ignore ipv6

ntpq -npcrv;

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*172.31.14.137   91.*.*.*      3 u  557 1024  377    1.121   -0.264   0.161
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.6p5@1.2349-o Sat Mar 23 00:37:31 UTC 2013 (1)",
processor="x86_64", system="Linux/3.14.23-22.44.amzn1.x86_64", leap=00,
stratum=4, precision=-23, rootdelay=23.597, rootdisp=109.962,
refid=172.31.14.137,
reftime=d83a757a.175b5fa1  Tue, Dec 16 2014  9:10:18.091,
clock=d83a77a7.82431efa  Tue, Dec 16 2014  9:19:35.508, peer=27361,
tc=10, mintc=3, offset=-0.264, frequency=-13.994, sys_jitter=0.000,
clk_jitter=0.358, clk_wander=0.053
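The peer row in this output can be read programmatically to check how far the local clock is from the NTP server. This is a minimal Python sketch, assuming the standard `ntpq -p` column order shown in the header above, where `delay`, `offset`, and `jitter` are in milliseconds and `reach` is an octal bitmask of the last eight polls. `parse_peer_line` is a hypothetical helper, not part of ntpq.

```python
def parse_peer_line(line: str) -> dict:
    """Parse one peer row of `ntpq -pn` output into a dict."""
    # Columns: remote refid st t when poll reach delay offset jitter
    remote, refid, st, _type, _when, poll, reach, delay, offset, jitter = line.split()
    return {
        "selected": remote.startswith("*"),  # '*' marks the current sync peer
        "remote": remote.lstrip("*+-#"),     # drop the leading tally character
        "stratum": int(st),
        "poll_s": int(poll),
        "reach": int(reach, 8),              # octal bitmask of recent polls
        "delay_ms": float(delay),
        "offset_ms": float(offset),
        "jitter_ms": float(jitter),
    }

peer = parse_peer_line(
    "*172.31.14.137   91.*.*.*      3 u  557 1024  377    1.121   -0.264   0.161"
)
print(peer["selected"], peer["offset_ms"], peer["reach"] == 0xFF)
```

Read this way, the row reports an offset of about a quarter of a millisecond with all of the last eight polls answered, so at the moment this output was captured the system clock was closely synchronized to the NTP server.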

1 answer:

Answer 0 (score: 2)

After upgrading to MongoDB 3 with the WiredTiger storage engine, we no longer see this issue.