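I use the Python script below to monitor replication lag on a PostgreSQL slave. It queries `pg_current_xlog_location()` on the master and compares it with `pg_last_xlog_replay_location()` on the slave. Recently I have been getting alert emails reporting replication lag of between 2k and 70k bytes.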
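What is a reasonable expectation here? I assume it depends on the WAL buffer size and the checkpoint interval, but I'm not sure how to calculate it. Also, would it be better to compare against `pg_last_xlog_receive_location()` on the slave?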
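To make the receive-versus-replay distinction concrete, here is a minimal sketch (not part of the original script) that converts textual LSNs into byte positions and reports both lags against a single master snapshot. It assumes PostgreSQL 9.3 or later, where an LSN such as `16/B374D848` is a 64-bit position written as its high and low 32-bit halves in hex. The helper names `lsn_to_int` and `replication_lag` are hypothetical; the inputs would come from `pg_current_xlog_location()` on the master and `pg_last_xlog_receive_location()` / `pg_last_xlog_replay_location()` on the slave.

```python
def lsn_to_int(lsn):
    """Convert a textual LSN like '16/B374D848' into an absolute byte position.

    Assumes PostgreSQL 9.3+: the part before '/' is the high 32 bits and the
    part after it is the low 32 bits of the WAL position, both in hex.
    """
    hi, lo = lsn.strip().split('/')
    return (int(hi, 16) << 32) | int(lo, 16)


def replication_lag(master_current, slave_receive, slave_replay):
    """Return (receive_lag, replay_lag) in bytes, relative to one master snapshot.

    receive_lag: WAL written on the master that the slave has not received yet.
    replay_lag:  WAL written on the master that the slave has not replayed yet
                 (this includes the receive lag).
    Either value can be negative if the slave was sampled after the master
    position had already moved on.
    """
    master = lsn_to_int(master_current)
    return master - lsn_to_int(slave_receive), master - lsn_to_int(slave_replay)


# Example with made-up positions:
print(replication_lag('0/3000100', '0/3000000', '0/2FFF000'))  # → (256, 4352)
```

If the receive lag stays near zero while the replay lag grows, the WAL is reaching the slave but is being applied slowly; a large receive lag points at the network or the sender instead.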
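P.S. I also monitor replication on the master by comparing `sent_location` with `replay_location` in the `pg_stat_replication` view, and I check that the replication state reported on the master is `streaming`. That monitor has never raised an alert...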
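The master-side monitor from the P.S. could look roughly like the following sketch. The exact query the author runs is not shown, so `REPLICATION_QUERY` and `check_standbys` are hypothetical. The column and function names are the pre-PostgreSQL 10 ones (version 10 renamed them), and the parser assumes the `|`-separated output of `psql -t -A`.

```python
# Hypothetical query; uses the pre-10 names sent_location / replay_location
# and pg_xlog_location_diff(a, b), which returns a - b in bytes.
REPLICATION_QUERY = (
    "SELECT application_name, state, "
    "pg_xlog_location_diff(sent_location, replay_location) "
    "FROM pg_stat_replication;"
)


def check_standbys(psql_output, limit_bytes):
    """Return a list of problems found in `psql -t -A -c REPLICATION_QUERY` output."""
    rows = [line for line in psql_output.splitlines() if line.strip()]
    if not rows:
        # An empty view means no standby is connected at all.
        return ['no standby connected']
    problems = []
    for row in rows:
        # rsplit keeps any '|' inside application_name attached to the name.
        name, state, diff = row.rsplit('|', 2)
        if state != 'streaming':
            problems.append('%s is in state %s' % (name, state))
        elif diff and float(diff) > limit_bytes:
            # diff is empty when replay_location is NULL.
            problems.append('%s is %d bytes behind' % (name, float(diff)))
    return problems
```

Note that this compares `sent_location` with `replay_location`, so it only measures what the slave has not yet applied out of what was already sent; WAL not yet sent does not show up here, which is one reason it can stay quieter than a check against `pg_current_xlog_location()`.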
```python
#!/usr/bin/python
# Replication-lag check for a PostgreSQL slave. The {{...}} placeholders are
# filled in by the templating system before the script is deployed.
import subprocess
import sys

slaveXlogDiffLimitBytes = 128


def psql(args):
    # Run psql in tuples-only mode (-t) and return its output as stripped text.
    return subprocess.check_output('psql -t ' + args, shell=True,
                                   universal_newlines=True).strip()


try:
    # Check recovery mode first: on a non-standby the replay location is NULL.
    if psql('-p {{postgresql_port}} -c "SELECT pg_is_in_recovery()"') != 't':
        print('Slave server is not in recovery mode')
        sys.exit(1)
    # Current WAL write position on the master.
    masterXlogLocationStr = psql('-p {{postgresql_port}} -h {{postgres_basebackup_host}} -U {{postgres_basebackup_user}} {{postgres_db_name}} -c "SELECT pg_current_xlog_location();"')
    # pg_xlog_location_diff(a, b) returns a - b, so the master position goes
    # first: a positive result is WAL the slave has not replayed yet.
    slaveXlogDiffBytes = float(psql('-p {{postgresql_port}} {{postgres_db_name}} -c "SELECT pg_xlog_location_diff(\'' + masterXlogLocationStr + '\'::pg_lsn, pg_last_xlog_replay_location());"'))
except subprocess.CalledProcessError as e:
    print("Error retrieving stats: {0}".format(e))
    sys.exit(1)

if slaveXlogDiffBytes > slaveXlogDiffLimitBytes:
    print("Slave server replication is behind master by %f bytes" % slaveXlogDiffBytes)
    sys.exit(1)

print('All clear!')
sys.exit(0)
```